UrlRewritingNet performance issues

October 26, 2010

My organization has recently spent hundreds of total man-hours troubleshooting production CPU/performance issues that we have identified as being caused by the third party UrlRewritingNet.UrlRewrite module. This post will be a summary and debrief of our troubleshooting and resolution. If you are using this component and are experiencing performance issues, perhaps this will be beneficial to you.

Background and disclaimer

First, the disclaimer: I can not conclusively prove that there is anything wrong with UrlRewritingNet. I’ve only briefly reviewed the source code and confirmed it’s not blatantly terrible. For all intents and purposes it seems like a fine, albeit older (and what appears to be no longer maintained) free URL rewriting solution for ASP.NET applications.

UrlRewritingNet is implemented as an HTTPModule. Rewrite rules are defined as a regular expression in a configuration file (or in the web.config). On every request, they are evaluated and the module “rewrites” the URL as necessary. It can also be interfaced with programmatically.

<add name="rule1" virtualUrl="^~/test/(.*).aspx" destinationUrl="~/test.aspx?q=$1" />

As for the background, my company’s website was using this to rewrite URLs for some sections of the site. We had about 100 rules, which I don’t consider to be exceptional. For a long time, our production web servers were having intermittent performance issues (specifically high CPU utilization related to the W3WP.exe process). Inevitably this issue would subside just long enough for us to ignore it, until it came back with even higher or prolonged instances of excessive CPU utilization. Eventually, it became too much to ignore, and we finally identified and resolved the problem.

Debugging Tools for Windows

This was the classic “it works on my machine” problem that is so hard to diagnose. There were no exceptions, no problems on any non-production machine, no database server issues. All we knew was that the production web servers (Server 2003/IIS6 by the way) were intermittently collapsing under the pressure of…something.

What saved us was a memory dump – a few of them, actually. We installed the Debug Diagnostic Tool on one of the production servers and performed a full IIS hang dump. We then loaded up the dump file in WinDbg, wired up SOS and got to work.

0:000> !threads

This command shows information about the managed threads. There happens to be 22 threads, all background. Nothing particularly unusual here. Moving on.

0:000> !eestack

This command dumps all thread stacks. Below is one of the threads. Note the bold. (this has been edited a bit for brevity)

… 
mscorlib_ni!System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.__Canon, mscorlib]].Insert...
mscorlib_ni!System.RuntimeType+RuntimeTypeCache+MemberInfoCache`1[[System.__Canon, mscorlib]].AddMethod...
mscorlib_ni!System.RuntimeType.GetMethodBase(System.RuntimeTypeHandle, System.RuntimeMethodHandle)+0x1a0
mscorlib_ni!System.Reflection.Emit.DynamicScope.GetTokenFor(System.RuntimeMethodHandle)+0x30
mscorlib_ni!System.Reflection.Emit.DynamicILGenerator.Emit(System.Reflection.Emit.OpCode, System.Reflection.MethodInfo)+0x149
System_ni!System.Text.RegularExpressions.RegexCompiler.Callvirt(System.Reflection.MethodInfo)+0xa5
System_ni!System.Text.RegularExpressions.RegexCompiler.GenerateOneCode()+0x11b0
System_ni!System.Text.RegularExpressions.RegexCompiler.GenerateForwardSection()+0x1529
System_ni!System.Text.RegularExpressions.RegexCompiler.GenerateGo()+0x13b
System_ni!System.Text.RegularExpressions.RegexLWCGCompiler.FactoryInstanceFromCode(System.Text.RegularExpressions.RegexCode, ...
System_ni!System.Text.RegularExpressions.RegexCompiler.Compile(System.Text.RegularExpressions.RegexCode, System.Text.RegularExpressions.RegexOptions...
System_ni!System.Text.RegularExpressions.Regex.Compile(System.Text.RegularExpressions.RegexCode, System.Text.RegularExpressions.RegexOptions)+0x12
System_ni!System.Text.RegularExpressions.Regex..ctor(System.String, System.Text.RegularExpressions.RegexOptions, Boolean)+0x202
UrlRewritingNet_UrlRewriter!UrlRewritingNet.Web.RegExRewriteRule.CreateRegEx()+0x62
UrlRewritingNet_UrlRewriter!UrlRewritingNet.Web.RegExRewriteRule.Initialize(UrlRewritingNet.Configuration.RewriteSettings)+0x74
UrlRewritingNet_UrlRewriter!UrlRewritingNet.Web.UrlRewriteModule.Init(System.Web.HttpApplication)+0x1a1
System_Web_ni!System.Web.HttpApplication.InitModulesCommon()+0x7c
System_Web_ni!System.Web.HttpApplication.InitInternal(System.Web.HttpContext, System.Web.HttpApplicationState, System.Reflection.MethodInfo[])+0x18f
System_Web_ni!System.Web.HttpApplicationFactory.GetNormalApplicationInstance(System.Web.HttpContext)+0x138
System_Web_ni!System.Web.HttpApplicationFactory.GetApplicationInstance(System.Web.HttpContext)+0x85
System_Web_ni!System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest)+0xc4
System_Web_ni!System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest)+0x7c
System_Web_ni!System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32)+0x18c

That’s interesting. Out of 22 threads, most of them were idle thread pool threads, and 5 of them look something like the above. UrlRewritingNet is creating a regular expression that is getting compiled. That’s an expensive operation, and one that – at most – should only happen on application startup, once for every URL rewrite rule, no more.

It’s possible the dump was taken just as the app pool was recycling, but we took several dumps at different points in time and always saw multiple threads compiling regular expressions for UrlRewritingNet.

0:000> !dumpheap –stat

This command shows memory consumption by type. The first column is the count of objects, the second is the memory footprint. This is a snippet of the highest memory-consuming objects in the heap where the bottom is the highest (System.Byte[]).  Note the bold.

  
   8161     12948160 System.Char[]
    1758     14033336      Free
   20976     14290704 System.Collections.Hashtable+bucket[]
  858215     20597160 System.Reflection.Emit.DynamicScope
  920311     22087464 System.RuntimeTypeHandle
  285987     22878960 UrlRewritingNet.Web.RegExRewriteRule
  858428     27469696 System.Reflection.Emit.LineNumberInfo
  286178     27473088 System.Text.RegularExpressions.Regex
  928586     37143440 System.Collections.ArrayList
  858428     41204544 System.Reflection.Emit.ScopeTree
  859105     41237040 System.Reflection.Emit.SignatureHelper
  858215     48060040 System.Reflection.Emit.DynamicMethod+RTDynamicMethod
  858215     96120080 System.Reflection.Emit.DynamicMethod
  858215    137314400 System.Reflection.Emit.DynamicILGenerator
1737698    158033080 System.String
6007338    192234816 System.Reflection.Emit.GenericFieldInfo
1842110    476240424 System.Object[]
  572154    516188016 System.Reflection.Emit.__FixupData[]
2351800    545836376 System.Int32[]
24239414    581745936 System.RuntimeMethodHandle
2643389    849136768 System.Byte[]

There are 285,987 UrlRewritingNet RegExRewriteRules which account for almost all of the regular expressions (286,178) in the heap!! Again, we only had about 100 URL rewrite rules defined in our configuration file, and somehow there are over a quarter of a million rewrite rule objects in the heap. What the heck?!

ISAPI Rewrite to the rescue

The evidence in the dump files was pretty damning. I pulled open the UrlRewritingNet source code and made sure it wasn’t actually creating a new regular expression for every rule for every request. It wasn’t. What else could be causing it to malfunction? I don’t honestly know. The more rewrite rules there are may be a factor, but that doesn’t particularly make sense.  All I could find was this discussion where a few other people seemed to have similar issues.

We chose to implement Helicon’s ISAPI Rewrite version 3 as the fix. It’s widely used, cheap (free if you can deal with some of the lite version limitations) and fast. After a bit of R&D, we translated the rules, tested, deployed, and held our breath.

The results?

This is the CPU utilization for one week running with UrlRewritingNet. It should be noted that some of the “valleys” where CPU is low may be partially attributed to the server being pulled out of rotation for a period of time due to unresponsiveness.

CPU load with UrlRewritingNet

And this is the first week after implementing ISAPI Rewrite.

CPU load with ISAPI Rewrite

Pretty dramatic, eh?

If you happen to be using UrlRewritingNet and are experiencing any performance problems, I highly recommend you to investigate this further and consider other solutions.

Regular expression performance comparisons

October 24, 2010

Pop quiz, hotshot. You need to match a string using a regular expression and you need it to execute as fast as possible. Application startup time doesn’t matter and each method will execute 10,000 times. Which option do you chose?

//A
void Method_A()
{
    Regex x = new Regex("somepattern");
    x.IsMatch("test.string!");
}

//B
void Method_B()
{
    Regex.IsMatch("test.string!", "somepattern");
}

//C
static void Method_C()
{
    Regex x = new Regex("somepattern");
    x.IsMatch("test.string!");
}

//D
static readonly Regex regexForMethod_D = new Regex("somepattern");
public void Method_D()
{
    regexForMethod_D.IsMatch("test.string!");
}

//E
static readonly Regex regexForMethod_E = new Regex("somepattern",
	RegexOptions.Compiled);
public void Method_E()
{
    regexForMethod_E.IsMatch("test.string!");
}  

Bonus question: two pairs of these options take just about the same time to execute. Which are they, and which pair is faster? The answers are below.

Note: for brevity, I’ve listed “somepattern” as the regex pattern. For my actual testing, I used a pattern to match an email address – the one in the code sample just below.

Programming regular expressions by coincidence

I’ll admit, regular expressions intimidate me just a little. They’re powerful and fast, but the complex, archaic-looking patterns just make me want to avoid them. When I did have to use a Regex, I would just google for a piece of code, confirm that it generally worked and move on.

//need to validate an email... [copy/paste from a website:]
Regex x = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*");
x.IsMatch("e@mail.com");
//it works... moving on

This is the definition of programming by coincidence. Regular expressions are somewhat complex. Initialization of a Regex object is not a trivial operation, and then there’s that whole question of whether you should compile or not.

You can get away with programming by coincidence most of the time. When your servers are melting in a giant ball of fire because your website is initializing and compiling several regular expressions on every request… you can’t.

Regular expression options for an ASP.NET application

This article deals with regular expressions in a general matter, but I want to focus on using them on the context of an ASP.NET website: A application that is typically long-running, and has several global (static) classes/members as well as instance objects (such as a web form Page) in which a regular expression may be used.

Static vs Instance members

The first thing to consider is whether your regular expression object should be defined as an instance member, or as a static member. Unless the Regex pattern is variable or only ever used once, I can’t think of a reason why you would ever want to create a Regex as an instance variable in an ASP.NET application.

As a general rule of thumb, create regular expressions as static read-only objects if they are going to be executed several times. D is much faster than A.

//A
void Method_A()
{
    Regex x = new Regex("somepattern");
    x.IsMatch("test.string!");
}

//D
static readonly Regex regexForMethod_D = new Regex("somepattern");
public void Method_D()
{
    regexForMethod_D.IsMatch("test.string!");
}

I’m currently maintaining code that has a lot of the following in it: A static method that initializes a local regular expression within it. When I was programming by coincidence, I’d see this and at first be taken aback “a non-static Regex – this is probably slow because it gets initialized on every call!” and then I’d see it was in a static method and think “…well, the compiler must surely optimize this. It’s a local member of a static method so that’s fine.” Wrong.

The following - option C in our quiz – is no faster that option A.

//C
static void Method_C()
{
    Regex x = new Regex("somepattern");
    x.IsMatch("test.string!");
}

What about the static method Regex.IsMatch()?

There’s a static IsMatch() method on the Regex class that can be called as a quick an easy way to evaluate an expression. My understanding is that this is defined just like option C, but by my measurements can perform up to twice as fast. In fact, Regex.IsMatch() – option B – performs at about the same speed as option D under my testing scenario.

//B
void Method_B()
{
    Regex.IsMatch("test.string!", "somepattern");
}

I’ll have to investigate this further.

Compile vs not compile

The final question is whether you should initialize your regular expression object using RegexOptions.Compiled. This generally depends. A compiled Regex gives about 30% better runtime performance but can take as much as ten times longer to initialize.

With an ASP.NET app, I’d say the answer is pretty clear in most cases: go with the compiled version (on a static Regex object!). The BCL team states “…the bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly” which is the case in most of the regular expressions I come across in an ASP.NET application. You’re mileage may vary, of course.

Option E, when ignoring the increased startup cost, is the fastest of all the options presented.

//E
static readonly Regex regexForMethod_E = new Regex("somepattern", RegexOptions.Compiled);
public void Method_E()
{
    regexForMethod_E.IsMatch("test.string!");
}  

The verdict

Ignoring initialization time, here are the results of the quiz – where each method was executed 10,000 times.

regex execution time comparison

Calling IsMatch() on the compiled static Regex was by far the fastest operation, and in my opinion, the best way to use a regular expression in an ASP.NET application provided it will be used often.

What about the initialization costs? It is definitely not trivial to use RegexOptions.Compiled!

regex startup times

But given this only has to happen once (and not 10,000 times!) for the life of an ASP.NET application, I’m willing to accept this startup cost for the comparatively smaller runtime performance increase of the non-compiled version.

About Kurt

I'm a senior consultant at Headspring in Austin, TX. My passion is creating web-based applications that are well crafted and solve real problems for real people. Want to know more? Check out my about page.

. @LipGlosserie setting up for Renegade Austin craft fair http://t.co/7X4WBVQb 15 hours ago