Tuesday, 13 January 2015

Optimising DotLiquid: Part 3

Respecting Regex

Regular expressions are a powerful tool. They're so powerful, in fact, that it's easy to get carried away.

As you might expect from a solution that relies on parsing raw text into program flow, DotLiquid uses Regex extensively. In this part of the Optimising DotLiquid series, I'll be improving how DotLiquid works with Regex.

The Optimisation

The original code for DotLiquid uses the static method Regex.Match extensively to determine whether or not given inputs match various regular expressions.

Regex.Match checks for a cached instance of the Regex class for the given expression and creates one if it doesn't exist. Regex.Match then returns the result of invoking the instance's Match method with the original input.

The static Regex methods share a cache that holds 15 expressions by default (Regex.CacheSize), so an application that uses more than 15 expressions through Regex.Match does unnecessary work recreating instances that were created earlier and then evicted.
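To make the cache limit concrete, here's a minimal sketch of reading and raising it. Regex.CacheSize is the real static property; the RegexCacheSketch class and its method names are hypothetical and exist only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of DotLiquid.
public static class RegexCacheSketch
{
    // Returns how many expressions the static Regex methods will keep cached.
    public static int CurrentCacheSize()
    {
        return Regex.CacheSize;
    }

    // Raises the limit so that more distinct expressions stay cached.
    // Never lowers it, so calling this cannot evict existing entries.
    public static void RaiseCacheSize(int size)
    {
        if (size > Regex.CacheSize)
            Regex.CacheSize = size;
    }
}
```

Raising the limit would only address the eviction problem. Every call to Regex.Match would still pay for the cache lookup, which is why I went with static instances instead.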

It's also worth noting that when trying to save fractions of a millisecond, as I am in this series, repeatedly looking up a cached value is wasteful.

Using static references to pre-compiled Regex instances instead of relying on Regex.Match to handle caching and persistence completely sidesteps both of these issues.

An example can be seen below and you can view the full changeset on GitHub.

// BEFORE
// ==================================
private object Resolve(string key)
{
    [...]
    // Regex.Match takes the input first, then the pattern
    var match = Regex.Match(key, R.Q(@"^'(.*)'$"));
    if (match.Success)
        return match.Groups[1].Value;
    [...]

// AFTER
// ==================================
// Created once and reused by every call to Resolve
private static readonly Regex _singleQuotesRegex
    = new Regex(R.Q(@"^'(.*)'$"), RegexOptions.Compiled);

private object Resolve(string key)
{
    [...]
    var match = _singleQuotesRegex.Match(key);
    if (match.Success)
        return match.Groups[1].Value;
    [...]

Check out this excellent article on compiling Regex for more information on when and when not to use compiled Regex in C#.

Timings

The timings below were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.
Render Time (ms)   Original Code   Part 2 Code   RegexOptions.Compiled
Minimum            3.46160         3.35360       1.47660
Maximum            5.74940         5.29750       2.90560
Range              2.28780         1.94390       1.42900
Average            3.64991         3.54096       1.53506
Std. Deviation     0.16530         0.17852       0.07369
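
For anyone wanting to collect similar numbers, here's a rough sketch of a timing harness that produces the same statistics as the table above. It isn't the harness I used for these results: the RenderTimer class, its members and the choice of population standard deviation are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing harness for illustration only.
public static class RenderTimer
{
    public sealed class Stats
    {
        public double Minimum, Maximum, Range, Average, StdDeviation;
    }

    // Runs the action the given number of times and returns
    // the duration of each run in milliseconds.
    public static double[] Measure(Action action, int iterations)
    {
        var samples = new double[iterations];
        var stopwatch = new Stopwatch();
        for (var i = 0; i < iterations; i++)
        {
            stopwatch.Restart();
            action();
            stopwatch.Stop();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return samples;
    }

    // Summarises a non-empty set of samples.
    // Assumption: population standard deviation (divide by N).
    public static Stats Summarise(double[] samples)
    {
        var average = samples.Average();
        var variance = samples.Sum(s => (s - average) * (s - average)) / samples.Length;
        var minimum = samples.Min();
        var maximum = samples.Max();
        return new Stats
        {
            Minimum = minimum,
            Maximum = maximum,
            Range = maximum - minimum,
            Average = average,
            StdDeviation = Math.Sqrt(variance)
        };
    }
}
```

Usage would look something like RenderTimer.Summarise(RenderTimer.Measure(renderAction, 10000)), where renderAction is a placeholder for whatever delegate renders the template under test.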

Analysis

With these changes DotLiquid rendering is now more than twice as fast!

The average time to render has been reduced to just 42% of the original average render time (1.53506 ms, down from 3.64991 ms), with very little impact on the readability of the code or its memory usage. It's a definite win.

What's Next?

There might be places where Regex isn't needed at all and simple string operations would be a better fit, so I'll be reviewing the code for such opportunities.
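
As a sketch of what that could look like, the single quotes check from the example earlier in this post might be written with plain string operations. This is illustrative only and isn't in the changeset: the QuoteSketch class and TryUnwrapSingleQuotes method are hypothetical, and it assumes the expression behaves like a plain ^'(.*)'$ applied to single-line input (the Regex treats newlines differently).

```csharp
// Hypothetical helper for illustration only; not part of DotLiquid.
public static class QuoteSketch
{
    // Returns the text between the surrounding single quotes,
    // or null when key isn't wrapped in single quotes.
    public static string TryUnwrapSingleQuotes(string key)
    {
        // At least two characters are needed: an opening and a closing quote.
        if (key == null || key.Length < 2)
            return null;

        if (key[0] != '\'' || key[key.Length - 1] != '\'')
            return null;

        // Everything between the first and last character, like the greedy (.*) group.
        return key.Substring(1, key.Length - 2);
    }
}
```

Whether this is actually faster than the compiled Regex would need measuring before it replaced anything.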

For the next part of this series I'll also be looking at how DotLiquid rendering handles loops, cycles, assignments and case statements, and potentially cutting performance corners similar to the ones I cut in Optimising DotLiquid: Part 2.

Stay tuned!
