Steve Lillis' Code Blog

Friday, 13 February 2015

Liquid for C#: Defining Success

Introduction

In the High-Level Overview for this series I mentioned that I'll need a way to measure the success of this project as it progresses.

As this project's aim is to create a one-for-one replica of the behaviour of Liquid's Ruby implementation, I will be porting Liquid's integration tests to C# and following a test-driven approach.

What's in a Name?

Although they are called integration tests in Liquid's code, most of these tests are in fact functional acceptance tests, and that is what makes them useful for confirming that the system behaves correctly.

Unit Test

Tests the behaviour of a single system component in a controlled environment.

Integration Test

Tests the behaviour of major components of a system working together.

Functional Acceptance Test

Tests that the system, per the technical specification, produces the expected output for each given input.

Unit and integration tests verify that the code you've written is doing what it was written to do, while functional acceptance tests verify that the system as a whole, without consideration for the structure of its internal components, does what it is designed to do.

Any Port in a Storm

There are hundreds of tests to port to C# and, as it turns out, not all of the tests in the Ruby implementation's integration namespace are integration or functional acceptance tests... some are unit tests!

The porting process is therefore a matter of replicating the original tests as faithfully as possible, translating them into functional acceptance tests where needed.

A test that ported smoothly

# Ruby
def test_for_with_range
    assert_template_result(
        ' 1  2  3 ',
        '{%for item in (1..3) %} {{item}} {%endfor%}')
end

// C#
public void TestForWithRange()
{
    AssertTemplateResult(
        " 1  2  3 ", 
        "{%for item in (1..3) %} {{item}} {%endfor%}");
}

A test that needed translation

# Ruby - The below are unit tests
#        for methods escape and h
def test_escape
    assert_equal '&lt;strong&gt;', @filters.escape('<strong>')
    assert_equal '&lt;strong&gt;', @filters.h('<strong>')
end

// C# - Rewritten as a test of the 
//      output expected from a template
public void TestEscape()
{
    AssertTemplateResult(
        "&lt;strong&gt;", 
        "{{ '<strong>' | escape }}");
}

When translating a unit or integration test into a functional acceptance test, I'm using Liquid's documentation and wiki as the design specification. This ensures that the tested behaviour is the templating language's expected behaviour, not just the behaviour I expect!

What's Next?

Once all of the tests are ported, the next step will be to start writing the code to pass those tests. Remember, in Test Driven Development we start with failing tests and then write the code to make those tests pass.

The AssertTemplateResult helper used by the tests above currently looks like this:

protected void AssertTemplateResult(
                   string expected, 
                   string source)
{
    // TODO: implement me!
    throw new NotImplementedException();
}

There are still a few hundred tests left to port, though, so wish me luck!

Monday, 9 February 2015

Liquid for C#: High-Level Overview

Introduction

In the Liquid for C# series, I will be writing a C# interpreter for the Liquid templating language from scratch.

In this first post, I define the project's scope and overall intention. Code does not factor into this stage at all; it's purely about the API's purpose, not its implementation.

Broad Strokes

The first step in any project is to define what it will be doing at the highest level. Ideally, this should be expressible as a single sentence or a simple diagram.

This project's definition is deceptively simple: Template + Data = Output.
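To make "Template + Data = Output" concrete, here is a deliberately tiny sketch. It is not Liquid and not the project's design: the hypothetical `ToyTemplate.Render` method only understands `{{ name }}` placeholders and looks them up in a dictionary, but it has the same overall shape, with a template string and some data going in and output text coming out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical toy renderer illustrating "Template + Data = Output".
// Only {{ variable }} placeholders are supported; everything else is out of scope.
public static class ToyTemplate
{
    // Matches {{ name }} with optional whitespace inside the braces.
    private static readonly Regex Placeholder =
        new Regex(@"\{\{\s*(\w+)\s*\}\}", RegexOptions.Compiled);

    public static string Render(string template, IDictionary<string, object> data)
    {
        return Placeholder.Replace(template, match =>
        {
            // Unknown or null variables render as an empty string.
            object value;
            return data.TryGetValue(match.Groups[1].Value, out value) && value != null
                ? value.ToString()
                : string.Empty;
        });
    }
}
```

For example, rendering `"Hello {{ name }}!"` with a dictionary mapping `name` to `"World"` produces `"Hello World!"`.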

Armed with this very general definition, the next step is to break the overall process into broad, functionally cohesive chunks. I find that this is best achieved by running through potential use cases. The diagram below is the outcome of that process.

It immediately jumps out at me that the Abstract Syntax Tree and the steps that follow it are implementation agnostic. This means that they are not specific to Liquid and, because of this, can be reused in any templating language interpreter.
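As a rough illustration of why the later stages need not know about Liquid, here is a sketch of a minimal syntax tree and a renderer that walks it. The `INode` interface and the `TextNode`, `VariableNode` and `TreeRenderer` types are hypothetical names chosen for this example, not part of the eventual API. Nothing in them refers to Liquid's `{% %}` or `{{ }}` syntax, so a parser for a different templating language could produce the same tree and reuse the same renderer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, language-agnostic syntax tree: nodes render themselves
// against the data, but nothing here depends on Liquid's syntax.
public interface INode
{
    void Render(IDictionary<string, object> data, StringBuilder output);
}

// Literal text copied straight to the output.
public sealed class TextNode : INode
{
    private readonly string _text;
    public TextNode(string text) { _text = text; }

    public void Render(IDictionary<string, object> data, StringBuilder output)
    {
        output.Append(_text);
    }
}

// A variable lookup; missing or null variables render as nothing.
public sealed class VariableNode : INode
{
    private readonly string _name;
    public VariableNode(string name) { _name = name; }

    public void Render(IDictionary<string, object> data, StringBuilder output)
    {
        object value;
        if (data.TryGetValue(_name, out value) && value != null)
            output.Append(value);
    }
}

// The rendering step: walk the tree in order and collect the output.
public static class TreeRenderer
{
    public static string Render(IEnumerable<INode> tree, IDictionary<string, object> data)
    {
        var output = new StringBuilder();
        foreach (var node in tree)
            node.Render(data, output);
        return output.ToString();
    }
}
```

A Liquid parser would turn `Hello {{ name }}!` into `TextNode("Hello ")`, `VariableNode("name")` and `TextNode("!")`; a parser for another syntax could emit exactly the same nodes.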

Defining Success

The question then becomes one of how to know when the project fulfils its purpose.

As the aim of this project is to provide a full C# implementation of Liquid's behaviour as it is currently implemented in Ruby, I will port all of the integration tests for Liquid to C# and follow a Test Driven Development approach. I will only consider the project to be a success when it passes all of the original tests.

What Next?

In bigger teams or projects, it's necessary to delve much deeper into the design phase, going as far as defining the interfaces for the API and how they plug together, so that all involved parties can work independently without heading off in completely different directions.

Since this is just me working on a hobby project, though, I'll instead be taking a very iterative approach and in the next post I'll be writing code!

Wednesday, 4 February 2015

Restructuring DotLiquid: Part 3

The Issue at Hand

For those who didn't know, DotLiquid is a straight C# port of Liquid, a library written in Ruby.

The Ruby programming language is significantly different to C#, so even best-effort attempts at like-for-like reconstruction of the library inevitably lead to structural issues in the API's design.

Lost in Translation

The structural issues that come from direct porting include:

Excessive use of static classes.
Excessive use of Reflection.
Lack of Object Oriented design, leading to inflexibility.
Duplicate code, as tightly knit classes force code to be repeated.
Excessive boxing and unboxing, leading to degraded performance.
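The boxing point is easy to see in isolation. When a value type such as `int` is stored as `object` (for example in a non-generic `ArrayList`), it is boxed into a new heap object, and reading it back requires an unboxing cast. The sketch below contrasts that with a generic `List<int>`, which stores the values directly. It only illustrates the mechanism and is not taken from DotLiquid's code.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Each Add boxes the int into a heap object; each read needs an unboxing cast.
    public static int SumBoxed(int count)
    {
        var list = new ArrayList();
        for (var i = 0; i < count; i++)
            list.Add(i);            // box
        var sum = 0;
        foreach (object item in list)
            sum += (int)item;       // unbox
        return sum;
    }

    // List<int> stores the ints directly, so there is no boxing or unboxing.
    public static int SumUnboxed(int count)
    {
        var list = new List<int>();
        for (var i = 0; i < count; i++)
            list.Add(i);
        var sum = 0;
        foreach (var item in list)
            sum += item;
        return sum;
    }
}
```

Both methods return the same sum; the difference is the extra allocations and casts in the boxed version, which add up in hot rendering paths.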

That's not to do down DotLiquid, though. It is an exceptional direct port of the original library: for the majority of cases it is more than fast enough, and anyone who has written code using the Ruby implementation of Liquid will be able to pick up DotLiquid and use it in exactly the same way without hesitation.

In my quest to produce the perfect API, however, my implementation has become so far removed from DotLiquid's interface, implementation and intent that I have decided to start afresh.

Be sure to come back for my next post, where I'll begin the high-level design process for the API, including how and why I'll be drawing distinct boundaries between its elements.

Friday, 23 January 2015

Restructuring DotLiquid: Part 2

Bringing down the Hammer

I mentioned in Part 1 that DotLiquid's Condition hierarchy could do with being a bit more object oriented.

As conditions are a relatively small and isolated part of the API, it's a great place to start this series in earnest, so that's where I'll begin.

The Restructure

Here's a before and after of the Condition class hierarchy.

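[Class diagram: Condition hierarchy BEFORE the restructure]

[Class diagram: Condition hierarchy AFTER the restructure]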

First, I introduced a new interface, ICondition, and I did this for two reasons:

Not all future developers will want to use the class ConditionBase as a base - they might have new code requirements or their own base class.
No class that has a dependency on conditions should be forced to depend upon a specific implementation - by using the interface I make those classes compatible with any implementation.

Next, I refactored And and Or logic out of Expression and into their own classes. I did this because the code for And, Or and Expression may be logically cohesive, but it is not functionally cohesive. Incidentally, their code's lack of functional cohesion is what made them so easy to separate.
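A minimal sketch of the shape this gives the hierarchy, assuming `ICondition` exposes the `Evaluate(ConditionalStatementState)` signature used later in this post. The names `ICondition`, `ConditionBase`, `AndCondition`, `OrCondition`, `Left`, `Right` and `ConditionalStatementState` come from the post; their members here are simplified guesses, and the `ConstantCondition` helper is purely hypothetical, added so the example can be exercised.

```csharp
// Simplified stand-in for the post's ConditionalStatementState;
// the real one is constructed from a Context, which is omitted here.
public class ConditionalStatementState
{
    public int BlockRenderCount { get; set; }
}

public interface ICondition
{
    bool Evaluate(ConditionalStatementState state);
}

// Abstract foundation for implementations that want a shared base.
public abstract class ConditionBase : ICondition
{
    public abstract bool Evaluate(ConditionalStatementState state);
}

// AND logic in its own class rather than inside the expression condition.
public class AndCondition : ConditionBase
{
    public ICondition Left { get; private set; }
    public ICondition Right { get; private set; }

    public AndCondition(ICondition left, ICondition right)
    {
        Left = left;
        Right = right;
    }

    public override bool Evaluate(ConditionalStatementState state)
    {
        return Left.Evaluate(state) && Right.Evaluate(state);
    }
}

// OR logic, likewise separated out.
public class OrCondition : ConditionBase
{
    public ICondition Left { get; private set; }
    public ICondition Right { get; private set; }

    public OrCondition(ICondition left, ICondition right)
    {
        Left = left;
        Right = right;
    }

    public override bool Evaluate(ConditionalStatementState state)
    {
        return Left.Evaluate(state) || Right.Evaluate(state);
    }
}

// Hypothetical helper for testing: always evaluates to a fixed value.
public class ConstantCondition : ConditionBase
{
    private readonly bool _value;
    public ConstantCondition(bool value) { _value = value; }

    public override bool Evaluate(ConditionalStatementState state) { return _value; }
}
```

Because `Left` and `Right` are typed as `ICondition`, any implementation can be combined, including ones that do not derive from `ConditionBase`.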

I made ConditionBase an abstract class to better indicate its purpose as a foundation, as opposed to a class that can be used effectively on its own.

I moved the static collection Operators out of ExpressionCondition and into its own class. This needs further work, as it shouldn't be static at all, but it's a start. More on this in a later post.
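One possible direction for making `Operators` non-static is an instance registry that is handed to templates as configuration, so different templates could use different operator sets. The sketch below assumes a hypothetical `OperatorRegistry` class holding comparison functions keyed by operator symbol; it illustrates the idea and is not the code from the changeset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical instance-based replacement for a static operator table.
public class OperatorRegistry
{
    private readonly Dictionary<string, Func<object, object, bool>> _operators =
        new Dictionary<string, Func<object, object, bool>>();

    // Registers (or replaces) an operator such as "==" or "contains".
    public void Register(string symbol, Func<object, object, bool> comparison)
    {
        _operators[symbol] = comparison;
    }

    public bool Evaluate(string symbol, object left, object right)
    {
        Func<object, object, bool> comparison;
        if (!_operators.TryGetValue(symbol, out comparison))
            throw new ArgumentException("Unknown operator: " + symbol, "symbol");
        return comparison(left, right);
    }

    // A small default set; callers can add or override operators per instance.
    public static OperatorRegistry CreateDefault()
    {
        var registry = new OperatorRegistry();
        registry.Register("==", (l, r) => object.Equals(l, r));
        registry.Register("!=", (l, r) => !object.Equals(l, r));
        return registry;
    }
}
```

Since each registry is an ordinary object, registering a custom operator on one instance has no effect on templates configured with another, which a static collection cannot offer.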

The IsElse property is a classic code smell because it is only ever true in one case: when the condition is an ElseCondition. Any logic that uses the property would be better off inside ElseCondition itself, where it is encapsulated, so I changed the signature of the Evaluate method to take a ConditionalStatementState object and moved the check for whether an ElseCondition should render into ElseCondition.

// BEFORE
// =====================
// The owning block's render method:
var executeElseBlock = true;
foreach (var block in Blocks)
{
    if (block.IsElse)
    {
        if (executeElseBlock)
        {
           return RenderAll(block.Attachment, context, result);
        }
    }
    else if (block.Evaluate(context))
    {
        RenderAll(block.Attachment, context, result);
        executeElseBlock = false;
    }
}

// The ElseCondition's evaluate method:
public override bool Evaluate(Context context)
{
    return true;
}

// AFTER
// =====================
// The owning block's render method:
var state = new ConditionalStatementState(context);
foreach (var block in Blocks)
{
    if (block.Evaluate(state))
    {
        ++state.BlockRenderCount;
        var retCode = block.Render(context, result);
        if (retCode != ReturnCode.Return)
            return retCode;
    }
}

// The ElseCondition's evaluate method:
public override bool Evaluate(ConditionalStatementState state)
{
    return state.BlockRenderCount <= 0;
}
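To see why `BlockRenderCount <= 0` reproduces the old `executeElseBlock` flag, here is a self-contained simulation of the AFTER loop. The `ConditionalBlock` type, the delegate-based conditions and the string output are hypothetical simplifications (no `Context`, no `TextWriter`, no return codes); only the counting logic mirrors the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Simplified stand-in for the post's ConditionalStatementState.
public class ConditionalStatementState
{
    public int BlockRenderCount { get; set; }
}

// Hypothetical stand-in for a condition plus its attached body.
public class ConditionalBlock
{
    public Func<ConditionalStatementState, bool> Evaluate { get; set; }
    public string Body { get; set; }
}

public static class CaseSimulation
{
    // Mirrors the AFTER render loop: every block whose condition passes is
    // rendered, and the number of rendered blocks is kept in the shared state.
    public static string Render(IEnumerable<ConditionalBlock> blocks)
    {
        var state = new ConditionalStatementState();
        var result = new StringBuilder();
        foreach (var block in blocks)
        {
            if (block.Evaluate(state))
            {
                ++state.BlockRenderCount;
                result.Append(block.Body);
            }
        }
        return result.ToString();
    }

    // Equivalent of ElseCondition.Evaluate: true only if nothing rendered before it.
    public static bool Else(ConditionalStatementState state)
    {
        return state.BlockRenderCount <= 0;
    }
}
```

With blocks `[true: "cake", false: "cookie", else: "else"]` only `"cake"` is rendered; with `[false: "cake", else: "else"]` the else body is rendered instead, exactly as the old flag-based loop behaved.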

It's worth noting that I could have introduced an additional base class for AndCondition and OrCondition, holding the shared Left and Right properties while each subclass overrides the Evaluate method, but they do so little internally that it felt like overkill. Should they ever grow in size, an abstract base class can be retrofitted painlessly enough.

Summary

Overall, this is a great first step on the path to a clean and pure API, but there's still a lot more work to be done. I suspect that by the end of this series DotLiquid's API will be a significantly different beast, exposing the same functionality in a much more flexible API.

I'm really enjoying the challenge and, if you'd like me to clarify anything, feel free to let me know in the comments!

Wednesday, 21 January 2015

Restructuring DotLiquid: Part 1

Introduction

In the previous series, Optimising DotLiquid, the focus was to improve rendering performance. In this series, the focus is to improve DotLiquid's API.

With DotLiquid v2.0 on the far horizon, now is the perfect time to smooth any rough edges that have appeared over the course of development and distil the API into its purest, cleanest form.

What Makes a Great API?

Accessible

For an API to be accessible, it needs to be consistent in its naming conventions, its method signatures and its chosen design patterns. The API should also be minimalist, exposing no public methods, objects or functionality beyond those that drive the end user's interaction with it.

Flexible

A great API makes as few assumptions as possible about how it will be used. Keeping class coupling and dependencies to a minimum, and using good object oriented design so that the end user can pick and choose functionality, are all part of making an API flexible.

Extensible

A great API has to be easy to extend. This means making key methods virtual, keeping classes concise and substitutable, and avoiding any behind-the-scenes hack-magic. The SOLID principles really come into their own when it comes to extensibility, because you never know which direction the next developer will want to go.

A Bird's Eye View

When fine tuning an API, implementation takes a back seat to architecture. After all, we're designing the interface by which developers interact with the library to achieve a goal, not how that goal is achieved.

The quickest way to get an architectural overview is to add a class diagram to the project. Here's the class diagram for DotLiquid as it stands at the end of Optimising DotLiquid.

This diagram tells me a lot about the state of the DotLiquid API as it currently stands.

The classes with a dotted outline are helper classes, extension classes and containers for commonly used resources. This is fine in a small project, but in an API this could be preventing a third party from tweaking core functionality. I'll be looking to see what can be refactored into instance classes that are provided as configuration to templates, improving flexibility and customisability.

The Condition class isn't respecting the Single Responsibility Principle. It is currently responsible for evaluating an expression, evaluating an AND condition and evaluating an OR condition. ElseCondition and the IsElse property aren't the OOP ideal, either, so refactoring the condition hierarchy will improve extensibility.

The down arrow marked against quite a few of the methods in this diagram indicates the use of the internal access modifier. Where it has been used, these methods appear to be serving as back-door access to functionality that isn't exposed publicly. This is a code smell that harms extensibility and may indicate deeper structural issues, so I'll be looking to do away with them completely.

The Tag class and associated hierarchy has a wide, shallow inheritance structure that is self-explanatory. This is an example of great Object Oriented Design. Other than a few public and internal methods I'd like to clean up, I doubt there's much work to be done to the already clean, accessible signature seen here.

What's Next?

In the next post of this series I'll single out an area of DotLiquid's architecture that could use improvement, explain why such improvements are needed and then implement the changes with before and after class diagrams...

It's going to be awesome!

Friday, 16 January 2015

Optimising DotLiquid: Part 6

Exceptions Should Be Exceptional

A recent merge into the main DotLiquid repository added support for the continue and break keywords, using exceptions (caught by try/catch blocks further up the render chain) to control program flow.

There are two significant reasons why exceptions should never be used for controlling the flow of a program:

Exceptions have a major performance impact.
It is not obvious without serious digging where those exceptions will be handled or if they will be handled at all.

Having this exception-based implementation is better than not having an implementation, of course, but in this series' war on performance it is an obvious target.

Optimisation

The concept for this optimisation is simple enough: to replace the flow controlling exceptions and try/catch handling with proper program flow.

You can see the full changeset on GitHub, but below is a brief summary of the changes.

The break tag class

// BEFORE
// ==================================
public class Break : Tag
{
    public override void Render(
                            Context context, 
                            TextWriter result)
    {
        throw new BreakInterrupt();
    }
}

// AFTER
// ==================================
public class Break : Tag
{
    public override ReturnCode Render(
                                Context context, 
                                TextWriter result)
    {
        return ReturnCode.Break;
    }
}

Shared - the RenderAll loop body

// BEFORE
// ==================================
if (token is IRenderable)
    ((IRenderable) token).Render(context, result);
else
    result.Write(token.ToString());

// AFTER
// ==================================
var renderable = token as IRenderable;
if (renderable != null)
{
    var retCode = renderable.Render(context, result);
    if (retCode != ReturnCode.Return)
        return retCode;
}
else
    result.Write(token.ToString());

The for tag class - break and continue handling

// BEFORE
// ==================================
try
{
    RenderAll(NodeList, context, result);
}
catch (BreakInterrupt)
{
    break;
}
catch (ContinueInterrupt)
{
}

// AFTER
// ==================================
if (RenderAll(NodeList, context, result) == ReturnCode.Break)
    break;

A quick re-run of all of the Unit Tests tells me that this far-reaching changeset has maintained all the original expected behaviours.
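Pulling the three pieces together, here is a small self-contained model of the new flow control. `ReturnCode.Continue`, the simplified `Context`, and the `Continue`, `IfCurrentEquals` and `WriteCurrent` node types are assumptions made for this sketch. Only the core idea comes from the changeset: `Break` returns a code, `RenderAll` stops and propagates any code other than `Return`, and the for loop exits on `Break`.

```csharp
using System.Collections.Generic;
using System.IO;

public enum ReturnCode { Return, Break, Continue }

// Simplified context: only tracks the current loop variable.
public class Context
{
    public int Current { get; set; }
}

public interface IRenderable
{
    ReturnCode Render(Context context, TextWriter result);
}

public static class Renderer
{
    // Like the AFTER RenderAll loop body: propagate any code other than Return.
    public static ReturnCode RenderAll(IEnumerable<object> nodeList, Context context, TextWriter result)
    {
        foreach (var token in nodeList)
        {
            var renderable = token as IRenderable;
            if (renderable != null)
            {
                var retCode = renderable.Render(context, result);
                if (retCode != ReturnCode.Return)
                    return retCode;
            }
            else
                result.Write(token.ToString());
        }
        return ReturnCode.Return;
    }

    // Like the AFTER for tag: Break ends the loop; Continue has already cut the
    // body short inside RenderAll, so the loop simply moves on to the next item.
    public static void RenderFor(IEnumerable<int> range, IList<object> nodeList, Context context, TextWriter result)
    {
        foreach (var item in range)
        {
            context.Current = item;
            if (RenderAll(nodeList, context, result) == ReturnCode.Break)
                break;
        }
    }
}

public class Break : IRenderable
{
    public ReturnCode Render(Context context, TextWriter result) { return ReturnCode.Break; }
}

public class Continue : IRenderable
{
    public ReturnCode Render(Context context, TextWriter result) { return ReturnCode.Continue; }
}

// Hypothetical stand-in for {% if i == Value %}...{% endif %}: renders its body
// only on a match and passes up whatever code the body returns.
public class IfCurrentEquals : IRenderable
{
    public int Value { get; set; }
    public IList<object> Body { get; set; }

    public ReturnCode Render(Context context, TextWriter result)
    {
        return context.Current == Value
            ? Renderer.RenderAll(Body, context, result)
            : ReturnCode.Return;
    }
}

// Hypothetical node that writes the loop variable followed by a comma.
public class WriteCurrent : IRenderable
{
    public ReturnCode Render(Context context, TextWriter result)
    {
        result.Write(context.Current);
        result.Write(",");
        return ReturnCode.Return;
    }
}
```

Looping over 1 to 4 with a body of "if current is 2, continue; write current" produces `1,3,4,`, and swapping `Continue` for `Break` produces `1,`, with no exception thrown or caught along the way.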

Initial Timings

The timings below were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

Render Time (ms)	Part 5	With New Flow Control
Minimum	6.63110	5.78710
Maximum	8.65750	7.61880
Range	2.02640	1.83170
Average	6.87194	5.99984
Std. Deviation	0.20780	0.18964
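The post doesn't show how these figures were produced. For anyone wanting to take comparable measurements, here is one plausible harness: it times a render callback with `System.Diagnostics.Stopwatch` once per iteration and reports the same statistics as the table. The `RenderTiming` and `TimingSummary` names are hypothetical, and using the population standard deviation is an assumption; the original numbers may have used the sample form.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class TimingSummary
{
    public double Minimum { get; set; }
    public double Maximum { get; set; }
    public double Range { get { return Maximum - Minimum; } }
    public double Average { get; set; }
    public double StdDeviation { get; set; }
}

public static class RenderTiming
{
    // Times `render` once per iteration and returns each run's elapsed milliseconds.
    public static List<double> Measure(Action render, int iterations)
    {
        var samples = new List<double>(iterations);
        var stopwatch = new Stopwatch();
        for (var i = 0; i < iterations; i++)
        {
            stopwatch.Restart();
            render();
            stopwatch.Stop();
            samples.Add(stopwatch.Elapsed.TotalMilliseconds);
        }
        return samples;
    }

    // Summarises the samples using the population standard deviation.
    public static TimingSummary Summarise(IList<double> samples)
    {
        var average = samples.Average();
        var variance = samples.Sum(s => (s - average) * (s - average)) / samples.Count;
        return new TimingSummary
        {
            Minimum = samples.Min(),
            Maximum = samples.Max(),
            Average = average,
            StdDeviation = Math.Sqrt(variance)
        };
    }
}
```

Usage would be along the lines of `RenderTiming.Summarise(RenderTiming.Measure(renderOnce, 10000))`, where `renderOnce` is any `Action` that renders the test template.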

Summary

Simply avoiding the anti-pattern of using exceptions to control program flow has reduced render time by more than 10%: the average fell from 6.87 ms to 6.00 ms, a reduction of roughly 12.7%.

A minor side-effect of updating the program flow in this way is that anyone who has written their own tags will need to make the following changes:

The return type of the Render method is now ReturnCode.
Wherever return is used, return ReturnCode.Return instead (and end the method with return ReturnCode.Return, since it no longer returns void).
Whenever the result of RenderAll is not ReturnCode.Return, return that result immediately (see Block.RenderAll for an example).
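For tag authors, the migration might look like the sketch below: a hypothetical custom block tag, `Shout`, which upper-cases the rendered output of its body, updated to the new signature. `ReturnCode`, `Context`, `IRenderable`, `BlockBase` and `BreakTag` are minimal stand-ins so the example compiles on its own; in DotLiquid you would derive from the library's own `Tag` or `Block` classes instead.

```csharp
using System.Collections.Generic;
using System.IO;

public enum ReturnCode { Return, Break, Continue }

public class Context { }

public interface IRenderable
{
    ReturnCode Render(Context context, TextWriter result);
}

// Minimal stand-in for a block tag base class whose RenderAll propagates codes.
public abstract class BlockBase : IRenderable
{
    public List<object> NodeList { get; private set; }

    protected BlockBase() { NodeList = new List<object>(); }

    public abstract ReturnCode Render(Context context, TextWriter result);

    protected ReturnCode RenderAll(List<object> list, Context context, TextWriter result)
    {
        foreach (var token in list)
        {
            var renderable = token as IRenderable;
            if (renderable != null)
            {
                var retCode = renderable.Render(context, result);
                if (retCode != ReturnCode.Return)
                    return retCode;
            }
            else
                result.Write(token.ToString());
        }
        return ReturnCode.Return;
    }
}

// Stand-in for the Break tag shown earlier in this post.
public class BreakTag : IRenderable
{
    public ReturnCode Render(Context context, TextWriter result) { return ReturnCode.Break; }
}

// Hypothetical custom tag, migrated to the new signature.
public class Shout : BlockBase
{
    // Step 1: was "public override void Render(...)".
    public override ReturnCode Render(Context context, TextWriter result)
    {
        var buffer = new StringWriter();
        var retCode = RenderAll(NodeList, context, buffer);

        // Write whatever the body produced, even if it stopped early.
        result.Write(buffer.ToString().ToUpperInvariant());

        // Step 3: pass Break/Continue up to the enclosing loop instead of swallowing it.
        if (retCode != ReturnCode.Return)
            return retCode;

        // Step 2: what used to be a plain "return;" (or the end of the method).
        return ReturnCode.Return;
    }
}
```

The important part is step 3: a custom block that ignored the code from RenderAll would silently stop break and continue from reaching an enclosing for loop.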

This side effect only affects developers who have created their own tags. Anyone downloading and using the library as-is will enjoy increased performance without having to make any changes.

Thursday, 15 January 2015

Optimising DotLiquid: Part 5

A New Frontier

The Optimising DotLiquid series is so far off to a very promising start. With the core classes significantly improved, it's now time to optimise the DotLiquid feature set as a whole, and that means I need a new test template.

The New Template

The new template is designed to incorporate all flow control, iteration and variable tags. A number of different parameter variations for each tag have also been included.

It's ugly, but it does the job:

{% for x in (1..5) %}
<h1>Tests all except filters</h1>
Also doesn't use INCLUDE or EXTENDS, to be tested later
<div>
<h2>Variable Tags</h2>
<h3>Assign</h3>
{% assign handle = 'cake' -%}
{{ handle }}
<h3>Capture</h3>
{% capture my_variable %}I am being captured.{% endcapture -%}
{{ my_variable }}
</div>
<div>
<h2>Control Flow Tags</h2>
<h3>Case (non-else)</h3>
{% case handle -%}
  {% when 'cake' -%}
     This is a cake
  {% when 'cookie' -%}
     This is a cookie
  {% else -%}
     This is not a cake nor a cookie
{% endcase -%}
<h3>Case (else)</h3>
{% case handle -%}
  {% when 'a' -%}
     This is a cake
  {% when 'b' -%}
     This is a cookie
  {% else -%}
     The else statement was reached
{% endcase -%}
<h3>If equals (non-else)</h3>
{% if user.name == 'Steve Jackson' -%}
  Equals failed on match
{% elsif user.name == 'Steve Lillis' -%}
  Equals was a success
{% else -%}
  Equals failed to else
{% endif -%}
<h3>If not equals (non-else)</h3>
{% if user.name != 'Steve Jackson' -%}
  Not equals was a success
{% else -%}
  Not equals failed
{% endif -%}
<h3>If (else)</h3>
{% if user.name == 'Steve Jackson' -%}
  Unexpected user
{% else -%}
  Else body reached
{% endif -%}
<h3>Unless</h3>
{% unless user.name == 'Steve Jackson' -%}
  Unless worked
{% else -%}
  Unless failed
{% endunless -%}
</div>
<div>
<h2>Iteration Tags</h2>
<h3>For (with cycle)</h3>
{% for item in user.items -%}
 {% cycle 'one', 'two', 'three' %}: {{ item.description }} 
{% endfor -%}
<h3>For (reversed)</h3>
{% for item in user.items reversed -%}
 {% cycle 'one', 'two', 'three' -%}: 
        {% if item.description == 'First Item' -%} 
  {{ item.description | upcase -}} 
 {% else -%} 
  {{ item.description -}} 
 {% endif -%}
{% endfor -%}
<h3>For (Limit: 2)</h3>
{% for item in user.items limit:2 -%}
 {% cycle 'one', 'two', 'three' %}: {{ item.description }} 
{% endfor -%}
<h3>For (Offset: 2)</h3>
{% for item in user.items offset:2 -%}
 {% cycle 'one', 'two', 'three' %}: {{ item.description }} 
{% endfor -%}
<h3>For Range</h3>
{% for i in (1..4) -%}{{ i -}},
{% endfor -%}
<h3>For Range (Continue on 2)</h3>
{% for i in (1..4) -%} {% if i == 2 %} {% continue %} 
{% endif %} {{ i -}},
{% endfor -%}
<h3>For Range (Break on 2)</h3>
{% for i in (1..4) -%} {% if i == 2 %} {% break %} 
{% endif %} {{ i -}},
{% endfor -%}
<h3>Table Row (Cols:2, Limit:4)</h3>
<table>
{% tablerow item in user.items cols:2 limit:4 %}
  {{ item.Description }}
  {{ item.Cost }}
{% endtablerow %}
</table>
</div>
{% endfor %}

Initial Timings

The timings below were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

Render Time (ms)	Original Code	With Optimisations
Minimum	15.89480	6.52180
Maximum	20.14610	8.96140
Range	4.25130	2.43960
Average	16.35774	6.76287
Std. Deviation	0.42730	0.20170

Summary

The optimisations made thus far have had a distinct impact, cutting the average render time from about 16.4 ms to about 6.8 ms. That is still much higher than I'd like, though, especially if a further reduction is only a few minor improvements away.

In the next post, I'll be overhauling how the break and continue keywords are implemented in DotLiquid, as I've noticed that they currently use exceptions to control program flow.