
Friday, 7 August 2015

What Is "Good" Object Oriented Programming?

Introduction

The term Object-Oriented Programming (OOP) is so ubiquitous in modern software development that it has become a buzzword, appearing on every software engineer's résumé by default with little consideration for what it means to be good at it.

So what does it mean to be good at it?

Defining "Good" OOP

Providing a general definition of OOP is relatively easy:

"Instead of a procedural list of actions, object-oriented programming is modeled around objects that interact with each other. Classes generate objects and define their structure, like a blueprint. The objects interact with each other to carry out the intent of the computer program" (Wikipedia)

By contrast, a concrete definition of what makes for good OOP is tough to capture so concisely. Good OOP is defined by a small collection of detailed design principles, including SOLID, cohesion and coupling, that ultimately lead to code that is as flexible, understandable and reusable as possible.

In this post I'll be running through a real-world-like scenario and explaining how these principles make for good OOP along the way.

A Naïve Approach to OOP

I'll use the example of an ExchangeRate object that we'll want to validate, store in and retrieve from a database. It's common for a developer who's new to object-oriented programming to define such a class somewhat like the following:

public class ExchangeRate
{
    private const string ConnectionString = "...";

    public int ID { get; set; }
    public string FromCurrency { get; set; }
    public string ToCurrency { get; set; }
    public double Ratio { get; set; }

    public void Save()        
    {
        using (var conn = new SqlConnection(ConnectionString))
        {
            // TODO: write properties to table
        }
    }

    public static ExchangeRate Load(int id)
    {
        using (var conn = new SqlConnection(ConnectionString))
        {
            // TODO: Read values from table
            return new ExchangeRate(value1, value2, ...);
        }
    }

    public static bool Validate(ExchangeRate ex)
    {
        // Validate currency codes
        // Validate that ratio > 0
    }
}

The beginner designs the class this way because all of the methods and properties on the object feel like they belong together. Grouping code based on its logical relation to the same thing is known as logical cohesion, and while it works perfectly well at this scale, logical cohesion quickly shows its downfalls.

Here's a rundown of the key problems associated with the class as it is currently designed:

  1. The class will become unmaintainably bloated as we add more and more functionality that relates to ExchangeRate.
  2. This bloat is compounded by the fact that if we later decide to allow load/save from locations other than a SQL database, or to validate exchange rates differently depending on varying factors, we'll have to add more and more code to the class to do so.
  3. If the consumer of ExchangeRate doesn't use the SQL related methods, SQL related references are still carted around.
  4. We'll have to duplicate generic load/save code for every object we create beyond ExchangeRate. If there's a bug in that code, we'll have to fix it in every location too (which is why code duplication is bad news).
  5. We're forced into opening a separate connection for every save or load operation, when it might be beneficial to keep a single connection open for the duration of a collection of operations.
  6. ExchangeRate's methods can't be tested without a database because they're tightly coupled to the database implementation.
  7. Anything that wants to load an ExchangeRate will be tightly coupled to the static ExchangeRate.Load(...) method, meaning we'll have to manually change all those references to other load methods if we want to load from a different location at a later date. It also means that those referencers can't be tested without a database either!

Improving OOP Using SOLID

The principles of SOLID give a framework for building robust, future-proof code. The 'S' (Single Responsibility Principle) and the 'D' (Dependency Inversion Principle) are great places to start and yield the biggest benefits at this stage of development.

The dependency inversion principle can be a hard one to grasp at first, but it is simply this: wherever our class tightly couples itself to another class using a direct reference to its Type (i.e. the new keyword or calls to static methods), we should instead find some other way of giving our class an instance of that Type, referenced by the most abstract interface we need, thereby decoupling our class from specific implementations. This will become clearer as the example progresses.

The single responsibility principle is exactly what you'd expect it to be, that each object should have just one responsibility.

Here's all the responsibilities that the ExchangeRate class currently has:

  1. Hold information representing an exchange rate
  2. Save exchange rate information to a database
  3. Load exchange rate information from a database
  4. Create ExchangeRate instances from loaded information
  5. Validate exchange rate information

Since these are separate responsibilities, there should be a separate class for each.

Here's a quick pass of refactoring ExchangeRate according to these two principles:

public class ExchangeRate
{
    public int ID { get; set; }
    public string FromCurrency { get; set; }
    public string ToCurrency { get; set; }
    public double Ratio { get; set; }
}

public class ExchangeRateSaver
{
    public void Save(IDbConnection conn, ExchangeRate ex)
    {
        // TODO: write properties to table
    }
}

public interface IExchangeRateFactory
{
    ExchangeRate Create(string from, ...);
}

public class ExchangeRateFactory : IExchangeRateFactory
{
    public ExchangeRate Create(string from, ...)
    {
        return new ExchangeRate(from, to, rate);
    }
}

public class ExchangeRateLoader
{
    private readonly IExchangeRateFactory _factory;

    public ExchangeRateLoader(IExchangeRateFactory factory)
    {
        _factory = factory;
    }

    public ExchangeRate Load(IDbConnection connection, int id)
    {
        // TODO: Read values from table
        return _factory.Create(value1, value2, value3);
    }
}

public class ExchangeRateValidator
{
    public bool Validate(ExchangeRate ex)
    {
        // Validate currency codes
        // Validate that ratio > 0
    }
}

Code grouped in this manner is described as being functionally cohesive. Functional cohesion is considered by many to lead to the most reusable, flexible and maintainable code.

By breaking the code down into separate classes, each with a single responsibility, we have grouped the code by its functional relationships instead of its logical ones. Consuming code and tests can now swap in and out individual chunks of isolated functionality as needed, instead of carting around one monolithic, catch-all class.

Additionally, by inverting the dependencies of ExchangeRateLoader and ExchangeRateSaver, we have improved the testability of the code as well as allowing for any type of connection to be used, not just a SQL one. The benefits of dependency inversion are compounded as more and more classes become involved in a project.
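
To see the decoupling pay off, here's a minimal sketch of consuming code. The wiring shown is illustrative: IDbConnection stands in for whatever connection abstraction is chosen, and the elided parameter lists are filled in with plausible values.

// Composition root: the only place that knows the concrete types.
IExchangeRateFactory factory = new ExchangeRateFactory();
var loader = new ExchangeRateLoader(factory);
var validator = new ExchangeRateValidator();
var saver = new ExchangeRateSaver();

using (IDbConnection connection = new SqlConnection("..."))
{
    connection.Open();

    // A single open connection spans the whole unit of work.
    ExchangeRate rate = loader.Load(connection, 42);
    rate.Ratio *= 1.01;

    if (validator.Validate(rate))
        saver.Save(connection, rate);
}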

What about the "OLI" in "SOLID"?

The 'O' (Open/Closed Principle) and 'L' (Liskov Substitution Principle) aren't applicable to this example as they relate to revisiting existing production code and to inheritance, respectively.

The 'I' (Interface Segregation Principle) states that no client should be forced to depend on methods it does not use and, for the most part in this example, has been covered by adhering to the Single Responsibility Principle.

If you'd like to see an example of situations when the outcomes of applying the ISP and SRP differ, or an example of applying the Open/Closed and Liskov substitution principles, let me know in the comments.

In Closing

Hopefully this article has begun to shed some light on how "good" object-oriented code is achieved and how it leads to more flexible, testable and future-proof code.

If you'd like for me to expand on any specific points, or cover how this becomes ever more important as the scale of a project grows, let me know in the comments and I'll do a follow up post!

Sunday, 19 April 2015

async/await: An Unexpected Journey

Introduction

I recently ran into an issue when using async/await with SqlBulkCopy's WriteToServerAsync method and a bespoke implementation of IDataReader, the root cause of which was so surprising that I just had to post about it!

The Problem

The basic process was implemented as follows:

  • Set the current thread's culture to Spanish (es-ES)
  • Await (configured to preserve the culture) a SqlBulkCopy.WriteToServerAsync call with a custom reader
  • Custom reader uses the current thread's culture to decide which resources to use (a sketch of this setup follows)
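
A minimal sketch of that setup; the connection, destination table and reader variable are invented for illustration:

Thread.CurrentThread.CurrentCulture = new CultureInfo("es-ES");
Thread.CurrentThread.CurrentUICulture = new CultureInfo("es-ES");

using (var bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "dbo.ImportTarget"; // hypothetical

    // reader is the bespoke IDataReader; its Read method picks
    // resources based on Thread.CurrentThread.CurrentCulture.
    await bulkCopy.WriteToServerAsync(reader)
                  .ConfigureAwait(true); // resume on the captured context
}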

Expected behaviour:

Due to how async/await and ConfigureAwait preserve the Synchronization Context, it was expected that the custom reader would find the current thread's culture to be Spanish and locate the appropriate Spanish resource files accordingly.

Actual behaviour:

The reader found the current thread's culture to be Spanish, but only sometimes. At varying intervals during a single awaited call to SqlBulkCopy's WriteToServerAsync, the thread on which the reader's Read method executed forgot the culture, reverting to English!

The Investigation

It didn't take too long to identify that the issue was occurring within SqlBulkCopy's WriteToServerAsync method and wasn't anything I had invoked myself, so I popped open the reference source for SqlBulkCopy and through some digging I found this:

private Task WriteRowSourceToServerAsync(
                int columnCount, 
                CancellationToken ctoken) 
{
    Task reconnectTask = _connection._currentReconnectionTask;
    if (reconnectTask != null && !reconnectTask.IsCompleted) 
    {
        if (this._isAsyncBulkCopy)
        {
            TaskCompletionSource<object> tcs = 
                        new TaskCompletionSource<object>();

            reconnectTask.ContinueWith((t) =>
            {
                Task writeTask = WriteRowSourceToServerAsync(
                                    columnCount, 
                                    ctoken);

                if (writeTask == null) 
                {
                    tcs.SetResult(null);
                }
                else 
                {
                    AsyncHelper.ContinueTask(
                                    writeTask, 
                                    tcs, 
                                    () => tcs.SetResult(null));
                }
            }, ctoken); 

            return tcs.Task;
        }
        else 
        {
            // Trimmed for brevity, check the reference source
            // for the full method if interested.

Looking at this code, the question immediately became:

Is async/await's SynchronizationContext implicitly
preserved by TPL's ContinueWith?

The Answer

Evidently, the answer is no, TPL's ContinueWith does not implicitly preserve async/await's SynchronizationContext.

Where async/await uses SynchronizationContext, TPL uses TaskScheduler. The only way to preserve the SynchronizationContext with ContinueWith is to pass a TaskScheduler copied from the current SynchronizationContext when calling it:

await Task.Delay(TimeSpan.FromSeconds(1))
          .ContinueWith(
              (t) => 
              { 
                  // whatever you like 
              },
              // Handy helper method!
              TaskScheduler.FromCurrentSynchronizationContext());

Seeing as I don't own the code in which the call to ContinueWith is being made, adding the call to TaskScheduler.FromCurrentSynchronizationContext isn't really an option.

The quick fix here was to simply restore the current thread's culture to Spanish at the start of the custom reader's Read method before proceeding. (This isn't so much a fix as it is a workaround, but it achieves the desired result with minimum impact. Sometimes doing it cleanly is better than doing it "right".)
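
Here's a minimal sketch of that workaround inside the bespoke reader; the full IDataReader plumbing is omitted and the MoveToNextRecord helper is invented:

// Captured at construction, on the thread that set up the bulk copy.
private readonly CultureInfo _culture =
    Thread.CurrentThread.CurrentCulture;

public bool Read()
{
    // ContinueWith may have hopped execution onto a pool thread that
    // never had the Spanish culture, so reapply it before doing
    // anything culture-sensitive.
    Thread.CurrentThread.CurrentCulture = _culture;
    Thread.CurrentThread.CurrentUICulture = _culture;

    return MoveToNextRecord(); // hypothetical helper
}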

Final Thoughts

The take away from this is to remember that just because a method returns an awaitable Task doesn't necessarily mean that the Task uses async/await in its implementation. It could well be using TPL and ContinueWith, in which case your SynchronizationContext won't be preserved.

If you run into an issue where async/await and an asynchronous method you didn't write aren't behaving consistently as expected, be sure to check the reference source for how the asynchronous method is implemented and proceed accordingly.

Monday, 13 April 2015

Event Handlers and C# 6's Null-Propagating Operator

I've previously waxed lyrical about the incredibly cool null-propagating operator coming in C# 6.0, but Jon Skeet recently raised an excellent point regarding event handlers.

_______________

Since the null-propagating operator can be used with method calls, what once had to be written as:

if (eventHandler != null)
    eventHandler(this, args);

Will soon be able to be written as:

eventHandler?.Invoke(this, args);

I think we can all agree that the latter is cleaner!
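
Incidentally, the long-hand null check above is only safe if the handler is first copied to a local; otherwise another thread can unsubscribe the last handler between the check and the call. The null-propagating form reads the field exactly once, so it sidesteps that race too:

// Safe long-hand: copy, then check, then invoke the copy.
var handler = eventHandler;
if (handler != null)
    handler(this, args);

// The null-propagating form also performs a single read,
// so it avoids the race as well:
eventHandler?.Invoke(this, args);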

If you'd like to know more, then be sure to read the full post on Jon Skeet's blog and follow him if you aren't already, as he always goes into incredibly well considered depth on a topic!

Friday, 13 February 2015

Liquid for C#: Defining Success

Introduction

In the High-Level Overview for this series I mentioned that I'll need a way to measure the success of this project as it progresses.

As this project's aim is to create a one-for-one replica of the behaviour of Liquid's Ruby implementation, I will be porting Liquid's integration tests to C# and following a test-driven approach.

What's in a Name?

Though they have been called integration tests in Liquid's code, the majority of these tests are in fact functional acceptance tests, which is what makes them useful for confirming that the behaviour of the system is correct.

Unit Test

Tests the behaviour of a single system component in a controlled environment.

Integration Test

Tests the behaviour of major components of a system working together.

Functional Acceptance Test

Tests that the system, per the technical specification, produces the expected output for each given input.

Unit and integration tests verify that the code you've written is doing what it was written to do, while functional acceptance tests verify that the system as a whole, without consideration for the structure of its internal components, does what it is designed to do.

Any Port in a Storm

There are hundreds of tests to port to C# and, as it turns out, not all of the tests in the Ruby implementation's integration namespace are integration or functional acceptance tests... some are unit tests!

The porting process is therefore a matter of replicating the original tests as faithfully as possible, translating them into functional acceptance tests where needed.

A test that ported smoothly

# Ruby
def test_for_with_range
    assert_template_result(
        ' 1  2  3 ',
        '{%for item in (1..3) %} {{item}} {%endfor%}')
end
// C#
public void TestForWithRange()
{
    AssertTemplateResult(
        " 1  2  3 ", 
        "{%for item in (1..3) %} {{item}} {%endfor%}");
}

A test that needed translation

# Ruby - The below are unit tests
#        for methods escape and h
def test_escape
    assert_equal '&lt;strong&gt;', @filters.escape('<strong>')
    assert_equal '&lt;strong&gt;', @filters.h('<strong>')
end
// C# - Rewritten as a test of the 
//      output expected from a template
public void TestEscape()
{
    AssertTemplateResult(
        "&lt;strong&gt;", 
        "{{ '<strong>' | escape }}");
}

When translating from a unit or integration test to a functional acceptance test, I'm using the documentation and wiki as the design specification. This ensures that the tested behaviour is the templating language's expected behaviour, not just the behaviour I expect!

What's Next?

Once all of the tests are ported, the next step will be to start writing the code to pass those tests. Remember, in Test Driven Development we start with failing tests and then write the code to make those tests pass.

The AssertTemplateResult method mentioned earlier currently looks like this:

protected void AssertTemplateResult(
                   string expected, 
                   string source)
{
    // TODO: implement me!
    throw new NotImplementedException();
}

There's still a few hundred more tests to port yet, though, so wish me luck!

Monday, 9 February 2015

Liquid for C#: High-Level Overview

Introduction

In the Liquid For C# series, I will be writing a C# interpreter for the Liquid templating language from scratch.

In this first post I define the project's scope and overall intention. Code does not factor into this stage at all; it's purely about the API's purpose, not its implementation.

Broad Strokes

The first step in any project is to define what it will be doing at the highest level. Ideally, this should be expressible as a single sentence or a simple diagram.

This project's definition is deceptively simple: Template + Data = Output.

Armed with this very general definition, the next step is to break the overall process into broad, functionally cohesive chunks. I find that this is best achieved by running through potential use cases. The below is the outcome of that process.

It immediately jumps out at me that the Abstract Syntax Tree and the steps that follow it are implementation agnostic. This means that they are not specific to Liquid and, because of this, can be re-used in any templating language interpreter.

Defining Success

The question then becomes one of how to know when the project fulfils its purpose.

As the aim of this project is to provide a full C# implementation of Liquid's behaviour as it is currently implemented in Ruby, I will port all of the integration tests for Liquid to C# and follow a Test Driven Development approach. I will only consider the project to be a success when it passes all of the original tests.

What Next?

In bigger teams or projects it's necessary to delve much deeper in the design phase, going as far as defining the interfaces for the API and how they plug together, so that all involved parties can work independently without going off in completely different directions.

Since this is just me working on a hobby project, though, I'll instead be taking a very iterative approach and in the next post I'll be writing code!

Wednesday, 4 February 2015

Restructuring DotLiquid: Part 3

The Issue at Hand

For those who didn't know, DotLiquid is a straight C# port of Liquid, a library written in Ruby.

The Ruby programming language is significantly different to C#, so even best-effort attempts at like-for-like reconstruction of the library inevitably lead to structural issues in the API's design.

Lost in Translation

The structural issues that come from direct porting include:

  • Excessive use of static classes.
  • Excessive use of Reflection.
  • Lack of Object Oriented design, leading to inflexibility.
  • Duplicate code. Tight knit classes force code to be repeated.
  • Excessive boxing and unboxing, leading to degraded performance.

That's not to do down DotLiquid, though, which is an exceptional direct port of the original library. For the majority of cases it is more than fast enough, and anyone who has written code using the Ruby implementation of Liquid will be able to pick up DotLiquid and use it in exactly the same way without hesitation.

In my quest to produce the perfect API, however, my implementation has become so far removed from DotLiquid's interface, implementation and intent that I have decided to start afresh.

Be sure to come back for my next post, where I'll begin the high level design process for the API including how and why I'll be drawing distinct boundaries between its elements.

Friday, 23 January 2015

Restructuring DotLiquid: Part 2

Bringing down the Hammer

I mentioned in Part 1 that DotLiquid's Condition hierarchy could do with being a bit more object oriented.

As conditions are a relatively small and isolated part of the API, it's a great place to start this series in earnest, so that's where I'll begin.

The Restructure

Here's a before and after of the Condition class hierarchy.

(Class diagrams: the hierarchy before and after the restructure.)

First, I introduced a new interface, ICondition, and I did this for two reasons:

  1. Not all future developers will want to use the class ConditionBase as a base - they might have new code requirements or their own base class.
  2. No class that has a dependency on conditions should be forced to depend upon a specific implementation - by using the interface I make those classes compatible with any implementation.

Next, I refactored And and Or logic out of Expression and into their own classes. I did this because the code for And, Or and Expression may be logically cohesive, but it is not functionally cohesive. Incidentally, their code's lack of functional cohesion is what made them so easy to separate.

I made ConditionBase an abstract class to better indicate its purpose as a foundation, as opposed to a class that can be used effectively on its own.

I moved the static collection Operators out of ExpressionCondition and into its own class. This needs further work, as it shouldn't be static at all, but it's a start. More on this in a later post.

The IsElse property is a classic code smell because it will only ever be true in one case: when the Type is ElseCondition. Any logic that utilises the property would be better off inside ElseCondition itself, thereby encapsulating the functionality. I therefore changed the signature of the Evaluate method to take a ConditionalStatementState object and moved the check for whether an ElseCondition should render inside ElseCondition.

// BEFORE
// =====================
// The owning block's render method:
var executeElseBlock = true;
foreach (var block in Blocks)
{
    if (block.IsElse)
    {
        if (executeElseBlock)
        {
           return RenderAll(block.Attachment, context, result);
        }
    }
    else if (block.Evaluate(context))
    {
        RenderAll(block.Attachment, context, result);
        executeElseBlock = false;
    }
}

// The ElseCondition's evaluate method:
public override bool Evaluate(Context context)
{
    return true;
}

// AFTER
// =====================
// The owning block's render method:
var state = new ConditionalStatementState(context);
foreach (var block in Blocks)
{
    if (block.Evaluate(state))
    {
        ++state.BlockRenderCount;
        var retCode = block.Render(context, result);
        if (retCode != ReturnCode.Return)
            return retCode;
    }
}

// The ElseCondition's evaluate method:
public override bool Evaluate(ConditionalStatementState state)
{
    return state.BlockRenderCount <= 0;
}

It's worth noting that I could have introduced an additional base class for AndCondition and OrCondition for which they override the evaluate method and share the Left and Right properties, but they do so little internally that it felt like overkill. Should they ever grow in size, an abstract base class can be retrofitted painlessly enough.
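
For a sense of scale, here's roughly what the extracted classes amount to. This is my reconstruction for illustration, not DotLiquid's actual source:

// Each extracted class does little more than combine its children.
public class AndCondition : ConditionBase
{
    private readonly ICondition _left;
    private readonly ICondition _right;

    public AndCondition(ICondition left, ICondition right)
    {
        _left = left;
        _right = right;
    }

    public override bool Evaluate(ConditionalStatementState state)
    {
        return _left.Evaluate(state) && _right.Evaluate(state);
    }
}

public class OrCondition : ConditionBase
{
    private readonly ICondition _left;
    private readonly ICondition _right;

    public OrCondition(ICondition left, ICondition right)
    {
        _left = left;
        _right = right;
    }

    public override bool Evaluate(ConditionalStatementState state)
    {
        return _left.Evaluate(state) || _right.Evaluate(state);
    }
}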

Summary

Overall, this is a great first step on the path to a clean and pure API, but there's still a lot more work to be done. I suspect that by the end of this series DotLiquid's API will be a significantly different beast, exposing the same functionality in a much more flexible API.

I'm really enjoying the challenge and, if you'd like me to clarify anything, feel free to let me know in the comments!

Wednesday, 21 January 2015

Restructuring DotLiquid: Part 1

Introduction

In the previous series, Optimising DotLiquid, the focus was to improve rendering performance. In this series, the focus is to improve DotLiquid's API.

With DotLiquid v2.0 on the far horizon, now is the perfect time to smooth any rough edges that have appeared over the course of development and distil the API into its purest, cleanest form.

What Makes a Great API?

Accessible

For an API to be accessible, it needs consistency in its naming conventions, method signatures and chosen design patterns. The API should also be minimalist, exposing no public methods, objects or functionality beyond those that drive the end user's interaction with the API.

Flexible

A great API makes as few assumptions about how it will be used as possible. Keeping class coupling to a minimum, allowing the end user to pick and choose functionality with good object oriented design and keeping class dependencies to a minimum are all part of making an API flexible.

Extensible

A great API has to be easy to extend. This means making key methods virtual, classes concise and substitutable and avoiding any behind the scenes hack-magic. The principles of SOLID really come into their own when it comes to extensibility, because you never know which direction the next developer will want to go.

A Bird's Eye View

When fine tuning an API, implementation takes a back seat to architecture. After all, we're designing the interface by which developers interact with the library to achieve a goal, not how that goal is achieved.

The quickest way to get an architectural overview is to add a class diagram to the project. Here's the class diagram for DotLiquid as it stands at the end of Optimising DotLiquid.

This diagram tells me a lot about the state of the DotLiquid API as it currently stands.

The classes with a dotted outline are helper classes, extension classes and containers for commonly used resources. This is fine in a small project, but in an API this could be preventing a third party from tweaking core functionality. I'll be looking to see what can be refactored into instance classes that are provided as configuration to templates, improving flexibility and customisability.

The class Condition isn't respecting the Single Responsibility Principle. It currently has the responsibilities of evaluating an expression, evaluating an AND condition and evaluating an OR condition, too. ElseCondition and the property IsElse aren't the OOP ideal, either, so refactoring of the condition hierarchy will yield benefits for extensibility.

The down arrow marked against quite a few of the methods in this diagram indicates the use of the internal access modifier. In the places that it's been used, it would appear that these methods are being used as back door access to functionality that isn't exposed publicly. This is a code smell that harms extensibility and may indicate deeper structural issues, so I'll be looking to do away with them completely.

The Tag class and associated hierarchy has a wide, shallow inheritance structure that is self-explanatory. This is an example of great Object Oriented Design. Other than a few public and internal methods I'd like to clean up, I doubt there's much work to be done to the already clean, accessible signature seen here.

What's Next?

In the next post of this series I'll single out an area of DotLiquid's architecture that could use improvement, explain why such improvements are needed and then implement the changes with before and after class diagrams...

It's going to be awesome!

Friday, 16 January 2015

Optimising DotLiquid: Part 6

Exceptions Should Be Exceptional

A recent merge into the main DotLiquid repository added support for the keywords continue and break, using exceptions for flow control and try/catch further up the render chain.

There are two significant reasons why exceptions should never be used for controlling the flow of a program:

  1. Exceptions have a major performance impact.
  2. It is not obvious without serious digging where those exceptions will be handled or if they will be handled at all.

Having this exception-based implementation is better than not having an implementation, of course, but in this series' war on performance it is an obvious target.

Optimisation

The concept for this optimisation is simple enough: to replace the flow controlling exceptions and try/catch handling with proper program flow.

You can see the full changeset on GitHub, but below is a brief summary of the changes.

The break tag class

// BEFORE
// ==================================
public class Break : Tag
{
    public override void Render(
                            Context context, 
                            TextWriter result)
    {
        throw new BreakInterrupt();
    }
}

// AFTER
// ==================================
public class Break : Tag
{
    public override ReturnCode Render(
                                Context context, 
                                TextWriter result)
    {
        return ReturnCode.Break;
    }
}

Shared - the RenderAll loop body

// BEFORE
// ==================================
if (token is IRenderable)
    ((IRenderable) token).Render(context, result);
else
    result.Write(token.ToString());

// AFTER
// ==================================
var renderable = token as IRenderable;
if (renderable != null)
{
    var retCode = renderable.Render(context, result);
    if (retCode != ReturnCode.Return)
        return retCode;
}
else
    result.Write(token.ToString());

The for tag class - break and continue handling

// BEFORE
// ==================================
try
{
    RenderAll(NodeList, context, result);
}
catch (BreakInterrupt)
{
    break;
}
catch (ContinueInterrupt)
{
}

// AFTER
// ==================================
if (RenderAll(NodeList, context, result) == ReturnCode.Break)
    break;

A quick re-run of all of the Unit Tests tells me that this far-reaching changeset has maintained all the original expected behaviours.

Initial Timings

The below timings were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

Render Time (ms)    Part 5     With New Flow Control
Minimum             6.63110    5.78710
Maximum             8.65750    7.61880
Range               2.02640    1.83170
Average             6.87194    5.99984
Std. Deviation      0.20780    0.18964

Summary

Simply avoiding the anti-pattern of using exceptions to control program flow has reduced render time by more than 10%.

A minor side-effect of updating the program flow in this way is that anyone who has written their own tags will need to make the following changes (a sketch follows the list):

  • The return Type of the Render method is now ReturnCode.
  • Wherever return is used, return ReturnCode.Return instead.
  • Whenever the result of RenderAll is not ReturnCode.Return, return that result immediately. (Example in Block.RenderAll)
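
Here's a minimal sketch of that migration for a hypothetical custom tag; the tag itself is invented for illustration:

// BEFORE: a custom tag written against the old API
public class Shout : Tag
{
    public override void Render(Context context, TextWriter result)
    {
        result.Write("LOUD NOISES");
    }
}

// AFTER: the same tag migrated to the ReturnCode-based API
public class Shout : Tag
{
    public override ReturnCode Render(Context context, TextWriter result)
    {
        result.Write("LOUD NOISES");
        return ReturnCode.Return; // normal completion, no break/continue
    }
}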

This side effect only affects developers who have created their own tags, anyone downloading and using the library as-is will enjoy increased performance without having to make any changes.

Thursday, 15 January 2015

Optimising DotLiquid: Part 5

A New Frontier

The Optimising DotLiquid series is so far off to a very promising start. With the core classes significantly improved, it's now time to optimise the DotLiquid feature set as a whole, and that means I need a new test template.

The New Template

The new template is designed to incorporate all flow control, iteration and variable tags. A number of different parameter variations for each tag have also been included.

It's ugly, but it does the job:

{% for x in (1..5) %}
<h1>Tests all except filters</h1>
Also doesn't use INCLUDE or EXTENDS, to be tested later
<div>
<h2>Variable Tags</h2>
<h3>Assign</h3>
{% assign handle = 'cake' -%}
{{ handle }}
<h3>Capture</h3>
{% capture my_variable %}I am being captured.{% endcapture -%}
{{ my_variable }}
</div>
<div>
<h2>Control Flow Tags</h2>
<h3>Case (non-else)</h3>
{% case handle -%}
  {% when 'cake' -%}
     This is a cake
  {% when 'cookie' -%}
     This is a cookie
  {% else -%}
     This is not a cake nor a cookie
{% endcase -%}
<h3>Case (else)</h3>
{% case handle -%}
  {% when 'a' -%}
     This is a cake
  {% when 'b' -%}
     This is a cookie
  {% else -%}
     The else statement was reached
{% endcase -%}
<h3>If equals (non-else)</h3>
{% if user.name == 'Steve Jackson' -%}
  Equals failed on match
{% elsif user.name == 'Steve Lillis' -%}
  Equals was a success
{% else -%}
  Equals failed to else
{% endif -%}
<h3>If not equals (non-else)</h3>
{% if user.name != 'Steve Jackson' -%}
  Not equals was a success
{% else -%}
  Not equals failed
{% endif -%}
<h3>If (else)</h3>
{% if user.name == 'Steve Jackson' -%}
  Unexpected user
{% else -%}
  Else body reached
{% endif -%}
<h3>Unless</h3>
{% unless user.name == 'Steve Jackson' -%}
  Unless worked
{% else -%}
  Unless failed
{% endunless -%}
</div>
<div>
<h2>Iteration Tags</h2>
<h3>For (with cycle)</h3>
{% for item in user.items -%}
 {% cycle 'one', 'two', 'three' %}: {{ item.description }} 
{% endfor -%}
<h3>For (reversed)</h3>
{% for item in user.items reversed -%}
 {% cycle 'one', 'two', 'three' -%}: 
        {% if item.description == 'First Item' -%} 
  {{ item.description | upcase -}} 
 {% else -%} 
  {{ item.description -}} 
 {% endif -%}
{% endfor -%}
<h3>For (Limit: 2)</h3>
{% for item in user.items limit:2 -%}
 {% cycle 'one', 'two', 'three' %}: {{ item.description }} 
{% endfor -%}
<h3>For (Offset: 2)</h3>
{% for item in user.items offset:2 -%}
 {% cycle 'one', 'two', 'three' %}: {{ item.description }} 
{% endfor -%}
<h3>For Range</h3>
{% for i in (1..4) -%}{{ i -}},
{% endfor -%}
<h3>For Range (Continue on 2)</h3>
{% for i in (1..4) -%} {% if i == 2 %} {% continue %} 
{% endif %} {{ i -}},
{% endfor -%}
<h3>For Range (Break on 2)</h3>
{% for i in (1..4) -%} {% if i == 2 %} {% break %} 
{% endif %} {{ i -}},
{% endfor -%}
<h3>Table Row (Cols:2, Limit:4)</h3>
<table>
{% tablerow item in user.items cols:2 limit:4 %}
  {{ item.Description }}
  {{ item.Cost }}
{% endtablerow %}
</table>
</div>
{% endfor %}

Initial Timings

The below timings were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

Render Time (ms)    Original Code    With Optimisations
Minimum             15.89480         6.52180
Maximum             20.14610         8.96140
Range                4.25130         2.43960
Average             16.35774         6.76287
Std. Deviation       0.42730         0.20170

Summary

The optimisations made thus far have had a distinct impact but ~6.8 milliseconds to render is still much higher than I'd like, especially if a reduced render time is only a few minor improvements away.

In the next post I'll be overhauling how the break and continue keywords are implemented in DotLiquid, as I've noticed that they're currently using exceptions to control program flow.

Wednesday, 14 January 2015

Optimising DotLiquid: Part 4

Low Hanging Fruit

I've already snagged some low hanging fruit in the Optimising DotLiquid series by avoiding rework and respecting regex. DotLiquid's rendering is now more than twice as fast.

There's still more gains for the taking, though, and now that I'm familiar with the codebase I've turned my sights squarely on the Hash class.

The Hash class is used everywhere in DotLiquid for storing and retrieving the current scope's data, so even a small performance gain should have a large impact.

Optimisations

I replaced occasions of checking a Dictionary for a key before getting the value with a single call to TryGetValue.

// BEFORE
// ==================================
if (_nestedDictionary.ContainsKey(key))
    return _nestedDictionary[key];

// AFTER
// ==================================
object result;
if (_nestedDictionary.TryGetValue(key, out result))
    return result;

The class For, which renders loop blocks, used the reflection-based method Hash.FromAnonymousObject in every iteration. I avoided the overhead of reflection by setting the values directly instead.

// BEFORE
// ==================================
context["forloop"] = Hash.FromAnonymousObject(new
{
    name = _name,
    length = length,
    index = index + 1,
    index0 = index,
    rindex = length - index,
    rindex0 = length - index - 1,
    first = (index == 0),
    last = (index == length - 1)
});

// AFTER
// ==================================
var forHash = new Hash();

forHash["name"] = _name;
forHash["length"] = length;
forHash["index"] = index + 1;
forHash["index0"] = index;
forHash["rindex"] = length - index;
forHash["rindex0"] = length - index - 1;
forHash["first"] = (index == 0);
forHash["last"] = (index == length - 1);

context["forloop"] = forHash;

A few performance critical paths cast the same object to the same Type more than once. I replaced double casts with the as operator and a null check.

// BEFORE
// ==================================
if ((obj is IIndexable) 
    && ((IIndexable) obj)
            .ContainsKey((string) part))
    return true;

// AFTER
// ==================================
var indexable = obj as IIndexable;
if (indexable != null 
    && indexable.ContainsKey((string) part))
    return true;

The frequently visited IDictionary.this[object key] implementation in Hash checks that the key's Type is string and throws an exception if not. Since the subsequent cast to string will throw an InvalidCastException in that circumstance anyway, I removed the check to improve performance.

// BEFORE
// ==================================
if (!(key is string))
    throw new NotSupportedException();
return GetValue((string) key);

// AFTER
// ==================================
return GetValue((string) key);

I removed an unnecessary null-check in the performance critical method Hash.GetValue.

// BEFORE
// ==================================
if (_defaultValue != null)
    return _defaultValue;

return null;

// AFTER
// ==================================
return _defaultValue;

I avoided assigning values retrieved from the Hash back into the Hash.

// BEFORE
// ==================================
context.Registers["for"] = context.Registers["for"] 
                         ?? new Hash(0);

// AFTER
// ==================================
object forRegister = context.Registers["for"];
if (forRegister == null)
{
    forRegister = new Hash(0);
    context.Registers["for"] = forRegister;
}

Timings

The below timings were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

Render Time (ms)    Original Code    Part 3 Code    Hash Improvements
Minimum             3.42810          1.50950        1.33320
Maximum             5.76840          2.99220        2.60190
Range               2.34030          1.48270        1.26870
Average             3.61269          1.56977        1.38641
Std. Deviation      0.17960          0.07936        0.06621

Summary

DotLiquid's rendering is now two and a half times faster. It's worth noting, too, that due to the nested nature of rendering, the performance savings will grow exponentially as a template grows in size and complexity.

The next step now is to put together a mega-template that uses every single DotLiquid feature, then improve performance even further!

Tuesday, 13 January 2015

Optimising DotLiquid: Part 3

Respecting Regex

Regular expressions are a powerful tool. They're so powerful, in fact, that it's easy to get carried away.

As you might expect from a solution that relies on parsing raw text into program flow, DotLiquid uses Regex extensively. In this part of the Optimising DotLiquid series, I'll be improving how DotLiquid works with Regex.

The Optimisation

The original code for DotLiquid uses the static method Regex.Match extensively to determine whether or not given inputs match various regular expressions.

Regex.Match checks for a cached instance of the Regex class for the given expression and creates one if it doesn't exist. Regex.Match then returns the result of invoking the instance's Match method with the original input.

The default cache size for Regex.Match is 15, so more than 15 expressions being used in calls to Regex.Match across the application will result in unnecessary work recreating instances that were created previously but then dumped.

It's also worth noting that when trying to save fractions of a millisecond, as I am in this series, repeatedly looking up a cached value is wasteful.

Using static references to pre-compiled Regex instances instead of relying on Regex.Match to handle caching and persistence completely sidesteps both of these issues.

An example can be seen below and you can view the full changeset on GitHub.

// BEFORE
// ==================================
private object Resolve(string key)
{
    [...]
    var match = Regex.Match(key, R.Q(@"^'(.*)'$"));
    if (match.Success)
        return match.Groups[1].Value;
    [...]

// AFTER
// ==================================
private static Regex _singleQuotesRegex 
    = new Regex(R.Q(@"^'(.*)'$"), RegexOptions.Compiled);

private object Resolve(string key)
{
    [...]
    var match = _singleQuotesRegex.Match(key);
    if (match.Success)
        return match.Groups[1].Value;
    [...]

Check out this excellent article on compiling Regex for more information on when and when not to use compiled Regex in C#.

Timings

The below timings were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

Render Time (ms)    Original Code    Part 2 Code    RegexOptions.Compiled
Minimum             3.46160          3.35360        1.47660
Maximum             5.74940          5.29750        2.90560
Range               2.28780          1.94390        1.42900
Average             3.64991          3.54096        1.53506
Std. Deviation      0.16530          0.17852        0.07369

Analysis

With these changes DotLiquid rendering is now more than twice as fast!

The average time to render has been reduced to just 42% of the original average render time, with very little impact on the readability and memory usage of the code. It's a definite win.

What's Next?

There might be places where Regex isn't needed at all and simple string operations would be a better fit, so I'll be reviewing the code for such opportunities.

For the next part of this series I'll also be looking at how DotLiquid rendering handles loops, cycles, assignments and case statements, potentially cutting similar performance corners to those I cut in Optimising DotLiquid: Part 2.

Stay tuned!

Monday, 12 January 2015

Optimising DotLiquid: Part 2

What to Improve First?

Choosing where to focus optimisation attention is all about identifying performance hotspots. Hotspots are code paths with a single, long execution time or paths that have short execution times but are visited frequently.

Using a simple divide and conquer approach, I broadly identified the areas that made up the majority of the ~4ms of rendering in the initial timings.

The biggest costs came from the main infrastructure class Context, which manages the template's memory stack while it is rendered, and the class Condition, which evaluates the if and unless statements that are used frequently in the test template.

First Pass of Optimisations

In this pass I focused on making small, simple changes that eliminate unnecessary rework occurring within some of the most frequently called methods. I re-ran the tests after each change was applied to ensure that no individual change was raising the execution time back up. In the cases that this did occur, I undid the change.

I've provided the changes and relevant example code below and you can see the full changeset on GitHub.

The Condition Class

I amended the class to keep a reference to the operation delegate whenever the operation string is set instead of every time the condition is evaluated, reducing the number of times the lookup occurs from thousands to just a handful.

// BEFORE
// ==================================
private static bool InterpretCondition(
                            string left, 
                            string right, 
                            string op, 
                            Context context)
{
        if (string.IsNullOrEmpty(op))
        {
            object result = context[left];
            return (result != null 
                    && (!(result is bool) 
                        || (bool) result));
        }

        var leftObject = context[left];
        var rightObject = context[right];

        ConditionOperatorDelegate opDelegate;
        if (!Operators.TryGetValue(
                            op, 
                            out opDelegate))
            throw new Exceptions.ArgumentException(...);

    return opDelegate(leftObject, rightObject);
}

public virtual bool Evaluate(Context context)
{
    context = context ?? new Context();
    bool result = InterpretCondition(
                            Left, 
                            Right, 
                            Operator, 
                            context);
    ...

// AFTER
// ==================================
private string _operatorString;
private ConditionOperatorDelegate _operatorDelegate;

private string Operator
{
    get { return _operatorString; }
    set
    {
        _operatorString = value;
        if (string.IsNullOrEmpty(value))
            _operatorDelegate = (l, r) => NoOperator(l);
        else if (!Operators.TryGetValue(
                                value, 
                                out _operatorDelegate))
            throw new Exceptions.ArgumentException(...);
    }
}

public virtual bool Evaluate(Context context)
{
    // The whole InterpretCondition method is avoided
    var result = _operatorDelegate(
                        context[_left], 
                        context[_right]);
    ...

I replaced comparisons against the arbitrary strings "and" and "or" with byte comparisons against const fields named And and Or.

// BEFORE
// ==================================
switch (_childRelation)
{
    case "and":
         return result && _childCondition.Evaluate(context);
    case "or":
         ...

// AFTER
// ==================================
if (_childRelation == And) // And is a const byte value
    return result && _childCondition.Evaluate(context);

if (_childRelation == Or) // Or is a const byte value
    ... 

I replaced occurrences of double casting with a single as operator and null check.

// BEFORE
// ==================================
    if (left is Symbol)
        return ((Symbol)left).EvaluationFunction(right);
    ...

// AFTER
// ==================================
    var symbolLeft = left as Symbol;
    if (symbolLeft != null)
        return symbolLeft.EvaluationFunction(right);
    ...

The Context Class

I stored a reference to the first and last scopes in the context's list of scopes and replaced all lookups with the new references.

// BEFORE
// ==================================
    context.Scopes.Last()[_to] = _from.Render(context);
    ...
    var orphanedBlocks = 
        ((List<Block>)context.Scopes[0]["extends"]) 
               ?? new List<Block>();
    ...

// AFTER
// ==================================
    context.GlobalScope[_to] = _from.Render(context);
    ...
    var orphanedBlocks = 
        ((List<Block>)context.LocalScope["extends"]) 
               ?? new List<Block>();
    ...

Timings

The timings seen below were all taken during the same time period on the same machine. They are based on 10,000 iterations per test.

                          Before Optimisations    After Optimisations
Minimum (ms)              3.45680                 3.36080
Maximum (ms)              4.95130                 4.97570
Range (ms)                1.49450                 1.61490
Average (ms)              3.64347                 3.54507
Standard Deviation (ms)   0.16981                 0.18134

Analysis

A reduction in average execution time of ~0.1ms may not seem like a lot, but it represents a 3% reduction in the total time taken to render, which is a sizeable chunk.

It's worth noting that micro-optimisation is not often required, but in the case of trying to bring down a ~4ms execution time, micro-optimisations are exactly what I needed!

Summary

The careful adjustments made during this pass had a small but notable impact and were a great way to get used to the DotLiquid codebase.

I expect to make a huge performance boost in the next pass of optimisations by replacing all the uses of static method Regex.Match with pre-compiled Regex instance equivalents.

Exciting stuff!

Friday, 9 January 2015

Optimising DotLiquid: Part 1

Introduction

A colleague recently introduced me to a very interesting open source library for content templating called DotLiquid. It's actually a .NET port of a system originally produced for Ruby, and that system originated from Shopify! This kind of legacy is good, because it (usually) means that an API has stood the test of time for being fast and easy to work with.

Intrigued, I downloaded the source for DotLiquid and was pleased to find that not only is the provided API easy to understand and work with without knowing the underlying code, but the source itself has a very well structured Object Oriented architecture, too.

Digging a bit deeper I found a few places that can potentially be improved for better performance. How big the performance gain will be depends on the size and complexity of the content being passed in to be rendered.

How important that performance gain will be really depends on how time critical your rendering of content is. If you're using the templating system for rarely updated web page content, the time saved by these changes will likely be negligible, but if you're rendering a thousand different content pieces in quick succession using the same template then every millisecond saved per render is a whole second saved per batch.

Templates in DotLiquid can be parsed and cached in advance but rendering needs to occur for each piece of content using that template, so in this series I'll be working on improving the performance of DotLiquid's render method specifically.

Testing Performance: Never Guess

The most important thing to do when making efforts to optimise code is to build a reliable way to take before and after performance measurements. By taking measurements in this way, it can be said for certain whether a change to the source had a positive, negative or negligible impact on how it performs.

As the ambition of this series is to improve the speed at which DotLiquid renders content, I have written a simple method for taking timings as my performance metric.

static void RunTest(int iterations)
{
    var template = Template.Parse(TemplateCode);

    var stopwatch = new Stopwatch();
    var timings = new List<double>();
            
    for (var i = 0; i < iterations; ++i)
    {
        var hash = Hash.FromAnonymousObject(GetFreshTestObject());

        stopwatch.Reset();
        stopwatch.Start();

        template.Render(hash);

        stopwatch.Stop();

        timings.Add(stopwatch.Elapsed.TotalMilliseconds);
    }

    Console.WriteLine(@"Iterations: {0}", iterations);

    Console.WriteLine(@"   Average: {0:0.00000}ms", 
                      timings.Average());

    Console.WriteLine(@"   Std Dev: {0:0.00000}ms", 
                      CalculateStdDev(timings));

    Console.WriteLine(@"   Minimum: {0:0.00000}ms", 
                      timings.Min());

    Console.WriteLine(@"   Maximum: {0:0.00000}ms", 
                      timings.Max());

    Console.WriteLine();
}

static void Main()
{
    // Warm up
    Console.WriteLine("Warm up");
    RunTest(100);

    // Real tests
    Console.WriteLine("Real tests");
    RunTest(10000);
    RunTest(1000);
            
    Console.ReadKey();
}

You may notice that I run a warm up test first. C#'s JIT compiler has to compile the code on first execution and I don't want the extra time taken to compile to impact the timings. The warm up run gets JIT compilation out of the way.

To ensure that performance testing results are as reliable as possible, I'll be running the Release build because it is properly optimised and I'll be running it outside of the IDE to remove the IDE's performance impact.

The Test Template

I'll be using the same DotLiquid template throughout the series to keep the parameters of the test consistent. I've used a variety of tags so that optimisations that only affect one tag type will still be included in the timings.

<div>
<p><b>
{% if user.name == 'Steve Lillis' -%}
  Welcome back
{% else -%}
  I don't know you!
{% endif -%}
</b></p>
{% unless user.name == 'Steve Thompson' -%}
  <i>Unless example</i>
{% endunless -%}
{% comment %}A comment for comments sake{% endcomment %}
<ul>
<li>This entry and something about baked goods</li>
<li>
{% assign handle = 'cake' -%}
{% case handle -%}
  {% when 'cake' -%}
     This is a cake
  {% when 'cookie' -%}
     This is a cookie
  {% else -%}
     This is not a cake nor a cookie
{% endcase -%}
</li>
</ul>
</div>
<p>{{ user.name | upcase }} has the following items:</p>
<table>
{% for item in user.items -%}
  <tr>
     <td>
        {% cycle 'one', 'two', 'three' %}
     </td>
     <td>
        {{ item.description }} 
        {% assign handle = 'cake' -%}
        {% case handle -%}
          {% when 'cake' -%}
             This is a cake
          {% when 'cookie' -%}
             This is a cookie
          {% else -%}
             This is not a cake nor a cookie
        {% endcase -%}
     </td>
     <td>
        {{ item.cost }}
     </td>
  </tr>
{% endfor -%}
{% for item in user.items reversed -%}
  <tr>
     <td>
        {% cycle 'one', 'two', 'three' %}
     </td>
     <td>
        {% if item.description == 'First Item' -%}
            {{ item.description | upcase }}
        {% else %}
            {{ item.description }}
        {% endif %}
     </td>
     <td>
        {{ item.cost }}
     </td>
  </tr>
{% endfor -%}
</table>

Initial Timings

Iterations                1,000      10,000
Minimum (ms)              3.44020    3.42810
Maximum (ms)              8.25000    7.02620
Range (ms)                4.80980    3.59810
Average (ms)              3.95568    3.76747
Standard Deviation (ms)   0.41129    0.23009

Early Analysis

I only have the initial timings, having made no changes to the code yet, but there are already some points of note.

The standard deviation for 1k and 10k iterations only differs by a fairly negligible ~0.2ms, but the maximum time any one iteration took differs by ~1.2ms. This discrepancy, combined with a preliminary examination of DotLiquid's code, suggests that excessive object allocation on the heap could be triggering garbage collection, which I'd want to avoid if I can.

The average in both cases is a lot closer to the minimum than the maximum, even when taking the standard deviation into account. This could imply that the first time a DotLiquid template is rendered it caches values to save time on subsequent renders. It might be possible to improve initial render performance by moving the caching out of rendering and into parsing or even compile time. A very brief look over the codebase reveals a few Regexes that aren't pre-compiled; there may be similar savings elsewhere too.

The difference in average between 1k and 10k is just ~0.2ms, hardly a difference at all. This tells me that the render method has been built with consideration to being called multiple times on a single template object, so I likely won't have to make many improvements to how the render method cleans up.

Going Forward

That's it for now. In the next post in this series I'll be investigating the DotLiquid codebase, making changes and hopefully presenting some improved timings. Wish me luck!

Tuesday, 30 December 2014

The When and Why: Properties and Fields

Fields and properties are two important concepts that come up very early on when learning to code. Due to their surface-level similarities, it can be confusing to understand when to use each of them and why.

When?

Only ever use fields with the private access modifier. They can be used with protected or public but it's generally poor practice and I'll explain why later.

Use properties for all non-private scenarios, even if the properties are auto-implemented. You can also use them in a private context when implementing get and set logic.

public class ExampleClass
{
    // This particular field is used as a 'backing' field
    // for PropertyWithLogic
    private int _myField;

    // protected, so a property, even though it's an auto-property
    protected string AutoProperty { get; set; } 

    // private but has logic, so a property
    private int PropertyWithLogic
    {
        get { return _myField; }
        set { _myField = value >= 0 ? value : 0; } 
    }
}

Why?

Good Object Oriented Programming is all about encapsulation: separating the interface of a class from its implementation. Interfaces are the blueprint for what a class should do; classes are how it does it.

The signature and implementation of a field are inseparable. That is to say, there is no separation of the what of a field (getting and setting of a value) from the how of a field (storing the value in memory). For this reason, fields are an implementation detail and can't be part of an interface.

Properties, on the other hand, separate signature from implementation. In the below example, the interface specifies that implementations must provide a property that you can get and it's the classes that define how that property is implemented.

public interface IExample
{
    string Description { get; }
}

public class Example1 : IExample
{
    // Returns a hard-coded string.
    public string Description
    {
        get { return "Example1"; }
    }
}

public class Example2 : IExample
{
    private int _id;
    private DateTime _createdDate;

    // Performs some logic to return a string
    // that includes some member field values.
    public string Description
    {
        get 
        { 
            return string.Format(
                        "Example2, id: {0}, created: {1}",
                        _id,
                        _createdDate); 
        }
    }
}

public class Example3 : IExample
{
    // Not implemented. Throws an exception when accessed.
    public string Description
    {
        get 
        { 
            throw new NotImplementedException("Todo!");
        }
    }
}

public class Example4 : IExample
{
    // Auto-property implementation, will return whatever
    // we set the value to.  Value is stored in an
    // automatically generated backing field.
    public string Description { get; set; }
}

Keeping interfaces separate from implementation in this way is a critical step in keeping the code you are writing from collapsing under its own weight over time.

An additional reason to use public auto-properties rather than public fields: should you later need to add logic to the getter or setter, swapping a field for a property is a breaking change. It breaks binary compatibility between dependent assemblies (code compiled against the field must be recompiled against the property) and it breaks field-based serialization. Taking the time to do it right now will save you unnecessary difficulty once the application has grown or live data is involved.
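As a hedged illustration of the serialization half of that problem (the Customer type here is invented, and the detail applies to field-based serializers such as BinaryFormatter):

// Version 1: a public field, persisted under the name "Name".
public class Customer
{
    public string Name;
}

// Version 2: swapped for an auto-property. The compiler generates a
// backing field named "<Name>k__BackingField", so data serialized
// against version 1 no longer round-trips, and assemblies compiled
// against the field no longer bind without recompilation.
public class Customer
{
    public string Name { get; set; }
}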

Further Reading

This post has been all about fields versus properties, but it's worth noting that there's also a separate and equally important discussion of properties versus methods. Check out this excellent MSDN article comparing properties and methods.

I mention variations on the following topics in almost all Further Readings because understanding them inside and out means the difference between writing code that works and writing code that works and stands the test of time.

Even if you disagree with elements of generally accepted object oriented programming theory and go your own route, it's still an invaluable asset to have and I strongly recommend taking the time to assimilate it.

Wednesday, 10 December 2014

The When and Why: Dependency Injection

Dependency Injection is one form of Inversion of Control - a collection of programming patterns focused on minimising your classes' direct reliance on each other, known as their coupling.

In its simplest form Dependency Injection just means to provide an object its dependencies instead of having the object create them for itself.

When?

In any class where you use the new keyword to instantiate another class, you should think about using Dependency Injection instead. It may feel like overkill at first but the benefits are big and get bigger as the application grows in size.

Here's a real-world before-and-after example; I'll explain the why of it straight after.


BEFORE

public class CustomerService
{
    private const string SqlConnectionString = 
         @"server=localhost;username=admin;password=password;";

    private const string LoggingFilePath = 
         @"C:\logs\MyApplication.txt";

    private readonly ILogger _logger;
    private readonly IDataContext _context;

    public CustomerService()
    {
        // In this example all of the services create
        // their own logger and data context instance when
        // they are created.
        // The classes FileLogger and SqlDataContext are this
        // class's 'dependencies' because they are needed in 
        // order for this class to compile.
        _logger = new FileLogger(LoggingFilePath);
        _context = new SqlDataContext(SqlConnectionString);
    }

    public IEnumerable<Customer> GetCustomers(int accountID)
    {
        // TODO: Use logger to log
        // TODO: Use context to get customers for account ID
    }
}

public static class Application
{
    public static void Main()
    {
        var service = new CustomerService();
        foreach (var customer in service.GetCustomers(55))
        {
            Console.WriteLine(customer.Name);
        }
    }
}

AFTER

public class CustomerService
{
    private readonly ILogger _logger;
    private readonly IDataContext _context;

    public CustomerService(IDataContext context, ILogger logger)
    {
        // In this example instead of instantiating the 
        // dependencies themselves, the service classes like 
        // CustomerService expect them to be passed in.
        // Taking them via the constructor is known as 
        // 'Constructor Injection' and is the preferred method
        // of Dependency Injection because it forces the
        // developer to provide the right dependencies before
        // the class can be used.
        _logger = logger;
        _context = context;
    }

    public IEnumerable<Customer> GetCustomers(int accountID)
    {
        // TODO: Use logger to log
        // TODO: Use context to get customers for account ID
    }
}

public static class Application
{
    private const string SqlConnectionString = 
         @"server=localhost;username=admin;password=password;";

    private const string LoggingFilePath = 
         @"C:\logs\MyApplication.txt";

    public static void Main()
    {
        // This is messier than it was but the mess is now in
        // a single location which can be easily tightened up 
        // using a DI framework such as Ninject or Unity.
        var logger = new FileLogger(LoggingFilePath);
        var context = new SqlDataContext(SqlConnectionString);
        var service = new CustomerService(context, logger);
        foreach (var customer in service.GetCustomers(55))
        {
            Console.WriteLine(customer.Name);
        }
    }
}
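For the curious, here's a hedged sketch of that tightening-up using Ninject. The constructor parameter names passed to WithConstructorArgument are assumptions about FileLogger and SqlDataContext, not code from this example:

using Ninject;

public static class Application
{
    public static void Main()
    {
        var kernel = new StandardKernel();

        kernel.Bind<ILogger>()
              .To<FileLogger>()
              .WithConstructorArgument(
                   "filePath", @"C:\logs\MyApplication.txt");

        kernel.Bind<IDataContext>()
              .To<SqlDataContext>()
              .WithConstructorArgument(
                   "connectionString",
                   @"server=localhost;username=admin;password=password;");

        // Ninject inspects CustomerService's constructor and supplies
        // the registered implementations automatically.
        var service = kernel.Get<CustomerService>();

        foreach (var customer in service.GetCustomers(55))
        {
            Console.WriteLine(customer.Name);
        }
    }
}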

Why?

There are many convincing reasons to follow the Dependency Injection pattern, I'll cover a few of them here.

Implementation Agnosticism

The before example enforces the use of SqlDataContext and FileLogger. It doesn't actually need those specific implementations to work properly; to perform its responsibility of getting the customers, it just needs something that implements IDataContext and something that implements ILogger. The before code has pinned itself to concrete implementations anyway.

If we wanted one Customer Service in our application to log to a file and another to log to a database, the pattern in before would force us either to copy and paste the whole class to make that small change, or to start adding enums or booleans to the constructor to pick a context Type. Either way, the class becomes coupled to every implementation it could possibly be used with, which seriously harms the scalability of the application.

The Customer Service in the after example is much more flexible because it does not arbitrarily restrict us to particular implementations of those interfaces. The class definition is also much clearer about exactly what the Customer Service needs in order to function correctly, and about what its purpose is: getting customers from any given IDataContext and logging about it.

Avoiding unnecessary usage restrictions and minimising Class Coupling like this is the key to developing maintainable systems that don't become exponentially more complicated to work with as the codebase grows.

Single Responsibility Principle

Having a single responsibility makes the class more likely to be reusable, more accessible and more easily tested. Beyond making the programmer's life easier there's a structural benefit to the Single Responsibility Principle too.

CustomerService Responsibilities (Before)    CustomerService Responsibilities (After)
Create Log File Connection                   Getting the Customers
Create Database Context
Getting the Customers

We can't open the same file twice, so to share a FileLogger between classes using the approach in before we'd have to make the logger available as a public static. If we ever need to change the SQL connection string for the application, we'd have to make sure we update it in all locations or make the connection string a static variable, too.

You might be tempted to resolve these sorts of issues by making everything public and static, but that's an anti-pattern and doesn't resolve the issue anyway; it just hides it long enough for you to write a few thousand more lines of code before reaching a scenario it can't handle.

An example of such a scenario would be needing the logger and data context to live longer than a customer service but not forever and not globally across the application - for the duration of an individual web request, for instance.

Managing the lifespan of class instances as a separate responsibility and injecting them into their dependents keeps code from being bogged down under ever-changing business requirements.
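DI containers express those lifespans as binding scopes. Continuing the hedged Ninject sketch from above (InSingletonScope is core Ninject; InRequestScope comes from the Ninject.Web.Common package):

// One FileLogger shared for the lifetime of the application.
kernel.Bind<ILogger>().To<FileLogger>().InSingletonScope();

// A fresh SqlDataContext per web request, disposed when the
// request ends.
kernel.Bind<IDataContext>().To<SqlDataContext>().InRequestScope();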

Unit Testing

Unit Testing is the act of testing individual units of functionality in your code to prove that specific expectations are met. For example, a useful Unit Test for CustomerService would be GetCustomers Gets Only Customers For Provided Account, where we test to ensure that the implementation only retrieves customers with a matching account ID.

Because we don't control which ILogger and IDataContext implementations the Customer Service gets in the before example, if we were to test GetCustomers then we'd be including the workings of SqlDataContext and FileLogger within that test too. If the SQL database has the wrong data or FileLogger has a bug, then the GetCustomers Gets Only Customers For Provided Account test would fail, even if the core logic of GetCustomers is correct.

A good Unit Test tests a small, specific unit of code. In the after example, we can pass in whatever implementations of ILogger and IDataContext best suit our needs. In the real code, we provide a SqlDataContext and FileLogger. In the Unit Test, however, we can provide an implementation of ILogger that does nothing and an IDataContext implementation that provides a specific set of records for testing whether GetCustomers does its job properly given that data.
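Here's a hedged sketch of what that test might look like once GetCustomers is implemented to delegate to the context. The members of ILogger and IDataContext are assumptions (they're never shown in this post), Customer's AccountID property is invented for the example, and the attributes and assertions assume NUnit:

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

public class NullLogger : ILogger
{
    // Deliberately does nothing; logging is irrelevant to this test.
    public void Log(string message) { }
}

public class InMemoryDataContext : IDataContext
{
    public List<Customer> Customers = new List<Customer>();

    public IEnumerable<Customer> GetCustomers(int accountID)
    {
        return Customers.Where(c => c.AccountID == accountID);
    }
}

[TestFixture]
public class CustomerServiceTests
{
    [Test]
    public void GetCustomers_GetsOnlyCustomersForProvidedAccount()
    {
        var context = new InMemoryDataContext();
        context.Customers.Add(new Customer { AccountID = 55, Name = "A" });
        context.Customers.Add(new Customer { AccountID = 99, Name = "B" });

        var service = new CustomerService(context, new NullLogger());

        var results = service.GetCustomers(55).ToList();

        Assert.AreEqual(1, results.Count);
        Assert.AreEqual("A", results[0].Name);
    }
}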

Unit Testing is very much a topic in its own right, and it's well worth getting a solid understanding of it.

Further Reading

The topics touched upon in this post are well worth investigating in their own right. Here are a few links to articles that provide a broader understanding of them individually, and I recommend searching for as much information as you can find on each of them.

A Challenge

See if you can write an entire application where the new keyword is used in only one class and all other classes have their dependencies injected via their constructor!

Tuesday, 2 December 2014

Useful LINQ Extensions You Might Not Know

If you've used LINQ for any period of time then you've no doubt come to appreciate its power, abstraction of responsibilities and expressiveness. You'll be used to extension methods such as Select, Where, GroupBy and so on, but there are a few gems in the System.Linq namespace that you just don't know until you know.

SelectMany

No doubt you're familiar with the Select extension method. SelectMany is similar, but flattens nested sequences into a single result set rather than giving you a sequence of sequences. So when you would ordinarily work with Select like so:

public class Parent
{
    public string Name { get; set; }
    public List<string> Children { get; set; }
}

// The below gives us a list of lists...
var children = parents.Select(s => s.Children);

// ...which means going through them like this:
foreach (var childGroup in children)
{
    foreach (var child in childGroup)
    {
        // Do work here
    }
}

You can, instead, use SelectMany to flatten the results!

var children = parents.SelectMany(s => s.Children);

foreach (var child in children)
{
    // Do work here
}

If you need access to the parent as well as the child for every row like a SQL join, you can add a result selector:

var relationships = 
    parents.SelectMany(
               s => s.Children, 
               (parent, child) => new { parent, child });

foreach (var relationship in relationships)
{
    // Do work here
    // i.e. relationship.parent.Name or relationship.child
}

Cast

If you've ever needed to cast the elements of an enumerable to a different Type, you've probably written it like this:

var specificEnumerable = originalEnumerable
                            .Select(s => (SpecificType)s);

You can write this more expressively using the Cast method:

var specificEnumerable = originalEnumerable
                            .Cast<SpecificType>();

OfType

Cast is all well and good so long as the items in the enumerable are all of the target Type (it throws an InvalidCastException otherwise), but there are times when you have a mixture of Types and only want the items of a particular Type. You may be tempted to write the following:

var onlySpecificEnumerable = originalEnumerable
                                .Where(w => w is SpecificType)
                                .Select(s => (SpecificType)s);
// or, perhaps:
var onlySpecificEnumerable = originalEnumerable
                                .Select(s => s as SpecificType)
                                .Where(w => w != null);

The OfType extension performs the same filter plus cast action as the above, but with cleaner syntax and less room for mistakes!

var onlySpecificEnumerable = originalEnumerable
                                .OfType<SpecificType>();

Wednesday, 26 November 2014

The When and Why: Static

The static modifier, much like the access modifiers, is one of the first keywords that new developers learn. Much like the public access modifier, it is also often misused, because it offers a seemingly easy shortcut around learning to write proper Object Oriented code.

Excessive use of the static modifier can lead to messy, overly complex code. On the other hand, careful use of the static modifier can lead to functionality that is elegant, scalable and performs well. The aim of this article is to help new developers strike the right balance. First I'll describe when to use static, then I'll explain why.

When?

Ideally, static should only be used for private or protected members when you need to share a single value between all instances of the class to which those members belong. You should avoid using public static wherever possible.
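As a quick illustration (the Widget class is invented for this example):

public class Widget
{
    // A single count shared by every Widget instance.
    private static int _instanceCount;

    public Widget()
    {
        _instanceCount++;
    }

    // Instance members can read the shared value.
    public int CreatedSoFar
    {
        get { return _instanceCount; }
    }
}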

There are a couple of occasions where a public static is required, however.

Providing Common Values for structs

If you've ever used string.Empty or int.MinValue then you've used a public static member. Here's an example of using the same technique in a Point struct.

public struct Point
{
    public int X;
    public int Y;

    // Static property that returns an instance of Point
    public static Point Zero
    {
        get { return new Point { X = 0, Y = 0 }; }
    }
}

// This...
var point = Point.Zero; 

// ...is cleaner and more descriptive than writing this...
var point = new Point { X = 0, Y = 0 };

// ...every time we want a zero value point

Extension Methods

Extension methods are a way to add new functionality to Types whose code you don't have access to. The pattern requires a public static method inside a static class, with the this modifier on the first parameter.

public static class StringExtensions
{
    public static string Left(this string str, int count)
    {
        return str.Length <= count 
               ? str 
               : str.Substring(0, count);
    }
}

// We use the extension method just like any normal method
var example = "Hello world".Left(5);

Why?

Over-use of the static modifier is an anti-pattern because it increases and actually encourages class coupling. Code that features classes that reference each other excessively is often referred to as spaghetti code because of how difficult it is to untangle and understand.

When all of your classes know about each other in this way they become less reusable and they also become harder to maintain. After all, it's much easier for you as a programmer to understand and reuse a class that only knows about itself than it is to understand one that accesses eight or nine other classes, each of which accesses another eight, and so on.

Further Reading

I encourage you to read up on all of the below topics as even a loose understanding in the back of your mind will have you start asking the right questions when writing code.

SOLID, Class Coupling, Cohesion, Interfaces, Factory Pattern, Dependency Injection

Understanding programming theory really is just as important as understanding syntax. If you have any questions at all about these subjects, feel free to drop me an e-mail or ask straight in the comments.