Monday, April 30, 2012

Fail-Safe Logging

It is very rare nowadays to find a system without some kind of logging. Whether using the built-in Trace class or a third-party library, developers assume:

  1. Logging has no side effects on anything other than the logging system itself
  2. Logging is fail-safe

The first assumption holds when using the default file-based log writer, but may fail with other implementations (a topic for another post). The second assumption also holds in most cases. Let's analyze a few scenarios in which the second assumption fails.

Consider the following pseudo code:
void MyMethod() {
    var cnt = _service.GetCount();
    var sum = _service.GetSum();
    _logger.Debug(
        "Factorial: {0}, Avg: {1}, SomethingElse: {2}", // three placeholders...
        _calculator.Factorial(cnt),                     // ...but only two arguments passed
        sum / cnt);
}

As you can see, there are a few problems with this use of the logger:

  1. Computing the factorial is a heavy operation and should be done only if debugging is enabled.
  2. The Avg calculation will fail if cnt is 0, even if debugging is not enabled.
  3. The number of placeholders in the format string does not match the number of parameters passed to the Debug() method, which may cause an exception even if debugging is not enabled (whether an exception is thrown depends on the logger implementation).

To solve problem #1, most logging frameworks provide a helper method similar to this one:

_logger.IsDebugEnabled() 

You can use it to check the logger configuration before calling the Debug() method. But you have to do it before each call to Debug() -- not very elegant. To solve the second problem, you have to add validations. The third problem is unsolvable without creating a custom method for each variation of parameters -- unrealistic for most cases.
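To make that concrete, here is a hypothetical sketch of such a custom method (the name and signature are illustrative, and it assumes the same _logger as above). It gives a compile-time check of the parameter count, but you would need one like it for every format string in the codebase:

// Hypothetical sketch only: a strongly typed method per log statement.
// The compiler now enforces the parameter count, but writing one of
// these for every format string does not scale.
void DebugFactorialAndAvg(long factorial, double avg, string somethingElse) {
    _logger.Debug(
        "Factorial: {0}, Avg: {1}, SomethingElse: {2}",
        factorial, avg, somethingElse);
}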

Catching exceptions during calls to Debug() provides a workaround for problems #2 and #3. It makes logging fail-safe, but it does not provide clear validation for problem #2 or a compile-time check for problem #3.

The code would look something like this:
void MyMethod() {
    var cnt = _service.GetCount();
    var sum = _service.GetSum();
    if (_logger.IsDebugEnabled()) {
        try {
            _logger.Debug(
                "Factorial: {0}, Avg: {1}, SomethingElse: {2}",
                _calculator.Factorial(cnt),
                sum / cnt);
        }
        catch {
            //swallow or do something meaningful with the exception
        }
    }
}

The pattern above has to be repeated everywhere logging is used -- too much clutter. Let's try moving all that repeated code into a logging façade:

class LoggerFacade {
    private readonly ILogger _realLogger; // whatever underlying logger is being wrapped

    public LoggerFacade(ILogger realLogger) {
        _realLogger = realLogger;
    }

    public void Debug(Func<string> func) {
        if (_realLogger.IsDebugEnabled()) {
            try {
                // the lambda is evaluated only here, after the level check
                _realLogger.Debug(func());
            } catch {
                //swallow or do something meaningful with the exception
            }
        }
    }
}

Add a helper formatter (optional):

public static class Extensions {
    public static string FormatWith(this string format, params object[] args) {
        return string.Format(format, args);
    }
}

...and we can use it more elegantly:

void MyMethod() {
    var cnt = _service.GetCount();
    var sum = _service.GetSum();
    _logger.Debug(
        () => "Factorial: {0}, Avg: {1}, SomethingElse: {2}"
              .FormatWith(
                  _calculator.Factorial(cnt),
                  sum / cnt));
}

Because we pass a lambda to the logging façade, the heavy calculation and the possible division by zero happen only when the lambda is executed (and that, in turn, happens only if debugging is enabled). We solved all three problems stated at the beginning of the post and made logging completely fail-safe with a minor change to the calling code.
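As a quick illustration (the values are hypothetical), even the worst-case call is now harmless:

// Hypothetical worst case: cnt comes back as 0 and the format string
// has a spare placeholder. Both the DivideByZeroException and the
// FormatException are raised inside the lambda, which the façade runs
// within its try/catch -- so MyMethod itself never fails.
int cnt = 0;
_logger.Debug(() => "Avg: {0}, SomethingElse: {1}".FormatWith(100 / cnt));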

Wednesday, April 25, 2012

Design Checklist

It is common to start design and implementation work by concentrating on the functional requirements and leaving the non-functional requirements for later. While this strategy works for Proof of Concept (POC) projects, it is risky for anything else.

Below is a list of non-functional requirements that all applications have to address sooner or later. My experience shows that having these requirements defined from the beginning of the project pays off in the long run, even if the new system will not satisfy all these requirements from day one.

Security

Does the business impose special guidelines on design or algorithms? Typical examples are the additional physical tiers required in the financial industry and the data encryption required in the medical industry. Organizations may also have internal policies affecting the design, development, and/or deployment of applications.

Testability

How much of the system can be tested? This affects how the team will need to approach automated testing, code coverage/quality metrics, and design principles (e.g., Dependency Injection (DI)). Most people will agree that automated tests are a must. Designing loosely coupled, easily unit-testable components with a DI framework gluing them together, however, is a challenging task that a team may need to schedule time for.
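As a minimal sketch (the types and names are illustrative), "designing for DI" mostly means depending on abstractions injected from outside, so a unit test can substitute a fake:

// Illustrative only: constructor injection keeps ReportService testable.
public interface IOrderRepository {
    int CountOrders();
}

public class ReportService {
    private readonly IOrderRepository _orders;

    // The dependency is supplied from outside (by a test or a DI
    // container), not created internally with "new".
    public ReportService(IOrderRepository orders) {
        _orders = orders;
    }

    public string BuildSummary() {
        return string.Format("Orders: {0}", _orders.CountOrders());
    }
}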

Performance

How fast does the system need to process data? Pretty easy to define. Much harder to allocate resources (people, time, infrastructure) and keep consistent measurements (same environment, same data, same processes) early in the project. Unfortunately, very often performance optimization efforts start with end user complaints and leave the "When did it all start?" question unanswered. Getting a baseline of existing performance and comparing performance at planned points throughout the project is often overlooked.
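A crude but repeatable baseline can be captured with nothing more than System.Diagnostics.Stopwatch; the scenario name below is hypothetical, and the point is to run the same scenario, on the same data, at planned checkpoints:

using System;
using System.Diagnostics;

class PerformanceBaseline {
    static void Main() {
        var sw = Stopwatch.StartNew();
        RunFixedScenario(); // hypothetical: same environment, data, and process every run
        sw.Stop();
        // Record the result alongside the build/date so later runs are comparable.
        Console.WriteLine("Scenario took {0} ms", sw.ElapsedMilliseconds);
    }

    static void RunFixedScenario() {
        // ... the workload being baselined ...
    }
}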

Scalability

How can the system be built to handle increased load with increased resources? This requirement is related to performance measurements, but the implementation is more challenging (even on Azure, with an API to manage infrastructure).

Availability (uptime)

How long can the system be down? One aspect of this requirement is how the system behaves if it loses one of its components (e.g., the connection to a third-party service). Another is how the system will handle maintenance (e.g., upgrades). The usual concerns about shared resource usage/dependability and redundant infrastructure also fall into this area.
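For the "lost component" aspect, a minimal sketch (the rate client and cache are hypothetical) is to degrade rather than fail:

using System;
using System.Collections.Generic;

// Illustrative sketch only: if the third-party rate service is
// unreachable, serve the last known value instead of failing outright.
public interface IRateClient {
    decimal GetRate(string currency);
}

public class ResilientRateProvider {
    private readonly IRateClient _thirdPartyRates;
    private readonly Dictionary<string, decimal> _lastKnownRates =
        new Dictionary<string, decimal>();

    public ResilientRateProvider(IRateClient thirdPartyRates) {
        _thirdPartyRates = thirdPartyRates;
    }

    public decimal GetExchangeRate(string currency) {
        try {
            var rate = _thirdPartyRates.GetRate(currency);
            _lastKnownRates[currency] = rate; // remember the last good value
            return rate;
        } catch (Exception) {
            // Degraded mode: a possibly stale rate beats a full outage.
            return _lastKnownRates[currency];
        }
    }
}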

Recoverability

How fast can the system be recovered? The most commonly assumed aspect is how to recover from failure (e.g., failed DB server recovered from backup). While it is valid, there is at least one more aspect that needs consideration—how to recover from an unsuccessful migration (both broken migration and broken functionality). As you can imagine, upgrading part of the web tier and allowing it to work with an existing database requires certain development approaches and good planning. This requirement will affect the design of the deployment procedures too.

Deployment Flexibility

One of the requirements I've seen a couple of times (and that is why I put it in a category by itself) is the ability to deploy parts of the system (e.g., web applications) independently of other system parts. Although it sounds easy, it affects design and should be treated seriously from the beginning.


These requirements should help establish priorities and guide design and coding efforts. Here are some sample questions that suggest some of the non-functional requirements are not clear or have not been taken into account:

  • Do we need to write to trace log files?
  • Should I cache user profile data?
  • How can I set up Dependency Injection in my unit test?


Monday, April 16, 2012

Merging to Grandparent in TFS2010

I am not a Team Foundation Server (TFS) expert, so I was unpleasantly surprised when I discovered an apparent limitation within it the other day. From what I can see, the TFS merging dialog allows merging changes back to the closest ancestor (parent branch) only. Why is this a problem? Consider the following scenario:
  • Main development happens in the master branch (let's call it "master") 
  • One feature is developed in a "feature" branch which was started from the master some time ago 
  • A small feature bug fix was done in yet another branch "featureSP" which was started from the "feature" branch 
When you try to merge the "featureSP" branch, the dialog will show only one choice for the target: the "feature" branch, the parent. This limitation looks strange, because a 3-way merge to the grandparent should be possible.

For a 3-way merge to happen, three check-ins should be sufficient: base, source, and target (source and target should be traceable to the base check-in; http://en.wikipedia.org/wiki/Merge_(revision_control)). The source check-in (in our scenario, the latest check-in on "featureSP") is traceable to the base through both branching operations, so all the information is available. I guess this use case has never been considered valuable. I suspect merging changes between sibling branches in TFS may be prone to the same problem, but I haven't tried it yet.

Daniel Sniderman suggested a workaround: use a baseless merge (from the command line with the /baseless parameter; see MSDN), thus forcing TFS to use a 2-way merge instead of a 3-way merge. However, a 2-way merge usually produces a worse result, leaving more unresolvable conflicts. A baseless merge of "featureSP" into "master" would establish a relationship between the branches, making future merges easier (from the dialog). I suspect that once the new relationship is established, the old one is lost and merging from "feature" to "master" will bring the same problem back, but I haven't tried this approach yet.
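The command would look roughly like this (the server paths are illustrative; see MSDN for the exact syntax):

tf merge /baseless /recursive $/Project/featureSP $/Project/master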

Other TFS limitations similar to the one described above make branching in TFS a strategic operation that requires planning and coordination. In contrast, a Distributed Version Control System (DVCS) keeps branching tactical (even local) and much easier to use; a branch is just a pointer to a commit in the version graph (at least in the case of Git). Branches can be created and removed while the version graph stays the same. Merging in a DVCS uses the version graph and does not limit the number of branch pointers between the source and base commits.
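For contrast, replaying the opening scenario in Git is unremarkable, because the merge base is computed from the commit graph rather than from a parent/child relationship (the branch names match the scenario above):

git checkout master
git merge featureSP   # Git finds the common ancestor itself; no parent-only restriction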

For a nice comparison of merge algorithms in systems with and without merge metadata (Git and SVN), see Stack Overflow. TFS keeps information about branches and merges (check the operation types in the Change Details window), so it should be able to merge efficiently.