Monday, October 8, 2012

Azure POC for CodeCamp SF 2012

For the CodeCamp SF 2012 talk I wrote a small Proof of Concept Azure app to compare the performance of SQL Azure vs. Table Storage.

It should be a good sample for anyone playing with Azure Services (Queue, Table, and SQL Azure) and validating designs.

Monday, April 30, 2012

Fail-Safe Logging

It is very rare nowadays to find a system without some kind of logging. Whether using the built-in Trace class or third-party libraries, developers assume:

  1. Logging is side-effect free to anything other than logging system itself
  2. Logging is fail-safe

The first assumption is true when using the default file-based log writer, but may fail with other implementations (a topic for another post). The second assumption also holds in most cases. Let's analyze a few scenarios in which it fails.

Consider the following pseudo code:
void MyMethod() {
    var cnt = _service.GetCount();
    var sum = _service.GetSum();
    _logger.Debug(
        "Factorial: {0}, Avg: {1}, SomethingElse: {2}",
        Factorial(cnt), sum / cnt);
}

As you can see, there are a few problems with using the logger:

  1. Computing the factorial is a heavy operation and should be done only if debugging is enabled.
  2. The Avg calculation will fail if cnt is 0, even if debugging is not enabled.
  3. The number of parameters expected by the format string does not match the number passed to the Debug() method, which may cause an exception even if debugging is not enabled (whether an exception is thrown depends on the logger implementation).
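The third problem can be reproduced in isolation. Below is a minimal sketch (the class name and the direct use of string.Format are illustrative; loggers that format messages eagerly will surface the same failure):

```csharp
using System;

static class FormatMismatchDemo {
    // Returns true if string.Format throws for a placeholder/argument mismatch.
    public static bool MismatchThrows() {
        try {
            // Two placeholders, one argument: {1} has no matching value.
            string.Format("Count: {0}, Sum: {1}", 42);
            return false;
        } catch (FormatException) {
            return true;
        }
    }

    static void Main() {
        Console.WriteLine(MismatchThrows()
            ? "mismatch throws FormatException"
            : "no exception");
    }
}
```

Any logger that passes the format string and arguments straight to string.Format will hit this exception regardless of the configured log level.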

To solve problem #1, most logging frameworks provide a helper method similar to this one:

bool IsDebugEnabled();
You can use it to check the logger configuration before calling the Debug() method. And you have to do it before each call to Debug() -- not very elegant. To solve the second problem, you have to add validations. The third problem is unsolvable without creating a custom method for each variation of parameters -- unrealistic for most cases.

Catching exceptions during calls to Debug() gives a workaround for problems #2 and #3. It makes logging fail-safe, but it does not provide clear validation for problem #2 or a compile-time check for problem #3.

The code would look something like this:
void MyMethod() {
    var cnt = _service.GetCount();
    var sum = _service.GetSum();
    if (_logger.IsDebugEnabled()) {
        try {
            _logger.Debug(
                "Factorial: {0}, Avg: {1}, SomethingElse: {2}",
                Factorial(cnt), sum / cnt);
        }
        catch {
            // swallow or do something meaningful with the exception
        }
    }
}
The code above has to be repeated everywhere logging is used -- too much clutter. Let's try to move all that repeating code to the logging façade:

class LoggerFacade {
    void Debug(Func<string> func) {
        if (_realLogger.IsDebugEnabled()) {
            try {
                _realLogger.Debug(func());
            }
            catch {
                // swallow or do something meaningful
            }
        }
    }
}
Add a helper formatter (optional):

public static class Extensions {
    public static string FormatWith(this string format, params object[] args) {
        return string.Format(format, args);
    }
}
...and we can use it more elegantly:

void MyMethod() {
    var cnt = _service.GetCount();
    var sum = _service.GetSum();
    _logger.Debug(
        () => "Factorial: {0}, Avg: {1}, SomethingElse: {2}"
                  .FormatWith(Factorial(cnt), sum / cnt));
}

Because we pass a lambda to the logging façade, the heavy calculation and the possible division by zero happen only when the lambda is executed (and that, in turn, happens only if debugging is enabled). We solved all three problems stated at the beginning of the post and made logging completely fail-safe with a minor change to the calling code.
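Putting it all together, here is a self-contained sketch of the façade (the console stands in for the real logger, and the class and parameter names are illustrative, not from any particular library):

```csharp
using System;

// Minimal sketch of the lambda-based façade.
class LoggerFacade {
    private readonly bool _debugEnabled;
    public LoggerFacade(bool debugEnabled) { _debugEnabled = debugEnabled; }

    public bool IsDebugEnabled() { return _debugEnabled; }

    public void Debug(Func<string> messageFactory) {
        if (!IsDebugEnabled()) return;   // the lambda is never invoked
        try {
            Console.WriteLine(messageFactory());
        } catch {
            // swallow or do something meaningful with the exception
        }
    }
}

class Demo {
    static void Main() {
        int cnt = 0;

        // 100 / cnt would throw DivideByZeroException, but the lambda is
        // evaluated only when debugging is enabled, so nothing happens here.
        new LoggerFacade(debugEnabled: false)
            .Debug(() => string.Format("Avg: {0}", 100 / cnt));

        // With debugging enabled, the division throws inside the lambda,
        // but the façade swallows the exception: logging stays fail-safe.
        new LoggerFacade(debugEnabled: true)
            .Debug(() => string.Format("Avg: {0}", 100 / cnt));

        Console.WriteLine("MyMethod completed without exceptions");
    }
}
```

Swapping Console.WriteLine for a real logger call keeps the same guarantees: the lambda runs only when debugging is enabled, and any exception it throws stays inside the façade.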

Wednesday, April 25, 2012

Design Checklist

It is common to start design and implementation work by concentrating on the functional requirements and leaving the non-functional requirements for later. While this strategy works for Proof of Concept (POC) projects, it is risky for anything else.

Below is a list of non-functional requirements that all applications have to address sooner or later. My experience shows that having these requirements defined from the beginning of the project pays off in the long run, even if the new system will not satisfy all these requirements from day one.


Compliance

Does the business impose special guidelines on design or algorithms? Typical examples are additional physical tiers in the financial industry and data encryption in the medical industry. Organizations may also have internal policies affecting the design, development, and/or deployment of applications.


Testability

How much of the system can be tested? This affects how the team will need to approach automated testing, code coverage/quality metrics, and design principles (e.g., Dependency Injection (DI)). Most people will agree that automated tests are a must. Designing loosely coupled, easily unit testable components with a DI framework gluing them together, however, is a challenging task that a team may need to schedule time for.
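As a minimal illustration of such a loosely coupled design (the interface and class names below are hypothetical, not tied to any specific DI framework), constructor injection lets a unit test substitute a trivial fake for the real dependency:

```csharp
using System;

// Hypothetical dependency; the production implementation would talk to a database.
public interface ICustomerRepository {
    int CountActive();
}

// The class under test depends only on the interface, so it can be
// exercised without any infrastructure.
public class CustomerService {
    private readonly ICustomerRepository _repository;
    public CustomerService(ICustomerRepository repository) {
        _repository = repository;
    }
    public bool HasActiveCustomers() {
        return _repository.CountActive() > 0;
    }
}

// In a unit test, a fake replaces the database-backed implementation.
public class FakeRepository : ICustomerRepository {
    public int CountActive() { return 3; }
}

class Program {
    static void Main() {
        var service = new CustomerService(new FakeRepository());
        Console.WriteLine(service.HasActiveCustomers()); // True
    }
}
```

A DI container would wire the production implementation the same way the test wires the fake, which is exactly what makes the components independently testable.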


Performance

How fast does the system need to process data? Pretty easy to define. Much harder to allocate resources (people, time, infrastructure) and keep measurements consistent (same environment, same data, same processes) early in the project. Unfortunately, performance optimization efforts very often start with end user complaints and leave the "When did it all start?" question unanswered. Getting a baseline of existing performance and comparing against it at planned points throughout the project is often overlooked.
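A baseline does not require heavy tooling to start. A sketch using Stopwatch could look like the following (the measured operation is a stand-in; record the result together with the environment and data set used, so later runs are comparable):

```csharp
using System;
using System.Diagnostics;

class PerformanceBaseline {
    // Stand-in for the operation being measured.
    static long SumToN(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) sum += i;
        return sum;
    }

    // Times a fixed number of iterations of an operation.
    public static TimeSpan Measure(Action operation, int iterations) {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) operation();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main() {
        TimeSpan elapsed = Measure(() => SumToN(1000000), 10);
        Console.WriteLine("10 iterations: {0} ms", elapsed.TotalMilliseconds);
    }
}
```

The value of such a snippet is not precision but consistency: the same measurement, repeated at planned points in the project, answers the "When did it all start?" question before users do.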


Scalability

How can the system be built to handle increased load with increased resources? This requirement is related to performance measurements, but the implementation is more challenging (even on Azure, with its API for managing infrastructure).

Availability (uptime)

How long can the system be down? One aspect of this requirement is how the system behaves if it loses one of its components (e.g., the connection to a third party service). Another is how the system handles maintenance (e.g., upgrades). The usual concerns about shared resource usage/dependability and redundant infrastructure also fall in this area.


Recoverability

How fast can the system be recovered? The most commonly assumed aspect is recovery from failure (e.g., a failed DB server restored from backup). While that is valid, there is at least one more aspect that needs consideration: how to recover from an unsuccessful migration (both a broken migration and broken functionality). As you can imagine, upgrading part of the web tier and allowing it to work with an existing database requires certain development approaches and good planning. This requirement will affect the design of the deployment procedures too.

Deployment Flexibility

One of the requirements I've seen a couple of times (and that is why I put it in a category by itself) is the ability to deploy parts of the system (e.g., web applications) independently of other system parts. Although it sounds easy, it affects design and should be treated seriously from the beginning.

These requirements should help establish priorities and guide design and coding efforts. Here are some sample questions that suggest some of the non-functional requirements are not clear or have not been taken into account:

  • Do we need to write to trace log files?
  • Should I cache user profile data?
  • How can I set up Dependency Injection in my unit test?

Monday, April 16, 2012

Merging to Grandparent in TFS2010

I am not a Team Foundation Server (TFS) expert, so I was unpleasantly surprised when I discovered an apparent limitation within it the other day. From what I can see, the TFS merging dialog allows merging changes back to the closest ancestor (parent branch) only. Why is this a problem? Consider the following scenario:
  • Main development happens in the master branch (let's call it "master") 
  • One feature is developed in a "feature" branch which was started from the master some time ago 
  • A small feature bug fix was done in yet another branch "featureSP" which was started from the "feature" branch 
When you try to merge the "featureSP" branch, the dialog will show only one choice for the target: the "feature" branch, its parent. Given that TFS is capable of 3-way merges, this limitation looks strange.

For a 3-way merge to happen, three check-ins should be sufficient: base, source, and target (source and target should be traceable to the base check-in). The source check-in (in our scenario, the latest check-in on "featureSP") is traceable to the base through both branching operations, so all the necessary information is available. I guess this use case has never been considered valuable. I suspect merging changes between sibling branches in TFS may be prone to the same problem, but I haven't tried it yet.

Daniel Sniderman suggested a workaround to this problem: a baseless merge (from the command line with the /baseless parameter; see MSDN), which forces TFS to use a 2-way merge instead of a 3-way one. However, a 2-way merge usually produces a worse result by leaving more unresolvable conflicts. A baseless merge of "featureSP" into "master" would establish a relationship between the branches, making future merges easier (from the dialog). I suspect that once the new relationship is established, the old one is lost and merging from "feature" to "master" will bring the same problem again, but I haven't tried this approach yet.

Other TFS limitations similar to the one described above make branching in TFS a strategic operation that requires planning and coordination. In contrast, a Distributed Version Control System (DVCS) keeps branching tactical (even local) and much easier to use; a branch is just a pointer to a commit in the version graph (at least in the case of Git). Branches can be created and removed while keeping the version graph the same. Merging in a DVCS uses the version graph and does not limit the number of branch pointers between the source and base commits.

For a nice comparison of merge algorithms in systems with and without merge metadata (Git and SVN), see Stackoverflow. TFS keeps information about branches and merges (check the operation types in the Change Details window), so it should be able to merge efficiently.

Thursday, March 22, 2012

TFS2010 Gated Check-In and Multiple Build Agents

Most dev teams today improve their process by establishing the practice of Continuous Integration (CI) and automating builds and tests. There are many products available to help perform automated builds, with a fit for any size of wallet. Microsoft Team Foundation Server 2010 (TFS2010) comes with these capabilities built in. TFS2010 goes beyond check-in-build-test-report functionality and offers gated check-ins: build-test-report-check-in. This mode prevents broken builds from appearing in the Source Control Management (SCM) system.
Gated check-ins are implemented on top of shelvesets. Once turned on for a build definition, gated check-in causes all check-ins to submit changes as shelvesets. The build server processes the shelveset and, if the build and tests are successful, checks the changes into SCM and makes them available to other team members. Gated check-ins are subject to the same limitations as shelvesets, though.
As the project grows in terms of team and/or source code size, scalability of the TFS builds may become the next challenge. One of the easiest steps to address this problem is to add more TFS build agents. Since each build agent can work on one build at a time, having multiple build agents should allow TFS to process multiple check-ins concurrently, right?

For regular CI, adding build agents improves the overall development process. For gated check-ins, however, each check-in requires "exclusive" write access to the source code during build and test, so all build agents except one will be idle.

Most developers face resource sharing issues every day (database locks and scaling-out challenges are the most popular). Performance improvement efforts usually go either into scaling up or into the eventually consistent world. Gated check-ins and TFS scalability present the same challenge in a new context. The solutions are the same as well: scale up the TFS server or move to the eventually consistent, old-and-proven CI (check-in-build-test-report).

It is easy to test the behavior mentioned above by using VMs with Visual Studio/TFS. Add another agent, download a large open source project (I used Orchard CMS), set up a build with Gated Check-In, and try submitting two changes one after another. Only one agent will be working.