Friday
Jan 20, 2012

New Videos on CQRS/DDD and Practical CQRS

Thanks to the awesome guys from Austrian Alt.NET (and particularly Jörg Egretzberger), I had an opportunity to speak together with Greg Young at Professional.NET in Vienna in September 2011. Here are the videos that were made there.

It's recommended to watch them in the provided order: first Greg's "CQRS and DDD" and then my "Practical CQRS".

Please keep in mind that this event took place more than 4 months ago; it was also the first talk I've done together with Greg (meaning that I have become slightly less stupid and naive since then, especially during the CQRS RoadTrip 2011; at least that's what I hope).

CQRS and Domain Driven Design

by Greg Young. Direct link: http://youtu.be/KXqrBySgX-s

Practical CQRS in Cloud

by Rinat Abdullin. Direct Link: http://youtu.be/CnvO_nlvrps | PDF

I hope you enjoy them. Once again, kudos to the terrific guys from the Austrian Alt.NET community for making these recordings possible.

Saturday
Jan 14, 2012

Run Lokad.CQRS code natively on Windows Azure

It seems that the most common use of the Lokad.CQRS Sample these days is building CQRS-based applications for the intranet (and migrating legacy projects). I didn't see that coming, but I'm glad that the latest "lean" version makes some sense to people (more about it).

However, one of the initial purposes of this Sample is to enable and demonstrate cross-cloud portability of projects developed using abstractions from Lokad.Cqrs.Portable. That's how we use it at Lokad.

So to highlight this (and to help folks from the Microsoft CQRS Advisory Board who showed interest in trying Lokad.CQRS with Azure), I've added two new projects: Sample.Azure.Wires and Sample.Azure.Deploy. There is nothing really special here: the first project just wires SampleProject to run on Windows Azure in a Worker Role, without touching the existing domain code. The second project is the actual deployment.

As expected, the trace output is similar to the one we get from running the sample in on-premises mode (via the Sample.Engine console).

Obviously, this worker is wired to use native components of Windows Azure instead of the file system. These components (developed at Lokad and production-tested over the last few years in various scenarios) include:

  • BlockBlobTapeStorage (used for append-only event streams on Azure)
  • AzureAtomicStorage (used for storing documents, saga state and persistent read models)
  • BlobStreamingStorage (used for streaming large BLOBs for Big Data processing)
  • Azure Queues (for your message sending needs)

If you have the latest Azure SDK installed, just open Lokad.CQRS.Sample.withAzure.sln to see these additional Azure projects and get a taste of event sourcing in the cloud (it actually tastes exactly like any other ES, but feels cooler).

Maintenance Tip: By the way, you can download the domain log file (sample-tapes/domain.tmd) from Azure Storage to your disk and open it with the Audit tool. This helps to visualize and investigate the processes that happen in your CQRS system in the cloud (it has actually saved my day more than once, when production systems hit unexpected problems). Likewise, you can also download and investigate the event stream of any single aggregate.

Actually, this Audit tool was also designed to open event streams directly from Azure (and allow sending selected messages back or rebuilding all projections); however, this functionality has not been ported to open source yet. If somebody finds it useful, I'll try to find time to export it (there is nothing really fancy there, though - it just uses the append-only nature of event streams for efficient data replication from the remote cloud to a local machine).

Maintenance Tip: Those who plan to use this in production might want to plug the quarantine into the message handlers. For instance, here is a sample that records every handler failure and sends a detailed report (with stack traces and message information) by email as soon as a message is quarantined. The latter helps to deal with cloud problems really fast.

The next thing I want to include in this Sample Project is a lean Web UI, using ASP.NET MVC 3 Razor Views with jQuery and Bootstrap.css. Just to demonstrate how easy it is to add browser-friendly and fast Web UI to existing CQRS systems (and have it hosted on-premises or in the cloud, of course). What do you think?

Meanwhile, have fun, join the community of people abusing the project, and be sure to fill in the CQRS Survey put out by the P&P Team.

Thursday
Jan 05, 2012

Announcing Lokad.CQRS vLean - SampleProject and Tools

I've just finished assembling a pre-release of Lokad.CQRS vLean (the lean branch in the github repository). As of this version, Lokad.CQRS is no longer a framework; it is a full sample.

As such, Lokad.CQRS can be used for learning about event-centric development (CQRS/DDD with event sourcing and the ability to deploy in the cloud and on-premises). The gradually growing series of articles in my bliki could provide some theoretical background.

You can also use Lokad.CQRS to jumpstart new projects by copying the entire SampleProject to your system and then modifying it to your needs. That's essentially how we start new projects at Lokad these days.

Because of this transition from "Framework" to "Sample with Sources" (aka "Reference Implementation"), I was finally able to include a lot of new material into this branch.

Sample Project

I've ripped part of the domain code (a simple account+users setup) from one of our projects, along with all the accompanying tooling, tests and engine. Our current approach to event sourcing is also included (as explained in the bliki).

The project currently does not include:

  • Deployment runners for Azure.Cloud (planned for later; however, Lokad.CQRS users should have a pretty good sense of how to reconfigure the domain to run in Azure).
  • Web UI (but I might add it later, if time permits, since Razor + Bootstrap make development with this architecture really pleasant and fast).

I'll try to add these later (community contributions are also awesome).

The absence of a Web UI makes it harder to visualize what's going on inside. However, you can still run the tests and also start Sample.Engine (configured with file-based persistence and queues), which will fire a few commands; these will produce events that are wired to sagas and projections. Just like in the real world.

Plus, this project includes some sample code and tools that are wired to it. These are used exactly how we currently use them (although not necessarily how we will use them a few months from now). Read below for more info.

Contracts DSL

Lokad Contracts DSL is an optional console utility that you can run in the background. It tracks changes to files written in a special compact syntax (as mentioned in commands and events on the bliki) and updates the generated CS file. Changes are applied immediately upon saving the file (and ReSharper immediately picks them up). This is an improved version of Lokad Code DSL: it supports identities and can auto-generate interfaces for aggregates and aggregate state classes.

You can try this out by starting the SampleProject\Tools\Dsl project and then making some changes to SampleProject\Domain\Sample.Contracts\Messages.tt.

The current DSL code generates contract classes that are compatible with DataContracts, ServiceStack.JSON and ProtoBuf. Here is what they look like:

[DataContract(Namespace = "Sample")]
public partial class AddSecurityPassword : ICommand<SecurityId>
{
    [DataMember(Order = 1)] public SecurityId Id { get; private set; }
    [DataMember(Order = 2)] public string DisplayName { get; private set; }
    [DataMember(Order = 3)] public string Login { get; private set; }
    [DataMember(Order = 4)] public string Password { get; private set; }

    AddSecurityPassword () {}
    public AddSecurityPassword (SecurityId id, string displayName, string login, string password)
    {
        Id = id;
        DisplayName = displayName;
        Login = login;
        Password = password;
    }
}

Audit Tool

After you have run Sample.Engine, try starting Audit and using it to open the file temp\sample-tapes\domain.tmd (located in the folder where the engine ran). You should see something like this:

You'll probably figure out the purpose of the color coding (or you could look it up in the sources).

BTW, here's what the other tab looks like - it is used for automatically detecting and wiring all projections and then running the selected event stream through them:

Please keep in mind that this is a fast rip from internal code, so a few buttons and pieces of functionality might not make full sense in the file-based context (e.g. domain log synchronization).

Simple Testing

I've been mentioning heavy abuse of Greg's SimpleTesting for dealing with specifications to test Aggregates. The source code for that is included (and for printing them out in a readable format). Just fire up your NUnit (either directly or via ReSharper):

This setup currently supports usual AR+ES testing, plus testing exceptions thrown by the domain code (domain errors), plus dealing with services and time.

Core Changes

There were a few core changes in the actual Lokad.CQRS framework. They were caused by the need to move forward and support wider range of scenarios (and better scalability). Actually, these features were mostly related to throwing things out - all builder syntax and containers. In the current version you wire everything without using overcomplicated builders. This leads to fewer dependencies and simpler code.

If you don't know how the old builder code translates to the new one, just check the Wires and Engine projects in SampleProject (in addition to Snippets and the unit tests for core building blocks).

In spirit, we are gradually drifting away from the style of architecture that is similar to NServiceBus, MassTransit and RhinoQueues (where you have handler classes plugged into the container for you). Instead, we are going directly for the type of architecture described by the outstanding ZMQ guide.

What's Next

As always, your feedback is welcome (it helped to shape the current release and let us move forward). Please share it (along with any questions you might have) in the Lokad community.

A lot more is planned in this world along the following directions:

  • Simplifying the core even further, along with solidifying the backing theory, based on real-world experience.
  • Better support of multiple clouds (just Azure is not enough)
  • Real-time performance scenarios.

Stay tuned and participate in the community, if you are interested!

Friday
Dec 30, 2011

Using Git Revision in Windows Azure Cloud Deployments

Here's a quick approach for including git versions of your codebase into Windows Azure Deployment labels (as visible in the Portal). This applies to Azure SDK 1.6 and probably higher. Versions are appended automatically whenever you trigger a publish from Visual Studio.

git versions usually are reported by git describe and could look like r40 (if there is r40 tag on the last commit) or r39-31-g48ac07c (if last tag r39 was 31 commits ago).
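If you ever need to take such a version string apart (say, to show the tag and the number of commits separately), a small parser along these lines might do. This is only a sketch: the GitVersion class and its members are names made up for this example, and it assumes the two output shapes described above. A tag that itself ends in something like "-1-gabc" would be misread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of any library): splits "git describe" output.
public sealed class GitVersion
{
    // "<tag>-<commits since tag>-g<abbreviated hash>", e.g. r39-31-g48ac07c
    static readonly Regex Suffix =
        new Regex(@"^(?<tag>.+)-(?<count>\d+)-g(?<hash>[0-9a-f]+)$");

    public readonly string Tag;
    public readonly int CommitsSinceTag;
    public readonly string ShortHash; // null when the commit is exactly on the tag

    GitVersion(string tag, int commitsSinceTag, string shortHash)
    {
        Tag = tag;
        CommitsSinceTag = commitsSinceTag;
        ShortHash = shortHash;
    }

    public static GitVersion Parse(string describeOutput)
    {
        var text = describeOutput.Trim();
        var match = Suffix.Match(text);
        if (!match.Success)
        {
            // No "-N-g<hash>" suffix: the last commit carries the tag itself.
            return new GitVersion(text, 0, null);
        }
        return new GitVersion(
            match.Groups["tag"].Value,
            int.Parse(match.Groups["count"].Value),
            match.Groups["hash"].Value);
    }
}
```

For example, GitVersion.Parse("r39-31-g48ac07c") yields the tag r39, 31 commits since that tag, and the short hash 48ac07c.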

It works like this.

Step 1: Create new Azure Deployment profile for your deployment project. It will be saved somewhere to Profiles\ProjectName.azurePubxml.

Step 2: Save the project and then edit the deployment project file to include this profile file only if it exists:

<ItemGroup>
  <PublishProfile Include="Profiles\ProjectName.azurePubxml"  
     Condition=" Exists('Profiles\ProjectName.azurePubxml') " />
</ItemGroup>

Step 3: Drop a .gitignore file into the Profiles folder so that changes will not be tracked. Contents will be straightforward:

ProjectName.azurePubxml

Step 4: Add PreBuildEvent to the Azure deployment project by editing it further. This script should:

  • Fill <AzureDeploymentLabel> XML node in azurePubxml with a content of your choice (optionally grabbing output of git describe).
  • Fail gracefully if there is no git available.
  • Fail gracefully if there is no azurePubxml file.

In other words, we just take output of git describe command and put it inside the match of:

new Regex("<AzureDeploymentLabel>.*</AzureDeploymentLabel>", RegexOptions.Multiline);

MSBuild PreBuildEvent script can be wired to our process via editing of .ccproj file (or you can just fill that via Visual Studio UI - Deployment | Properties | Build Events):

<PropertyGroup>
  <PreBuildEvent>"PathToReplacementScript" "$(ProjectDir)\Profiles\ProjectName.azurePubxml"</PreBuildEvent>
</PropertyGroup>

For the actual replacement script you can use scripting environment of your choice or even C#.
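To give an idea of the C# option, here is a minimal sketch of such a replacement script as a console program. The names LabelPatcher, ReplaceLabel and TryGitDescribe are invented for this example. It assumes that git is on the PATH and that the profile file sits inside the repository, and it always exits with code 0 so that a missing git or a missing azurePubxml file does not break the build.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical pre-build script: puts "git describe" output into the label node.
public static class LabelPatcher
{
    static readonly Regex LabelNode =
        new Regex("<AzureDeploymentLabel>.*</AzureDeploymentLabel>");

    // Returns the profile XML with the label node rewritten.
    public static string ReplaceLabel(string profileXml, string label)
    {
        var node = "<AzureDeploymentLabel>"
            + System.Security.SecurityElement.Escape(label)
            + "</AzureDeploymentLabel>";
        // A MatchEvaluator keeps '$' in the label from being read as a substitution.
        return LabelNode.Replace(profileXml, match => node);
    }

    // Returns trimmed "git describe" output, or null if git is missing or fails.
    public static string TryGitDescribe(string workingDirectory)
    {
        try
        {
            var info = new ProcessStartInfo("git", "describe")
            {
                WorkingDirectory = workingDirectory,
                RedirectStandardOutput = true,
                UseShellExecute = false,
                CreateNoWindow = true
            };
            using (var process = Process.Start(info))
            {
                var output = process.StandardOutput.ReadToEnd();
                process.WaitForExit();
                return process.ExitCode == 0 ? output.Trim() : null;
            }
        }
        catch (Exception)
        {
            return null; // e.g. git is not installed
        }
    }

    public static int Main(string[] args)
    {
        // Fail gracefully: no profile file means nothing to do.
        if (args.Length == 0 || !File.Exists(args[0])) return 0;

        var profilePath = Path.GetFullPath(args[0]);
        var version = TryGitDescribe(Path.GetDirectoryName(profilePath));
        if (string.IsNullOrEmpty(version)) return 0;

        File.WriteAllText(profilePath,
            ReplaceLabel(File.ReadAllText(profilePath), version));
        return 0;
    }
}
```

The compiled executable takes the path to the azurePubxml file as its only argument, which matches the PreBuildEvent shown above.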

By the way, there are two interesting twists upon this concept.

Twist 1: You can leverage git's super powers to print all changes in the code between any two deployment labels. They will be nicely formatted by committer, too.

git log r39..r40 --pretty=short | git shortlog

That's how we publish releases at Lokad.

Twist 2: You can set your system to publish InstanceStartedEvent with the following schema:

InstanceStarted!(string codeVersion, string role, string instance)

The latter can be used to link your code version trees directly with the event streams and life-cycle of a distributed system deployed to the cloud. This immensely simplifies post-mortem debugging of such systems, along with adding a few maintenance tricks up your sleeve.

Tuesday
Dec 27, 2011

Example of Self-documenting Unit Test with Event Sourcing

One of the biggest advantages of the event sourcing approach is its inherent capability to turn unit tests into living documentation. Below is an example of a specification that I've worked my way through today (that's how NUnit prints it out).

registrations: duplicate email fails - Passed

Environment: 
  index includes email("contact@lokad.com")

When: Register 'd6e64e':
        Customer Name: Lokad
        Contact Email: contact@lokad.com
        Date:          2011-12-27

Expectations:
  [Passed] Registration 'd6e64e':
             Customer Name: Lokad
             Contact Email: contact@lokad.com
             Date:          2011-12-27
  [Passed] Registration 'd6e64e' failed:
             Email 'contact@lokad.com' is already taken.

And here's what the actual NUnit code looks like:

public Specification duplicate_email_fails()
{
    var info = new RegistrationInfoBuilder("contact@lokad.com", "Lokad").Build();
    var index = new MockUniquenessService();

    return new RegistrationSpec(index)
        {
            Before = {() => index.includes_email("contact@lokad.com")},
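            // 'reg' is not declared in this snippet; presumably it is a RegistrationId defined elsewhere in the fixture
            When = new CreateRegistration(reg, info),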
            Expect =
                {
                    new RegistrationCreated(reg, info),
                    new RegistrationFailed(reg, new[]
                        {
                            "Email 'contact@lokad.com' is already taken."
                        })
                },
            Finally = index.Clear
        };
}

All of this was achieved without any special magic or even fancy tools. I've just pulled in the sources of SimpleTesting and CompareObjects for additional readability.

For those who are interested in RegistrationSpec class, it is just a simple snippet wiring together dependencies of aggregate root to a strongly-typed specification deriving from TypedSpecification in SimpleTesting:

public sealed class RegistrationSpec : AggregateSpecification<RegistrationId>
{
    public RegistrationSpec(IRegistrationUniquenessService service)
    {
        Factory = (events, observer) =>
            {
                var state = new RegistrationAggregateState(events);
                return new RegistrationAggregate(state, observer, service, 
                    new TestPassword(), 
                    new TestIdentity());
            };
    }
}

Explicit strong typing of aggregates (as described in the bliki) works all the way back to the unit test specifications, letting you benefit from compile-time checking and IntelliSense support. In other words: you don't need to navigate through hundreds of messages to figure out which ones are actually applicable in a test.

Saturday
Dec 24, 2011

Migrating Legacy Systems to Event Sourcing

These days I'm working on migrating a truly legacy system towards the simplified CQRS/DDD design with event sourcing for the cloud.

As part of the migration process, I'm reverse engineering a legacy SQL database into a stream of events. These events are not a precise representation of what has happened in the past (this exact information is irreversibly lost, as in almost any data-driven system), but rather a pretty good estimate that could be used to prepopulate the new version.

Essentially, reverse engineering events is about writing a throw-away utility that will scan database tables (MS Access files or punch-cards) and spit out events that could be used to reproduce that state.

For instance, consider this customer record in DB table:

Customer {
  Name : "GoDaddy",
  Id : SomeGuid,
  Created : 2008-13-12,
  Status : Deleted,
  Phone : "111-22-22",
  Reason : "Supporting SOPA was poor PR move"
}

This record could be reversed into the following events

CustomerCreated!(
  Id: SomeGuid, 
  Name: "GoDaddy", 
  Created: 2008-13-12
)
CustomerPhoneSpecified!(
  Id: SomeGuid, 
  Phone: "111-22-22"
)
CustomerDeleted!(
  Id: SomeGuid, 
  Reason: "Supporting SOPA was poor PR move", 
  Deleted: 2011-12-24
)

Note that we actually had to improvise while coming up with this event stream: the date of deletion was not stored in the original database (we were losing this information). So we are just substituting some predefined date here (e.g. the date of the upgrade to CQRS/DDD+ES).

When you have a system with a few years of history, quite a few events are generated. The system that I'm currently migrating has data that dates back to the early days of Lokad, hence 300-400 thousand events is something to be expected.
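A throw-away utility for this could look roughly like the sketch below. All types here (IEvent, LegacyCustomer, LegacyMigrator and the three event classes) are hypothetical stand-ins that mirror the example above; they are not taken from the real project. The upgradeDate parameter is the predefined date substituted for the lost deletion date.

```csharp
using System;
using System.Collections.Generic;

public interface IEvent { }

public sealed class CustomerCreated : IEvent
{
    public readonly Guid Id; public readonly string Name; public readonly DateTime Created;
    public CustomerCreated(Guid id, string name, DateTime created)
    { Id = id; Name = name; Created = created; }
}

public sealed class CustomerPhoneSpecified : IEvent
{
    public readonly Guid Id; public readonly string Phone;
    public CustomerPhoneSpecified(Guid id, string phone) { Id = id; Phone = phone; }
}

public sealed class CustomerDeleted : IEvent
{
    public readonly Guid Id; public readonly string Reason; public readonly DateTime Deleted;
    public CustomerDeleted(Guid id, string reason, DateTime deleted)
    { Id = id; Reason = reason; Deleted = deleted; }
}

// One row of the legacy Customer table (hypothetical shape).
public sealed class LegacyCustomer
{
    public Guid Id;
    public string Name;
    public DateTime Created;
    public bool IsDeleted;
    public string Phone;
    public string Reason;
}

public static class LegacyMigrator
{
    // Produces an estimated event history that reproduces the state of the row.
    public static IEnumerable<IEvent> ToEvents(LegacyCustomer row, DateTime upgradeDate)
    {
        yield return new CustomerCreated(row.Id, row.Name, row.Created);

        if (!string.IsNullOrEmpty(row.Phone))
            yield return new CustomerPhoneSpecified(row.Id, row.Phone);

        // The legacy table never stored the deletion date, so we substitute one.
        if (row.IsDeleted)
            yield return new CustomerDeleted(row.Id, row.Reason, upgradeDate);
    }
}
```

Running ToEvents over every row of the table (and doing the same for the other tables) gives the stream that is later fed to aggregate state objects and projections.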

As part of development process, these events are run through the aggregate state objects and also through the projections. The goal here is to pass all possible sanity checks and get read models that match exactly to the UI currently visible in the old system. If new system looks and behaves exactly like the old one (even if the guts are completely simplified), then we are moving in the right direction.

Obviously, a lot of problems show up during this process, especially with logically inconsistent or corrupt data (e.g. accounting inconsistencies caused by race conditions and deadlocks in the legacy database). These things generally have to be resolved manually - there is no magical silver bullet.

Sunday
Dec 18, 2011

Tech Layer of CQRS Systems: Pushing it Further

Let's see how we can extend previously mentioned model of simple event store to support partitioning of event stores, along with re-partitioning and fail-over scenarios.

We bring in the Service-Oriented Reliable queuing (Majordomo Pattern with heartbeat support) as the middle-man between clients (aggregate roots) and actual event stores. This router will know about available event store partitions (even if there is only one) and their fail-over replicas.

Actual stores could be straight append-only files that are preallocated in large chunks (with one writer and multiple readers). Alternatively, we could use a circular buffer (resembling the LMAX approach) for short-term memory that is eventually replicated to something more durable.

Note, that here we assume that we could host multiple readers on the same event stream:

  • Publishers (to replicate events downstream).
  • Replayers (to provide random access to persisted events, including full replays).
  • Saga and projections hosts.

Each of these readers has benefit of being able to process pending events in batches, simplifying "we need to catch up on a lot of work" scenarios. Once again, just like it is in LMAX.

For purity of approach, we can just keep the publisher and replayer running on that event stream, while pushing the rest of the readers further downstream.

In order for some things to happen properly, the event store must add its own envelope (or a frame) to recorded messages, including a sequence number that is incrementing and unique within the store. Fortunately, this is easy to do, since we have only one writing thread (the one that supports more than 20000 messages per second on commodity hardware).
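To make the framing idea more concrete, here is a rough sketch of a single-writer appender that stamps every message with a sequence number. The frame layout (8-byte sequence, 4-byte length, payload) and the names FramedAppender and FramedReader are assumptions made for this example; they do not describe any actual store format. The reader expects a seekable stream such as a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical writer: must be used from one thread only, which is what
// makes a plain counter safe without any locking.
public sealed class FramedAppender
{
    readonly BinaryWriter _writer;
    long _lastSequence;

    // lastSequence is the number of the last frame already in the stream (0 if empty).
    public FramedAppender(Stream stream, long lastSequence)
    {
        _writer = new BinaryWriter(stream);
        _lastSequence = lastSequence;
    }

    // Wraps the payload into a frame and returns the assigned sequence number.
    public long Append(byte[] payload)
    {
        var sequence = ++_lastSequence;
        _writer.Write(sequence);        // 8 bytes: sequence number
        _writer.Write(payload.Length);  // 4 bytes: payload length
        _writer.Write(payload);         // payload itself
        _writer.Flush();
        return sequence;
    }
}

public static class FramedReader
{
    // Reads frames from the current position until the end of a seekable stream.
    public static IEnumerable<KeyValuePair<long, byte[]>> ReadAll(Stream stream)
    {
        var reader = new BinaryReader(stream);
        while (stream.Position < stream.Length)
        {
            var sequence = reader.ReadInt64();
            var length = reader.ReadInt32();
            var payload = reader.ReadBytes(length);
            yield return new KeyValuePair<long, byte[]>(sequence, payload);
        }
    }
}
```

Readers (publishers, replayers, projection hosts) never assign numbers themselves; they only consume what the single writer has stamped.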

Why would we need sequence number? Imagine a case, where a few projection hosts are subscribed to event stream in real time. Let's also assume that we have so many projection hosts over the network, that UDP-like broadcasts become beneficial for us. There is a slight problem though - we can't guarantee reliable delivery of messages out-of-box with UDP. This is where sequence numbers come in - they can be used to detect missing, duplicate or out-of-order messages. When this happens, we will ask for replay or just throw the exception to support.
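On the receiving side, the check itself can be tiny. The sketch below assumes that sequence numbers grow by exactly one per message (the text above only requires them to be incrementing and unique); SequenceTracker and SequenceCheck are names invented for this example.

```csharp
using System;

public enum SequenceCheck
{
    InOrder,      // exactly the next expected message
    AlreadySeen,  // duplicate or late out-of-order delivery
    Gap           // at least one message in between is missing
}

// Hypothetical subscriber-side helper for an unreliable (UDP-like) feed.
public sealed class SequenceTracker
{
    long _lastSeen;

    public SequenceTracker(long lastSeen)
    {
        _lastSeen = lastSeen;
    }

    public long LastSeen { get { return _lastSeen; } }

    public SequenceCheck Observe(long sequence)
    {
        if (sequence <= _lastSeen)
            return SequenceCheck.AlreadySeen;

        if (sequence == _lastSeen + 1)
        {
            _lastSeen = sequence;
            return SequenceCheck.InOrder;
        }

        // Do not advance: the caller should ask for a replay starting at
        // LastSeen + 1 (or raise an error) before processing further.
        return SequenceCheck.Gap;
    }
}
```

A projection host would process a message only on InOrder, drop it on AlreadySeen, and request a replay from LastSeen + 1 on Gap.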

How do we handle fail-over of event stores? A high-availability pair could be used here (Binary Star). We will start another store replica in a different availability zone, connected to the master via a dedicated network connection. We'll configure store clients (our majordomo broker in this case) to fall back to the slave store if the master is not available (this can be detected via the heartbeats). The slave will take over as master if it starts getting requests AND the master is not available. The tricky part here is to avoid a network configuration that would increase the chance of "split brain".
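The take-over rule can be sketched in a few lines. This is only an illustration of the "requests AND no master" condition, not a full Binary Star state machine; the BackupStore class is hypothetical, and time is passed in explicitly so that the logic is easy to test.

```csharp
using System;

// Hypothetical slave-side rule: become active only when a client asks us
// for work AND the master has been silent for longer than the timeout.
public sealed class BackupStore
{
    readonly TimeSpan _heartbeatTimeout;
    DateTime _lastMasterHeartbeat;

    public bool IsActive { get; private set; }

    public BackupStore(TimeSpan heartbeatTimeout, DateTime startedAt)
    {
        _heartbeatTimeout = heartbeatTimeout;
        _lastMasterHeartbeat = startedAt;
    }

    // Called whenever a heartbeat arrives over the dedicated link.
    public void OnMasterHeartbeat(DateTime now)
    {
        _lastMasterHeartbeat = now;
    }

    // Returns true if this node should serve the request.
    public bool OnClientRequest(DateTime now)
    {
        if (IsActive) return true;

        var masterSilent = now - _lastMasterHeartbeat > _heartbeatTimeout;
        if (masterSilent)
        {
            // Both conditions hold: a client reached us and the master looks dead.
            IsActive = true;
        }
        return IsActive;
    }
}
```

Requiring a client request before taking over is what lowers the chance of "split brain": a broken link between the two stores alone is not enough for the slave to promote itself.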

How do we handle failures of central broker? We don't need to. We can have multiple instances running on the network and have clients configured to pick the one available.

How do we handle repartitioning of stores? That's an interesting one. Articles are written about repartitioning (i.e. when applied to search indexes). Brute-force approach is to create a replica on a new machine and let it catch up with the primary store. Then, shut down writing thread of the master and switch broker(s) to the slave. We'll configure stream readers to start working on new partition as soon as readers fully catch up on the former masters.

Interesting possibility is that similar setup could also be used to create persistent queues for "command" messages. The same message store could also be used to implement Disconnected Reliability (Titanic Pattern) on top of Service-Oriented reliable queuing provided by Majordomo.

How much sense does this make?

NB: Naturally, there are a few notes:

  • Presented setup does not need to fit 100% of event centric systems. It's just something that an architecture could be based upon. Specific sections and subsystems could be tuned from there (especially as we go into custom setups and various outliers).
  • Setup provides enough decoupling and flexibility to work in both intranet scenarios (or even in a single machine) and in the flexible cloud scenarios. This does not affect the architecture, just a few specific choices around topology and infrastructure.
  • ZMQ guide provides an extensive overview of reliability and scalability patterns mentioned above. If something does not make sense, please try to read the guide first.

I don't plan to bring these approaches into production any time soon. At the moment of writing this is just a theoretical exercise that should serve as a backup plan. This is just in case I would need to increase throughput of our message queues by a factor of 1000 or more per partition. Given specifics of our work at Lokad, this possibility should be handled upfront - hence this research.