Latest Replies
Sunday
Jan292012

Handling Big Data in Cloud 

Big Data is one of the new hype terms that is gradually gaining popularity. No big surprise - amount of data around us is gradually growing and by properly mining it we can get competitive advantage and eventually make more money. Money attracts money.

Usually by term big data we mean a collection of datasets that are so large, that they become too hard to be processed on a single machine. This data can even be so big, that it takes hours or days to process it in a data warehouse.

Fortunately, cloud computing comes to the rescue. It provides following resources:

  • nearly unlimited scalable storage capacity for data (i.e. Azure Blob storage or Amazon S3)
  • elastic compute capacity to process this data (i.e. Azure Worker Roles or Amazon EC)
  • network capacities and services to transfer data and coordinate processing (queues and actual networks).

These resources are elastic (you can get as much as you need) and paid-on-demand. Latter is really important, since you pay only for what you use.

I'm assuming here, that in your business model, more data processed means more money made.

There are three distinct approaches in handling big data, based on the specific challenges.

Batch Processing

First one is about batch processing, where you do not need extremely fast computation results but have terabytes and petabytes of data. This is essentially about implementing MapReduce functionality in your system, often resorting to hacky but extremely practical tricks. Below I'll talk about some of the lessons learned in this direction in Lokad (we provide forecasts as a service without any hard limit on amount of data to process).

Obviously, storage of this data becomes a primary concern with this approach. Fortunately starting companies can leverage already existing cloud computing resources, starting from Google's GFS and up to Azure Store. Or if you are big enough (and this is worth it), you can roll a data center of your own. Idem for computing and network capacities.

Stream Processing

Second approach deals with cases, where you need high-throughput and consistently low latency on gigabytes of data. High-frequency trading is one of the most known domains here (various real-world telemetry systems being the second one). Abusing event streams and ring buffers (like in LMAX) along with partitioning - helps to deal with the challenge.

Due to latency requirements, cloud computing might be not the best choice for latency-sensitive solutions. It is more efficient to roll out fine-tuned infrastructure of your own including specialized hardware and soft (things like InfiniBand and ASIC or even Mixed-signal chips).

However if you are OK with a bit of latency and are more interested in high-throughput continuous computation, then the flexible network and CPU resources provided by cloud are a good match.

Realtime Batch Processing

Third approach to Big Data involves dealing with more complex requirement - providing real-time capabilities over vast amounts of data (so we have to be both fast and capable of handling petabytes of data). Twitter is a vivid example here - it needs to provide (almost) real-time analysis over billions of tweets around the world (although fortunately it does not to go at microsecond level).

This challenge can be solved by actually mixing first two approaches: we would use slower batch-processing for dealing with actual bulk of data (tricks like preprocessing, append-only storage and MapReduce work here), while latest data will be handled through the real-time streams (stream processing and continuous computation). Results are merged as they come out of these two data pipes. Over the course of time (i.e. daily) latest data is pushed from "hot" zone into the bulk data.

More often than not, real-time data processing is done by simplified approximating algorithms that deal with subset of data. Batch processing is more through and precise (at the cost of being slower). By pushing real-time data back to the bulk data, results of the computations are actually corrected, as they are recomputed by batch algorithms. Twitter's Storm project features nice overview of this approach.

Friday
Jan202012

New Videos on CQRS/DDD and Practical CQRS

Thanks to awesome guys from Austrian Alt.NET (and particularly Jörg Egretzberger) there has been an opportunity for me to speak with Greg Young at Professional.NET in Vienna on September 2011. And here are the videos that were made there.

It's recommended to watch them in the provided order: first Greg's "CQRS and DDD" and then my "Practical CQRS"

Please, keep in mind, that this event took place more than 4 months from now; it also was the first talk I've done together with Greg (meaning that I became slightly less stupid and naive since then, especially during the CQRS RoadTrip 2011; at least that's what I hope).

CQRS and Domain Driven Design

by Greg Young. Direct link: http://youtu.be/KXqrBySgX-s

Practical CQRS in Cloud

by Rinat Abdullin. Direct Link: http://youtu.be/CnvO_nlvrps | PDF

I hope you enjoy them. Once again, kudos to terrific guys from Austrian Alt.NET community for making these recordings possible.

Saturday
Jan142012

Run Lokad.CQRS code natively on Windows Azure

As it seems, the most common use of Lokad.CQRS Sample these days is for building CQRS-based applications for Intranet (and migrating legacy projects). I didn't see that coming, but glad that the latest "lean" version makes some sense to people (more about it).

However, one of the initial purposes of this Sample is to enable and demonstrate cross-cloud portability of projects developed using abstractions from Lokad.Cqrs.Portable. That's how we use it at Lokad.

So to highlight this (and to help folks from Microsoft CQRS Advisory Board that showed interest in trying Lokad.CQRS with Azure), I've added two new projects: Sample.Azure.Wires and Sample.Azure.Deploy. There is nothing really special, first project just wires SampleProject to run on Windows Azure in a Worker Role, without touching existing domain code. Second project is the actual deployment.

As expected, trace output is similar to the one we get from running sample in on-premises mode (via Sample.Engine console).

Obviously, this worker is wired to use native components of Windows Azure, instead of file system. These components (developed in Lokad and production-tested over the last few years in various scenarios) include:

  • BlockBlobTapeStorage (used for append-only event streams on Azure)
  • AzureAtomicStorage (used for storing documents, saga state and persistent read models)
  • BlobStreamingStorage (used for streaming large BLOBs for Big DataProcessing)
  • Azure Queues (for your message sending needs)

If you have latest Azure SDK installed, just open Lokad.CQRS.Sample.withAzure.sln to see these additional Azure projects and get a taste of event sourcing on the cloud (actually tastes exactly like any other ES, but feels cooler).

Maintenance Tip: By the way, you can download domain log file (sample-tapes/domain.tmd) from the Azure Storage to your disk and open it with Audit tool. This helps to visualize and investigate processes that happen in your CQRS system on the cloud (actually saved my day more than once, when production systems hit unexpected problems). Likewise you can also download and investigate event streams of any single aggregate.

Actually, this Audit tool was also designed to open event streams directly from Azure (and allow sending selected messages back or rebuilding all projections), however this functionality is not ported to open source yet. If somebody finds it useful, I'll try to find time and export it (there is nothing really fancy there, though - just using append-only nature of event streams for efficient data replication from remote cloud to local machine).

Maintenance Tip: Those, that plan to use this in a production, might want to plug the quarantine into the message handlers. For instance, here is a sample that records every handler failure and send a detailed report (with stack traces and message information) to email as soon as message is quarantined. The latter helps to deal with cloud problems really fast.

The next thing I want to include in this Sample Project is a lean Web UI, using ASP.NET MVC 3 Razor Views with jQuery and Bootstrap.css. Just to demonstrate how easy it is to add browser-friendly and fast Web UI to existing CQRS systems (and have it hosted on-premises or in the cloud, of course). What do you think?

Meanwhile, have fun, join the community of people abusing the project and be sure to fill in CQRS Survey put out by P&P Team

Thursday
Jan052012

Announcing Lokad.CQRS vLean - SampleProject and Tools

I've just finished assembling pre-release of Lokad.CQRS vLean (lean branch in github repository). From of this version Lokad.CQRS is no longer a framework, it is a full sample.

As such, Lokad.CQRS could be used for learning about event centric development (CQRS/DDD with event sourcing and ability to deploy in cloud and on-premises). Gradually growing series of articles in my bliki could provide some theoretical background.

You can also use Lokad.CQRS to jumpstart new projects by copying the entire SampleProject to your system and then modifying it to your needs. That's essentially how we start new projects at Lokad these days.

Because of this transition from "Framework" to "Sample with Sources" (aka "Reference Implementation"), I was finally able to include a lot of new material into this branch.

Sample Project

I've ripped part of domain code (simple account+users setup) from one of our projects, along with all the accompanying tooling, tests and engine. Current approach to event sourcing is also included (as explained in the bliki)

The project currently does not include:

  • Deployment runners for Azure.Cloud (planned later, however Lokad.CQRS users should have pretty good sense of reconfiguring the domain to run in Azure).
  • Web UI (but I might add it later, if time permits, since Razor + Bootstrap make really pleasant and fast development with this architecture).

I'll try to add these later (community contributions are also awesome).

Absence of Web UI makes it harder to visualize what's going inside. However you can still run tests and also start Sample.Engine (configured with file-based persistence and queues), that will fire a few commands, which will produce events that will be wired to sagas and projections. Just like in the real world.

Plus, this project includes some sample code and tools that are wired to it. These are used exactly how we currently use them (although not necessary how we will use them a few months later). Read below for more info.

Contracts DSL

Lokad Contracts DSL is an optional console utility that you can run in the background. It tracks changes to files with special compact syntax (as mentioned in commands and events on bliki) and updates CS file. Changes are immediate upon saving file (and ReSharper immediately picks them). This is an improved version of Lokad Code DSL, it supports identities and can auto-generate interfaces for aggregates and aggregate state classes.

You can try this out by starting SampleProject\Tools\Dsl project and then doing some changes to SampleProject\Domain\Sample.Contracts\Messages.tt.

Current DSL code generates contracts classes that are compatible with DataContracts, ServiceStack.JSON and ProtoBuf. Here are how they look like:

[DataContract(Namespace = "Sample")]
public partial class AddSecurityPassword : ICommand<SecurityId>
{
    [DataMember(Order = 1)] public SecurityId Id { get; private set; }
    [DataMember(Order = 2)] public string DisplayName { get; private set; }
    [DataMember(Order = 3)] public string Login { get; private set; }
    [DataMember(Order = 4)] public string Password { get; private set; }

    AddSecurityPassword () {}
    public AddSecurityPassword (SecurityId id, string displayName, string login, string password)
    {
        Id = id;
        DisplayName = displayName;
        Login = login;
        Password = password;
    }
}

Audit Tool

After you have run Sample.Engine, try starting Audit and opening with it this file temp\sample-tapes\domain.tmd (located in folder where the engine run). You should see something like this:

You'll probably figure out the purpose of color coding (or you could look out in the sources).

BTW, here's how the other tab looks like - it is used for automatically detecting and wiring all projections and then running the selected event stream through them:

Please keep in mind, that this is a fast rip from internal code, so a few buttons and functionality might not make full sense in the file-based context (i.e. domain log synchronization).

Simple Testing

I've been mentioning heavy abuse of Greg's SimpleTesting for dealing with specifications to test Aggregates. Source code for that is included (and for printing out them in readable format). Just fire your NUnit (either directly or via ReSharper):

This setup currently supports usual AR+ES testing, plus testing exceptions thrown by the domain code (domain errors), plus dealing with services and time.

Core Changes

There were a few core changes in the actual Lokad.CQRS framework. They were caused by the need to move forward and support wider range of scenarios (and better scalability). Actually, these features were mostly related to throwing things out - all builder syntax and containers. In the current version you wire everything without using overcomplicated builders. This leads to fewer dependencies and simpler code.

If you don't know, how old builder code translates to the new one - just check Wires and Engine projects in SampleProject (in addition to Snippets and unit tests for core building blocks).

In the spirit, we are gradually drifting away from style of architectures that is similar to NServiceBus, MassTransit and RhinoQueues (where you have handler classes plugged into the container for you). Instead, we are going directly for the type of architectures described by the outstanding ZMQ guide.

What's Next

As always, your feedback is always welcome (it helped to shape the current release and let us move forward). Please share it (along with any questions you might have) in Lokad community

A lot more is planned in this world along the following directions:

  • Simplifying the core even further, along with solidifying the backing theory, based on real-world experience.
  • Better support of multiple clouds (just Azure is not enough)
  • Realt-time performance scenarios.

Stay tuned and participate in the community, if you are interested!

Friday
Dec302011

Using Git Revision in Windows Azure Cloud Deployments

Here's a quick approach for including git versions of your codebase into Windows Azure Deployment labels (as visible in the Portal). This applies to Azure SDK 1.6 and probably higher. Versions are appended automatically whenever you trigger a publish from Visual Studio.

git versions usually are reported by git describe and could look like r40 (if there is r40 tag on the last commit) or r39-31-g48ac07c (if last tag r39 was 31 commits ago).

It works like this.

Step 1: Create new Azure Deployment profile for your deployment project. It will be saved somewhere to Profiles\ProjectName.azurePubxml.

Step 2: Save project and then edit deployment project file to include this profile file only if it exists

<ItemGroup>
  <PublishProfile Include="Profiles\ProjectName.azurePubxml"  
     Condition=" Exists('Profiles\ProjectName.azurePubxml') " />
</ItemGroup>

Step 3: Drop a .gitignore file into the Profiles folder so that changes will not be tracked. Contents will be straightforward:

ProjectName.azurePubxml

Step 4: Add PreBuildEvent to the Azure deployment project by editing it further. This script should:

  • Fill <AzureDeploymentLabel> XML node in azurePubxml with a content of your choice (optionally grabbing output of git describe).
  • Fail gracefully if there is no git available.
  • Fail gracefully if there is no azurePubxml file.

In other words, we just take output of git describe command and put it inside the match of:

new Regex("<AzureDeploymentLabel>.*</AzureDeploymentLabel>", RegexOptions.Multiline);

MSBuild PreBuildEvent script can be wired to our process via editing of .ccproj file (or you can just fill that via Visual Studio UI - Deployment | Properties | Build Events):

<PropertyGroup>
  <PreBuildEvent>"PathToReplacementScript" "$(ProjectDir)\Profiles\ProjectName.azurePubxml"</PreBuildEvent>
</PropertyGroup>

For the actual replacement script you can use scripting environment of your choice or even C#.

By the way, there are two interesting twists upon this concept.

Twist 1: You can leverage git's super powers to print all changes in the code between any two deployment labels. They will be nicely formatted by committer, too.

git log r39..r40 --pretty=short | git shortlog

That's how we publish releases at Lokad.

Twist 2: You can set your system to publish InstanceStartedEvent with the following schema:

InstanceStarted!(string codeVersion, string role, string instance)

The latter can be used to link your code version trees directly with the event streams and life-cycle of a distributed system deployed to the cloud. This immensely simplifies post-mortem debugging of such systems along with adding a few maintenance tricks to your sleeve.

Tuesday
Dec272011

Example of Self-documenting Unit Test with Event Sourcing

One of the biggest advantages of event sourcing approach is it's inherent capability to turn unit tests into a living documentation. Below is an example of specification that I've worked my way through today (that's how NUnit prints it out).

registrations: duplicate email fails - Passed

Environment: 
  index includes email("contact@lokad.com")

When: Register 'd6e64e':
        Customer Name: Lokad
        Contact Email: contact@lokad.com
        Date:          2011-12-27

Expectations:
  [Passed] Registration 'd6e64e':
             Customer Name: Lokad
             Contact Email: contact@lokad.com
             Date:          2011-12-27
  [Passed] Registration 'd6e64e' failed:
             Email 'contact@lokad.com' is already taken.

And here's how actual NUnit code looks like:

public Specification duplicate_email_fails()
{
    var info = new RegistrationInfoBuilder("contact@lokad.com", "Lokad").Build();
    var index = new MockUniquenessService();

    return new RegistrationSpec(index)
        {
            Before = {() => index.includes_email("contact@lokad.com")},
            When = new CreateRegistration(reg, info),
            Expect =
                {
                    new RegistrationCreated(reg, info),
                    new RegistrationFailed(reg, new[]
                        {
                            "Email 'contact@lokad.com' is already taken."
                        })
                },
            Finally = index.Clear
        };
}

All was achieved without any special magic or even fancy tools. I've just pulled over sources of SimpleTesting and CompareObjects for additional readability.

For those who are interested in RegistrationSpec class, it is just a simple snippet wiring together dependencies of aggregate root to a strongly-typed specification deriving from TypedSpecification in SimpleTesting:

public sealed class RegistrationSpec : AggregateSpecification<RegistrationId>
{
    public RegistrationSpec(IRegistrationUniquenessService service)
    {
        Factory = (events, observer) =>
            {
                var state = new RegistrationAggregateState(events);
                return new RegistrationAggregate(state, observer, service, 
                    new TestPassword(), 
                    new TestIdentity());
            };
    }
}

Explicit strong-typing of aggregates (as described in bliki) works all the way back in unit test specification by allowing to benefit from compiler-time checking and IntelliSense support. In other words: you don't need to navigate through hundreds of messages to figure out which ones are actually applicable in test.

Saturday
Dec242011

Migrating Legacy Systems to Event Sourcing

These days I'm working on migrating really legacy system towards the simplified CQRS/DDD design with event sourcing for the cloud.

As part of the migration process, I'm reverse engineering legacy SQL database into a stream of events. These events are not precise representation of what has happened in the past (this exact information is irreversibly lost, as in almost any data-driven system), but rather a pretty good estimate that could be used to prepopulate the new version.

Essentially, reverse engineering events is about writing a throw-away utility that will scan database tables (MS Access files or punch-cards) and spit out events that could be used to reproduce that state.

For instance, consider this customer record in DB table:

Customer {
  Name : "GoDaddy",
  Id : SomeGuid,
  Created : 2008-13-12,
  Status : Deleted,
  Phone : "111-22-22",
  Reason : "Supporting SOPA was poor PR move"
}

This record could be reversed into the following events

CustomerCreated!(
  Id: SomeGuid, 
  Name: "GoDaddy", 
  Created: 2008-13-12
)
CustomerPhoneSpecifid!(
  Id: SomeGuid, 
  Phone: "111-22-22"
)
CustomerDeleted!(
  Id: SomeGuid, 
  Reason: "Supporting SOPA was poor PR move", 
  Deleted: 2011-12-24
)

Note, that we actually had to improvise while coming up with this event stream: date of deletion was not stored in the original database (we were losing this information). So we are just substituting some predefined date here (i.e. date of upgrade to CQRS/DDD+ES).

When you have a system with a few years of history, quite a few events are generated. The system that I'm currently migrating has data that dates back to the early dates of Lokad, hence 300-400 thousand events is something expected.

As part of development process, these events are run through the aggregate state objects and also through the projections. The goal here is to pass all possible sanity checks and get read models that match exactly to the UI currently visible in the old system. If new system looks and behaves exactly like the old one (even if the guts are completely simplified), then we are moving in the right direction.

Obviously, during this process, a lot of problems show up, especially with logically inconsistent or corrupt data (i.e. accounting inconsistencies caused by race conditions and dead locks in the legacy database). These things are generally to be resolved manually - there is no magical silver bullet.