Monday
Jun 25, 2012

Our Ultimate Technological Challenge

// beware: this article is really important for me, but might bore you to death. To make it worse, I will mention Justin Bieber down there.

Our world seems to be in a complete mess these days. A few dozen grown-ups chasing a ball on a grass field get much more attention than news from war zones in the East. Students know more about the personal life of Rihanna than about the genocide in Rwanda in 1994. And the amount of money wasted by Facebook on Instagram is equivalent to the total funding of SpaceX from its creation until May 2012 (SpaceX being the world's first privately held company to send cargo to the International Space Station).

Yes, (mass murder < pop singer) && (cute photos == space flight).

Given the ever-accelerating speed of change, such disparities and problems are only going to get stronger.

Personally, I don't give a damn about the musical taste of rich countries, but when kids are dying from starvation and malaria in Africa, this is just not right. This is not the world I was promised back in Soviet school and definitely not the world I would want to pass to my own children. The version of Matt Harding is better.

When something does not go as it is supposed to, you give it your best shot to straighten things out. So, what can we change? Most importantly, how can it be changed?

Let's start with a simple assumption. In order to make things right, we need to change how entire nations think and act: both rich nations with an abundance of resources and poor ones. That's the very thing that Gustave Le Bon called the "soul of the nation". It was supposed to be extremely hard to alter, nearly impossible. Fortunately, things have improved a little since then.

As it turns out, it can take as little as a clever excuse and a staged act to change one nation. For example, this was vividly demonstrated at the Easter Sunday Parade of 1929 in New York. Edward Louis Bernays was on a mission for tobacco companies, which were upset about losing huge potential profits to a superstition that women should not smoke cigarettes.

Edward (who later became known as the "father of public relations") combined the ideas of Wilfred Trotter and Gustave Le Bon (with a few other sources of inspiration, including his famous uncle) and simply hired a few girls to start smoking "torches of freedom" as a protest against sex taboos. Of course, he also tipped off local reporters about the upcoming feminist protest and hired photographers to make sure that good and plausible pictures would be available. Before you knew it, the entire USA had fallen for this act and started discussing it. Needless to say, tobacco companies were happy with their sales in the following years.

The "tools" used in this change have only improved since then: we've got the internet, social networks and a massive spread of smartphones and plain cellphones. These are the things that actually made it possible for the recent revolts in North Africa to take place (if you know Pieter Hintjens, the guy behind the ZeroMQ project, check out his guide to Digital revolution), and they keep improving along an exponential curve. If you should know one thing about exponential curves, it is that they are the kind of thing that helped the Chernobyl disaster to happen (in addition to a flaw in the design of the control rods).

So these days civilization has enormous potential available at the fingertips of anybody who has access to the internet: starting from the limitless power of cloud computing and up to the world-wide penetration of mass media and cell phones (and their smarter brothers). Couple this with herd instincts baked into our DNA by thousands of years of evolution, and you have outstanding things taking place: the good, the bad and Justin Bieber.

Combining digital and social resources can have an outstanding effect even in a constrained situation. A couple of years ago, I played a small role in helping a few kids to start a social project to support orphans. It wasn't that easy, but MyDreamCity went worldwide and is still running. At the heart of the process was simple enthusiasm supported by some basic IT infrastructure. Take this to a higher scale, and you can probably have a shot at improving a nation.

I'm thinking about two directions that can be exploited: education and efficient use of resources. The former - to make a positive change at the most vulnerable point of any society - kids; the latter - to actually provide some real foundation for these changes to stand upon (money is the blood of our society, as we've learned well by living through the crumbling Soviet Union).

Fortunately, I'm working at a place where we study optimization of resource consumption at all levels - from high-level organizations down to the level of individual households. Plus, there is some limited experience of teaching at my own university, coupled with a little bit of community work in the development field. This is not enough to make a real change right now, but enough to give an unsettling feeling that something can really be done to improve the situation.

So, imagine for a second that you have the ability to reach every human being in this world (including these kids in Somalia and Zimbabwe), provide inspiration, support, answers to any common question and access to all the knowledge of humanity. Don't worry about computing resources and don't worry that much about money. Technology can easily provide all that, especially today and tomorrow.

The question and ultimate technological challenge is: What could be done with all this in order to improve our world within the next 10-20 years?

Monday
Jun 25, 2012

What's New in DDD: Functional DDD/ES and Value Objects

Last week, F# CSample by Jérémie Chassaing came out. This is a partial rewrite of Greg's SimpleCQRS in a functional style which honors domain-driven design with event sourcing of aggregates. Here is a sneak peek:

 let remove count s =
    if count <= 0 then raise (InvalidOperationException "cant remove negative count from inventory")
    fire {ItemsRemovedFromInventory.Id = s.Id; Count = count}

Jérémie also wrote a blog post, explaining some of the magic behind the scenes: SimpleCQRS the F# version

Another friend from EU CQRS Beers, Yves Reynhout, wrote a blog post on a related subject: Value objects in an event sourced domain model.

Old but still good stuff. Just a reminder that Eric Evans didn't stop writing after publishing his book. A lot of his effort and experience recently goes into writing articles for the Domain Language newsletter, which is available in archives. Besides, at the last DDD eXchange, there was a really strategic video on a case study of applying DDD.

Monday
Jun 18, 2012

What's New in DDD

Just some news coming from the frontiers.

1. DDD eXchange Videos

DDD eXchange took place recently in London. Here are some of the videos recorded in the field and strongly recommended for watching:

See the rest of the videos for some more practical insights from Paul Rayner, Cyrille Martraire and Dan Haywood.

2. IDDD Book and Event Sourcing

The IDDD Book by Vaughn Vernon is moving forward, and the rough-cut ebook will be published in a week or two (fingers crossed). Among other things, it is going to include a few pages on aggregate design with event sourcing, accompanied by a little bit of sample code. An early preview of that code is already available on GitHub.

Besides the aggregate design, this sample includes first drafts of our new embedded event store for the file system and Azure blob storage, based on Riak Bitcask and some experience at Lokad. This code is not production-ready yet (e.g. there are known bugs with our usage of Blob leases on Azure).

Although the project also features simple implementations of MySQL and MS SQL event stores, it is not a replacement for the event store of Jonathan Oliver or the proper ES server being developed by Gregory Young. It is a sample that tries to help the community by showing the core pieces of building an event store and then using it in a domain model.

Tuesday
Jun 12, 2012

AppHarbor-style Lokad.CQRS Host

Just sharing a weird idea that surfaced earlier.

As you probably know, in addition to numerous improvements, Windows Azure has announced support for git deployments of web sites recently (nice move!). This can reduce development friction noticeably, but is still far from my dream low-friction scenario on AppHarbor, which works like this:

  1. A developer adds a new feature, which affects both the server side and the UI client. UI changes require completely different read model representations (say, for querying across multiple bounded contexts in a new way), so new projections are defined as well.
  2. Changes are tested locally using file system abstractions and are then pushed to master.
  3. The deployment server picks up the changes and begins deployment.
  4. A new instance of the application server is built and deployed side-by-side with the old one. When the server starts up, it detects that projection code has changed and automatically rebuilds persistent read models (in essence, precomputes views). After the initialization process is complete, the server comes back online and swaps the new persistent models in.
  5. The new version of the web UI is deployed and swapped with the old one as well (via the load balancer managed by the deployment server).

This scenario works on AppHarbor, but is not a good fit for Windows Azure, since step 4 is not supported.

While I was trying to come up with various work-arounds for this limitation, an old idea surfaced in a new look. What if we rewrote the deployment server in a way that is hard-wired specifically to Lokad.CQRS projects?

In essence, such a deployment would be somewhat similar to Lokad.Cloud.AppHost, but with a different twist. Instead of having a generic process as the unit of deployment, we would have a bounded context as defined within Lokad.CQRS. Such a bounded context can be described via:

  • Application Services (sets of command handlers);
  • View projections (sets of event handlers that update same persistent read model);
  • Tasks (long-running processes);
  • Ports (subscriptions to external events which trigger certain actions in response).

In my projects each bounded context is explicitly described in the code like this:

public static class SomeBoundedContext
{
  public static IEnumerable<object> Projections(IDocumentStore docs)
  {
    yield return new UserIndexProjection(docs.GetWriter<byte, UserIndexLookup>());
    // all other projections
  }
  public static IEnumerable<object> ApplicationServices(
    IDocumentStore docs, IEventStore store)
  {
    // set up some dependencies
    var storage = new NuclearStorage(docs);
    var id = new DomainIdentityGenerator(storage);
    var unique = new UserIndexService(storage);
    var passwords = new PasswordGenerator();

    yield return new UserApplicationService(store);
    yield return new SecurityApplicationService(store, id, passwords, unique);
    yield return new RegistrationApplicationService(store, id, unique, passwords);
    yield return id;
    // all other application services (groups of command handlers)
  }
  // etc for tasks and ports
}

In essence, each new CQRS/DDD project at Lokad starts by copying the latest source code of the Lokad.CQRS Sample Project and then starting an iterative domain modeling process. Models are always expressed in the form of a Bounded Context (like the one posted above), optional specification tests and some optional client UI (all this stuff is runnable on a local development machine or server, using the file system for persistence and messaging). After all is ready for the first spin, the code is wired to proper deployment adapters in Lokad.CQRS and pushed to the cloud.

The most boring and tedious part in this routine is actually setting up the infrastructure and Lokad.CQRS building blocks to host the bounded contexts and UI.

So all this got me thinking - what if it were worth another try to build a .NET DDD Model Host from all that code and experience, which would:

  • provide the same Lokad.CQRS infrastructure (including the new version of the embedded event store I'm building for Lokad projects), but as a server host which is started either locally or in the cloud;
  • load bounded context definitions and wire them to the message handlers, persistence adapters (i.e. files for the local environment and Azure for the cloud), message quarantine routines etc;
  • track changes in the code and invoke rebuilds of persistent read models when necessary;
  • manage swapping of application services between versions (along with exception handling, timeouts of individual command handlers etc).

Theoretically, this would allow the following scenario:

  1. A developer starts a DDD Model Host somewhere.
  2. The team's TeamCity build server is configured to push bounded contexts (as dlls or NuGet packages) to this Host. Essentially, whenever code is pushed to a certain branch, it will be started up in a separate app domain, just like AppHarbor does.
  3. The developer registers some web UIs in this Host. The Host then polls them regularly for uptime (like here) and also checks some well-known endpoint for projection code. If new projection code is detected, it will immediately be pulled and the read models rebuilt automatically, by rerunning event streams against this new code. The Host will then be responsible for keeping these read models up-to-date.
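
The rebuild-on-change step of this scenario could be sketched roughly as follows. This is only a minimal illustration and not Lokad.CQRS code: `ReadModelHost`, `UserRegistered` and the version string used as a fingerprint of the projection code are hypothetical names, and an in-memory list stands in for a real event store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type, used only for this illustration.
public sealed class UserRegistered
{
    public readonly string Name;
    public UserRegistered(string name) { Name = name; }
}

// Hypothetical host that owns one persistent read model (here: a dictionary).
public sealed class ReadModelHost
{
    readonly IList<object> _eventLog;   // append-only history, the source of truth
    readonly Dictionary<string, int> _view = new Dictionary<string, int>();
    string _builtWithVersion;           // fingerprint of the code that built _view

    public ReadModelHost(IList<object> eventLog) { _eventLog = eventLog; }

    public IDictionary<string, int> View { get { return _view; } }
    public int Rebuilds { get; private set; }

    // Called whenever the host sees deployed projection code. The view is
    // discarded and recomputed from the full event log only when the
    // fingerprint differs from the one the current view was built with.
    public void EnsureUpToDate(
        string projectionVersion,
        Action<object, IDictionary<string, int>> project)
    {
        if (projectionVersion == _builtWithVersion) return;

        _view.Clear();
        foreach (var e in _eventLog) project(e, _view);

        _builtWithVersion = projectionVersion;
        Rebuilds++;
    }
}
```

Calling `EnsureUpToDate` twice with the same version string rebuilds the view once; passing a new version string replays the whole log against the new projection delegate. In a real host the fingerprint would more likely be derived from the deployed assembly than passed in by hand, which is an assumption beyond what the post describes.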

In essence, among other things, this allows having multiple thin web clients for various bounded contexts running against one application server, which would host multiple bounded contexts. Theoretically, scale-out rules can also be defined.

There is nothing really new in this (yet another) attempt to make a framework out of Lokad.CQRS. Here are the primary differences from other existing approaches:

  • AppHarbor scenario for Lokad.CQRS deployments: we have a server that is strictly tailored to the Lokad.CQRS building blocks (application services, tasks, projections and ports) and relevant cross-cutting concerns. We can have multiple bounded contexts within the same worker, plus keep the ease of deploying the same BC to various environments (on-premises and in the cloud) using the same codebase. Plus, we get a web console for viewing all the stuff.
  • The only real difference from Lokad AppHost: strong tailoring towards Lokad.CQRS blocks and providing infrastructure (proper persistence and messaging abstractions are injected by the environment).
  • Difference from Lokad.CQRS deployments: bounded contexts are loaded dynamically rather than statically; it becomes easier to deploy new BCs on an existing host.

Theoretically, all this is doable and should reduce development and maintenance friction in my projects (both behavioral and big data deployments). However, from a practical perspective it still might be simpler to keep on copying code from Lokad.CQRS and managing deployments manually on an individual basis (less configuration code to write and hence fewer places for bugs to hide).

What do you think?

PS: it seems that the Squarespace comments system went crazy recently and might prevent you from posting completely (as opposed to simply accepting comments into the moderation queue, which is needed to stop all that spam from coming). If your comments don't show up within a day, please feel free to ping me on Twitter: @abdullin

Friday
Jun 08, 2012

Technology Demons

Manuel posted an interesting question on the post Design Observations on Big Data for Retail:

Ok, but if you delete all of these technologies from your design, what technologies you'll use ? and how you substitute them?

The answer is two-fold.

First of all, below is a quick list of technologies that I try to avoid at all costs in my projects lately. Only when there are strong external forces do I agree to resort to these demons:

  • SQL Databases (instead: plain files and noDB)
  • NoSQL Databases (instead: plain files and noDB)
  • DTC and anything that requires it (instead: design eventually consistent systems)
  • SOAP and XML (instead: binary formats, JSON and text)
  • Windows Communication Foundation (instead: messaging, HttpListener or sockets via ZeroMQ)
  • IoC Containers (instead: design systems to avoid all need for IoC Containers)
  • WPF and desktop apps in general (instead: HTML5 + CSS + JavaScript)
  • Windows Workflow Foundation (instead: proper domain-driven design)
  • anything non-Git for distributed version control (instead: Git)
  • Aspect Oriented Programming with code weaving (instead: design the software properly)
  • Mocking frameworks (instead: use simple strongly-typed code; Jérémie wrote a post)
  • N-tier architectures (instead: shallow systems partitioned along boundaries of bounded contexts)
  • frameworks like log4net, AutoMapper, ELMAH etc (instead: write a few lines of code tailored for your situation).

Second, I don't hold anything against these technologies (except for the cases where tech is being marketed as a silver bullet, but that's what demons in all religions are expected to do anyway). I just happen to believe in the value that is gained by designing my systems to be independent of them.

After all, technology should be relevant to the design only when the core problem absolutely necessitates going into this detail (for example, reducing transfer and storage costs via extreme compression of big data, or enabling new business scenarios via elastic scalability in the cloud).

If, however, we are doing something that is not particularly peculiar, then bringing technology to the table (context map) would just complicate everything. I consider cases to be non-peculiar when, for example, you have under 100 transactions per second in a single partition, under a few GB of total data for random reads and a few hundred GB on top for BLOBs - essentially things that you can have deployed at the cost of a few hundred USD per month (including replication and load-balancing). I believe the vast majority of business scenarios fit this description pretty well.

Yes, this means that the vast majority of businesses can easily run on a smartphone (or a cluster of smartphones, if you need continuous replication off-phone).

So, in cases where tech is not important, why should we couple our designs tightly to the most expensive and complex options among the available ones?

Thursday
Jun 07, 2012

Essential Reading on Big Data and Persistence

In my previous post we've discussed some design considerations for handling big data in retail. Let's continue from here.

Joannes Vermorel has just completed a really interesting whitepaper on storing sales data in retail. He outlines a few rather simple principles that allow storing 1 year of detailed sales history of 1000 stores on a smartphone. Both the white paper (PDF) and the source code are shared by Lokad on GitHub.

I'm not claiming that this is a production-ready scenario, since it is missing things like continuous replication (to another smartphone), checksumming and BI capabilities. However, the point here is that a SQL server or a generic NoSQL server might not necessarily be the best fit for this situation.

Curiously enough, in scenarios when companies need to store similar amounts of sales history, they don't take simple and rather cheap approaches like this one. Instead, consultants sell them rather expensive Oracle, Microsoft (insert any company in the big data field) software and hardware setups that still fail to keep up with the throughput of the data. For some reason, if you can write 50000 ticket receipts per second to a file (where each receipt usually contains a dozen products), this does not necessarily mean that you can have the same throughput inserting rows into your favorite SQL database cluster. So why do we even use them?

I don't hold anything against SQL (or any other relational storage), except the fact that SQL DB is being sold as a silver-bullet for cases, where it is clearly not applicable. And I hate to see huge amounts of money wasted in a useless way (at least, donate them to a charity or noble cause instead).

By the way, check out this great paper by Erik Meijer and Gavin Bierman: A co-Relational Model of Data for Large Shared Data Banks. It provides nice insight into the nature of relational (SQL) and document (Not Only SQL) persistence options.

So why do we keep on applying expensive sub-optimal solutions to problems that do not fit them? Probably because "nobody gets fired for buying IBM", while trying some non-conventional approach and failing is riskier for your career.

However, this will not necessarily hold true in the coming years. Economic and technology forces are too strong. Just read this amazing white paper from Pat Helland, which was written way back in 2007 (and don't be surprised if you find a lot of things that look like the modern principles behind event sourcing and domain-driven design).

I do not intend to criticize SQL databases or any other product, but rather to give a broader perspective - they are not the only data persistence solutions out there. There are more options. And sometimes a few specialized lines of code can beat a generic product hands down (simply because they can be more tailored to the problem than a product would ever dream to be).

Tuesday
Jun 05, 2012

Design Observations on Big Data for Retail

Change of technologies and approaches tends to bring a lot of challenges and problems (which eventually turn into "lessons learned"). This is especially true, when you probe paths that are not common.

Curiously enough, as Charles de Gaulle once noted, such less common paths are also the ones where you are likely to encounter much less competition.

At the moment of writing, one of the current projects at Lokad is a rewrite of our Salescast product, which is a cloud-based business intelligence platform for retail (see tech case study).

This rewrite features a better design which captures core business concepts at a deeper level. This allows us to achieve a simpler implementation, better cloud affinity and scalability, while discarding such technologies as IoC Container, SQL and NHibernate ORM.

If you are interested in the reasons for discarding these technologies: SQL - too expensive and complex for dealing with big data in the cloud; ORM - complex and unneeded; IoC Container - I prefer simple designs that don't need it. Obviously, such a mess as WCF, WWF, Dynamic Proxies, AOP, MSMQ etc is also something I try to avoid at all costs.

One of the side effects is that this system no longer needs complex setups for local development: message queues, event stores, documents, BLOBs and persistent read models are stored in files.

We are using event sourcing for the behavioral elements of the system, while "big data" number crunching is based on a different approach.

This approach has an interesting side effect that I didn't expect.

If anybody on the team discovers a problem in some complex data processing pipe (or any other logic, including business rules, a map-reduce step, report generation etc), with an exception bubbling up, then the exact state of the system can be reproduced on a different machine like this:

  • Stop the solution.
  • Archive the data folder and send it to the person responsible for the problem (usually me).
  • The responsible person unarchives the data folder and starts the solution.
  • The exception will bubble up.

You see, when an exception bubbles up in the development environment, the message still remains in the message queue (as a file in a folder). So when we transfer all the data to another machine and start the solution, the system will try to pick that same message up and reprocess it. Since all data dependencies are included in the data folder, this will lead to the same exception showing up.
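
The behavior described above can be illustrated with a very small folder-backed queue. This is a hedged sketch rather than the actual Lokad.CQRS file queue: the `FileQueue` class, its method names and the file naming convention are hypothetical, and it ignores concurrency, quarantine of poison messages and partial writes.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical folder-backed queue: one file per message.
public static class FileQueue
{
    // Stores a message as a file; names are assumed to sort in arrival order.
    public static void Enqueue(string folder, string name, string body)
    {
        Directory.CreateDirectory(folder);
        File.WriteAllText(Path.Combine(folder, name), body);
    }

    // Handles the first message by file name. The file is deleted only after
    // the handler returns. If the handler throws, the file stays in the
    // folder and the exception bubbles up to the caller.
    public static bool TryProcessNext(string folder, Action<string> handler)
    {
        if (!Directory.Exists(folder)) return false;

        var next = Directory.GetFiles(folder)
            .OrderBy(f => f, StringComparer.Ordinal)
            .FirstOrDefault();
        if (next == null) return false;

        handler(File.ReadAllText(next));
        File.Delete(next);
        return true;
    }
}
```

Because a failed message is never removed, copying the folder to another machine and calling `TryProcessNext` with the same handler picks up the same file and hits the same failure, which is the reproduction property the post relies on.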

Obviously, production deployment of such a system is quite different (using cloud-specific implementations for data storage, messaging and event streams), yet the principles would still work. This happens because I mostly store data in append-only structures (BLOBs for large data and event streams for behavioral domain models); the rest of the data is irrelevant here (persistent read models that are automatically rebuilt from event streams).

I'm using Lokad.CQRS Sample Project as a baseline for developing this and similar systems.

Here are a few more technology-specific observations:

  • TSV + GZIP is quite good for storing large non-structured streams of data in table form and with little effort (plus, you don't need any special tools to view and check such data);
  • when you need decent performance while storing sequences of complex structures with little effort (e.g. a sequence of object graphs), then Google Protocol Buffers (prefix-based serialization) offer a fast approach (wrap it with GZIP and SHA1, if there are repetitive strings);
  • when it is worth a few days to optimize storage and processing of big data to insane levels (e.g. for permanent storage), then some custom case-specific serialization and compression algorithm can do magic (rule of thumb: this might be needed only in 1 or 2 places);
  • do not optimize until it is really necessary; quite often you can save a massive amount of time by avoiding optimization and simply using a bigger virtual machine in the cloud (which is cheaper);
  • whenever possible, stream big data through memory, as opposed to loading huge datasets entirely. You'll be surprised how much data your small machines will be able to process.
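
As an illustration of the first and last points, here is a minimal sketch of writing and streaming a gzip-compressed TSV file with the standard .NET `GZipStream`. The `SalesTsv` class and the column layout (store, SKU, quantity) are assumptions made for this example, and values are assumed to contain no tabs or line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper; assumed column layout: store, sku, quantity.
public static class SalesTsv
{
    // Writes each row as one tab-separated line into a gzip-compressed file.
    public static void Write(string path, IEnumerable<string[]> rows)
    {
        using (var file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)))
        {
            foreach (var row in rows) writer.WriteLine(string.Join("\t", row));
        }
    }

    // Lazily yields one row at a time, so memory use does not grow with
    // the size of the file.
    public static IEnumerable<string[]> Read(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null) yield return line.Split('\t');
        }
    }

    // Example of a streaming aggregate: total quantity per store.
    public static Dictionary<string, long> QuantityByStore(IEnumerable<string[]> rows)
    {
        var totals = new Dictionary<string, long>();
        foreach (var row in rows)
        {
            long current;
            totals.TryGetValue(row[0], out current); // stays 0 when the store is new
            totals[row[0]] = current + long.Parse(row[2], CultureInfo.InvariantCulture);
        }
        return totals;
    }
}
```

Only the running totals are kept in memory while `QuantityByStore(SalesTsv.Read(path))` walks the file, so the same code works whether the file holds a thousand rows or far more than would fit in RAM.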

You don't need expensive licenses and hardware (e.g. Oracle, IBM, Microsoft setups usually offered by consultants) to store and process thousands of stores with years of sales history. Likewise, you don't need large teams or big budgets to get the thing ready and delivered. A lot of that can be avoided with the appropriate design, especially if that design factors in not only technological and organizational factors, but also shares affinity with the business model of the company.
