
Simulate CQRS/ES Cluster

This is a work in progress on running event-driven distributed systems inside a discrete event simulation.

The purpose of this research is to run a distributed application inside a deterministic simulation while bombarding it with faults that are hard to reproduce in the real world but still disrupt production systems.

This project builds upon the Simulate with Async project, extending it with more features (including a simplified networking stack).

Try it out

The project is available on GitHub.

This is a .NET Core 2.0 project. You should be able to open it in an IDE (e.g. JetBrains Rider) and run the Runtime/SimMach.csproj project.

The output should be something like this:

[Image: sim-cluster.png, screenshot of the simulation output]

Alternatively, you could try launching everything from the CLI with something like:

$ dotnet run --project Runtime

Details

This project builds upon the previous steps and introduces:

  • Simplified simulation of TCP/IP. This includes the connection handshake, SEQ/ACK numbers and reorder buffers. There is no proper shutdown sequence and no packet re-transmission.
  • Durable node storage in the form of per-machine folders used by the LMDB database.
  • Configurable system topology - machines, services and network connections.
  • Simulation plans that specify how we want to run the simulated topology. This includes a graceful chaos monkey.
  • Simulating power outages by erasing the future for the affected systems.
  • Network profiles - ability to configure latency, packet loss ratio and logging per network connection.
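
To make "erasing the future" more concrete, here is a minimal sketch of a discrete event scheduler that drops all pending work of a machine when it loses power. This is not the project's code: SimScheduler, PowerOff and the integer clock are hypothetical names chosen for illustration, and the real runtime most likely schedules async continuations rather than plain actions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; none of these types come from the project.
public sealed class SimScheduler {
    // Pending work: due time, insertion order, owning machine, action.
    readonly List<(long At, long Seq, string Machine, Action Run)> _future
        = new List<(long At, long Seq, string Machine, Action Run)>();
    long _seq;

    // Simulated time; it only moves when an event is executed.
    public long Now { get; private set; }

    public void Schedule(string machine, long delay, Action run) {
        _future.Add((Now + delay, _seq++, machine, run));
    }

    // Power outage: whatever the machine planned to do never happens.
    public int PowerOff(string machine) {
        return _future.RemoveAll(f => f.Machine == machine);
    }

    // Executes events ordered by time, then by insertion order,
    // so the same schedule always produces the same run.
    public void Run() {
        while (_future.Count > 0) {
            int best = 0;
            for (int i = 1; i < _future.Count; i++) {
                var a = _future[i];
                var b = _future[best];
                if (a.At < b.At || (a.At == b.At && a.Seq < b.Seq)) best = i;
            }
            var next = _future[best];
            _future.RemoveAt(best);
            Now = next.At;
            next.Run();
        }
    }
}

public static class PowerOutageDemo {
    public static List<string> Run() {
        var log = new List<string>();
        var sim = new SimScheduler();
        sim.Schedule("api1", 10, () => log.Add("api1 flushes to disk"));
        sim.Schedule("api2", 20, () => log.Add("api2 replies"));
        // At t=5 the plan cuts power to api1, erasing its pending flush.
        sim.Schedule("plan", 5, () => sim.PowerOff("api1"));
        sim.Run();
        return log;
    }
}
```

In this sketch PowerOutageDemo.Run() returns a log with a single entry, "api2 replies": the flush that api1 scheduled for t=10 is removed at t=5 and never executes. Ordering by time and insertion order (rather than by wall clock or thread timing) is what keeps the run repeatable.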

Dive in

To dive in, take a look at Program.cs. It generates a simulation scenario that is then executed.

A scenario could look like this:

public static ScenarioDef InventoryMoverBotOver3GConnection() {
    var test = new ScenarioDef();
    // define network connections and provide network profiles for them
    test.Connect("botnet", "public", NetworkProfile.Mobile3G);
    test.Connect("public", "internal", NetworkProfile.AzureIntranet);
    // install services on the machines
    test.AddService("cl.internal", InstallCommitLog);
    test.AddService("api1.public", InstallBackend("cl.internal"));
    test.AddService("api2.public", InstallBackend("cl.internal"));
    // configure a bot that will create workload and verify results 
    var mover = new InventoryMoverBot {
        Servers = new []{"api1.public", "api2.public"},
        RingSize = 7,
        Iterations = 30,
        Delay = 4.Sec(),
        HaltOnCompletion = true
    };

    test.AddBot(mover);

    // define a plan for the simulation (who will control the machines)
    // this is optional, but a chaos monkey is cute...
    var monkey = new GracefulChaosMonkey {
        ApplyToMachines = s => s.StartsWith("api"),
        DelayBetweenStrikes = r => r.Next(5,10).Sec()
    };
    test.Plan = monkey.Run;
    return test;
}
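
The 4.Sec() and r.Next(5,10).Sec() expressions above read like an extension method on int that returns a TimeSpan, fed by a random generator that the simulation owns. A minimal sketch under that assumption follows. TimeExtensions and StrikeDelayDemo are hypothetical names, and the Func<Random, TimeSpan> delegate type is a guess at how DelayBetweenStrikes might be declared.

```csharp
using System;

// Hypothetical helper; the project's real extension may differ.
public static class TimeExtensions {
    public static TimeSpan Sec(this int seconds) {
        return TimeSpan.FromSeconds(seconds);
    }
}

public static class StrikeDelayDemo {
    // Draws `count` delays from a Random created with a fixed seed.
    public static TimeSpan[] Delays(int seed, int count) {
        var random = new Random(seed);
        // Same shape as the lambda in the scenario (assumed delegate type).
        Func<Random, TimeSpan> delayBetweenStrikes = r => r.Next(5, 10).Sec();
        var result = new TimeSpan[count];
        for (int i = 0; i < count; i++) {
            result[i] = delayBetweenStrikes(random);
        }
        return result;
    }
}
```

Random.Next(5, 10) excludes the upper bound, so each delay is between 5 and 9 seconds. Passing the generator into the lambda, instead of creating one inside it, lets the simulation supply a seeded instance, so two runs with the same seed strike at the same moments.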

Installer functions bring together the necessary dependencies and return an instance of IEngine:

static Func<IEnv, IEngine> InstallBackend(string cl) {
    return env => {
        var client = new CommitLogClient(env, cl + ":443");
        return new BackendServer(env, 443, client);
    };
}
static IEngine InstallCommitLog(IEnv env) {
    return new CommitLogServer(env, 443);
}

BackendServer is a simplistic event-driven server with its own projection thread and a (command) request handler. It commits data to the CommitLog, from which other server instances can read the same data.
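
The split between a command handler and a projection can be sketched with an in-memory stand-in for the commit log. Everything below is hypothetical and only illustrates the pattern: InMemoryCommitLog and InventoryServer are not project types, events are plain "item:quantity" strings, and the projection is advanced by an explicit CatchUp() call instead of a dedicated thread.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the commit log service.
public sealed class InMemoryCommitLog {
    readonly List<string> _events = new List<string>();

    public void Append(string evt) { _events.Add(evt); }

    // Returns a copy of the events starting at the given position.
    public IReadOnlyList<string> ReadFrom(int position) {
        return _events.GetRange(position, _events.Count - position);
    }
}

// Hypothetical server: commands append events, a projection folds them into state.
public sealed class InventoryServer {
    readonly InMemoryCommitLog _log;
    readonly Dictionary<string, int> _stock = new Dictionary<string, int>();
    int _position;

    public InventoryServer(InMemoryCommitLog log) { _log = log; }

    // Command side: validate, then record what happened as an event.
    public void HandleAdd(string item, int quantity) {
        if (quantity <= 0) throw new ArgumentOutOfRangeException(nameof(quantity));
        _log.Append(item + ":" + quantity);
    }

    // Projection side: apply the events this instance has not seen yet.
    public void CatchUp() {
        foreach (var evt in _log.ReadFrom(_position)) {
            var parts = evt.Split(':');
            int current;
            _stock.TryGetValue(parts[0], out current);
            _stock[parts[0]] = current + int.Parse(parts[1]);
            _position++;
        }
    }

    // Query side: reads only the local projection.
    public int Count(string item) {
        int value;
        return _stock.TryGetValue(item, out value) ? value : 0;
    }
}
```

If two InventoryServer instances share one InMemoryCommitLog, a command handled by the first becomes visible to the second once it calls CatchUp(). The first instance sees nothing until it catches up as well, because handling a command only appends to the log and never touches local state directly.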

In theory, the same business logic should be able to run in a real-world environment as well.

Next post in Simulation story: Logistic Simulation
