Latest Replies
Wednesday
Jul282010

Lokad CQRS - Debugging Service Bus Failures

In the previous post on Lokad CQRS project we've covered usage of Protocol Buffers as the recommended serialization for your service bus messages. Let's continue the discussion with debugging these messages and keeping an eye on our workers in general.

It is hard to beat Protocol Buffers serialization in performance, size, cross-platform and evolution-capabilities at once. That's why Google uses them extensively for message serialization and persistence.

However, people tend to have concerns about this format: "Humans can't decipher messages by opening them in the notepad, so it's too complex." This reason tends to negate all the benefits.

Well, you generally should not be able to read messages in notepad, in the first place (security concerns). However, should you really need this (in case of exceptions, for example) it is trivial to print any ProtoBuf in human-readable format:

PingPongCommand
{
  "Ball": 161,
  "Game": "My Game"
}

Let's dig into Sample-04 from Lokad.CQRS Guidance.

This sample is a PingPong implementation based on Sample-01. It brings along a few improvements. They include ProtoBuf serialization (which has been covered in the previous article), ability to record failing messages and a few more features to talk about later.

As you know, Message Handler Feature (which implements core Service Bus functionality within Lokad.CQRS App Engine), is capable of retrying messages that fail. If message fails continuously (current default value is 4 retries), then we consider it to be poison and move it to the corresponding queue, to keep processing healthy. Poison queues are auto-created, just like any other.

We demonstrate this behavior by throwing exceptions at random. The ping-pong bouncing will never stop, unless we get really unlucky (meaning 4 exceptions in a row with probability of 17% each). In the latter case failing message will go to the "samples-04-poison" queue.

In order to fix the problem post-mortem, we need to know the exception details. For example, we could instruct our App Engine to print message processing details to the Azure blob storage. Configuration is trivial:

// we'll handle all messages incoming to this queue
builder.HandleMessages(mc =>
{
  // .... configuration skipped ....      

  // let's record failures to the specified blob 
  // container using the pretty printer
  mc.LogExceptionsToBlob(c => {});
});

This will create text file with exception details on every message that failed for this handler. It will be located in "errors" blob with the content like:

PingPongCommand

  Topic          : PingPongCommand
  ContractName   : PingPongCommand
  Sender         : http://127.0.0.1:10001/devstoreaccount1/sample-04
  Identity       : 1ff9a799-f442-4d27-abd8-9dc100cc851c
  CreatedUtc     : 07/28/2010 12:24:38

Exception
=========
Type: Sample_04.Worker.BounceFailedException
Message: Bouncing failed for: Ping #161 in game 'ea'.
Source: Lokad.Cqrs.Stack
TargetSite: Void InvokeConsume(System.Object, System.Object, System.String)
StackTrace:

  // stack trace skipped

This text includes message attributes (system information persisted by Lokad.CQRS in Lokad Message Format) and exception details with the preserved stack trace. Files are given names in the pattern of (date-azure-message-id):

errors/2010-07-28-12-24-38-1ff9a799-f442-4d27-abd8-9dc100cc851c.txt

We can tune the exception details to be more friendly, through:

mc.LogExceptionsToBlob(c =>
  {
    c.ContainerName = "sample-04-errors";
    c.WithTextAppender(RenderAdditionalContent);
  });

First statement provides custom container name, second - appends custom content renderer, using JSON.NET library. This renderer might look like:

static void RenderAdditionalContent(UnpackedMessage message, 
  Exception exception, TextWriter builder)
{
  builder.WriteLine("Content");
  builder.WriteLine("=======");
  builder.WriteLine(message.ContractType.Name);
  try
  {
    // we'll use JSON serializer for printing messages nicely
    var text = JsonConvert.SerializeObject(message.Content, Formatting.Indented);
    builder.WriteLine(text);
  }
  catch (Exception ex)
  {
    builder.WriteLine(ex.ToString());
  }
}

This will include actual message details into the text file:

Content
=======
PingPongCommand
{
  "Ball": 161,
  "Game": "ea"
}

You can use something like Open Source Azure Storage Explorer to keep an eye on message handling failures.

Azure Storage Explorer

As soon as we discover new error, we could investigate the problem, fix and deploy new version. If we move poison message back to the processing queue - the process will resume where it has failed. This simplifies debugging and fixing long-running server-side processes.

Also, if the problem was caused by some connectivity issue or DB failure, that are known to be fixed, then there is even no need to debug or deploy - just resend the poisons.

However, manually checking your message queues for the problems is not really the best way. It takes away precious time, focus and does not guarantee real-time response. We would need something more flexible and efficient.

We'll discuss that in the next article in Lokad CQRS Guidance series. Meanwhile, you can download the samples and subscribe to the updates.

Dear Reader, what do you think about that?

« Lokad CQRS - V1 for Windows Azure in September 2010 | Main | Lokad CQRS - Using Protocol Buffers Serialization for Azure Messages »