Windows Azure Storage Can Be Confusing

I'm currently working on the blob storage abstractions needed for the Lokad.CQRS project. This involves writing unit tests, and those tests produce really strange results when conditional headers are used for blob operations.

Conditional headers are part of the HTTP specification (RFC 2616):

  • If-Match
  • If-Modified-Since
  • If-None-Match
  • If-Unmodified-Since
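To make the expected behavior concrete, here is a minimal C# sketch of how a server following RFC 2616 (sections 14.24 to 14.28) should answer a GET that carries these headers. It is a simplified model, not Azure code: `BlobState`, `Conditions` and `ConditionalGet.Evaluate` are hypothetical names, each header holds a single ETag or date, and weak ETags and If-Range are left out.

```csharp
using System;

// Hypothetical model of a stored blob: only what the preconditions look at.
public sealed class BlobState
{
    public bool Exists { get; set; }
    public string ETag { get; set; }
    public DateTime LastModifiedUtc { get; set; }
}

// Hypothetical set of conditional request headers; null means "not sent".
public sealed class Conditions
{
    public string IfMatch { get; set; }
    public string IfNoneMatch { get; set; }
    public DateTime? IfModifiedSince { get; set; }
    public DateTime? IfUnmodifiedSince { get; set; }
}

public static class ConditionalGet
{
    // Returns the status code a GET should produce under the RFC 2616 rules.
    public static int Evaluate(BlobState blob, Conditions c)
    {
        // A plain GET of a missing resource is a 404; the RFC says the
        // conditional headers are then ignored, so the 404 is returned as is.
        if (!blob.Exists)
            return 404;

        // If-Match: refuse unless the tag matches the current one (or is "*").
        if (c.IfMatch != null && c.IfMatch != "*" && c.IfMatch != blob.ETag)
            return 412;

        // If-Unmodified-Since: refuse if the blob changed after the given date.
        if (c.IfUnmodifiedSince is DateTime unmodifiedSince && blob.LastModifiedUtc > unmodifiedSince)
            return 412;

        if (c.IfNoneMatch != null)
        {
            bool tagMatches = c.IfNoneMatch == "*" || c.IfNoneMatch == blob.ETag;
            // No tag matched: send the body and ignore If-Modified-Since.
            if (!tagMatches)
                return 200;
            // A matching tag means "not modified", unless If-Modified-Since disagrees.
            bool changedSince = c.IfModifiedSince is DateTime since && blob.LastModifiedUtc > since;
            return changedSince ? 200 : 304;
        }

        // If-Modified-Since: send the body only if the blob changed after the date.
        if (c.IfModifiedSince is DateTime modifiedSince && blob.LastModifiedUtc <= modifiedSince)
            return 304;

        return 200;
    }
}
```

Under this model, a GET with If-Match on a missing blob yields 404 rather than 412, because the precondition is ignored when the resource is not there.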

These headers are essential for implementing efficient storage operations (e.g. caching large blobs locally and downloading them again only when they change) and for reliable atomic updates (where a write succeeds only if the record has not been updated since it was read).
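The atomic-update case relies on If-Match: read the blob together with its ETag, compute the new content, and write it back only if the ETag is still the same, starting over from a fresh read when another writer got there first. This is a minimal sketch against a hypothetical in-memory stand-in (`InMemoryBlobStore` and `AtomicUpdate.Apply` are illustrative names, not part of any Azure library); against real storage, the conditional write would carry an If-Match header, e.g. via `AccessCondition.IfMatch` in the .NET Storage Client.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a blob container. It models only the
// If-Match check that makes a write conditional on a previously read ETag.
public sealed class InMemoryBlobStore
{
    private readonly Dictionary<string, (string Content, string ETag)> _blobs =
        new Dictionary<string, (string Content, string ETag)>();
    private int _version;

    public (string Content, string ETag) Read(string name) => _blobs[name];

    // ifMatch == null means an unconditional write. Otherwise the write is
    // rejected (the equivalent of 412 Precondition Failed) if the stored ETag differs.
    public bool TryWrite(string name, string content, string ifMatch)
    {
        if (ifMatch != null &&
            (!_blobs.TryGetValue(name, out var current) || current.ETag != ifMatch))
            return false;

        _version++;
        _blobs[name] = (content, "etag-" + _version); // every write gets a new ETag
        return true;
    }
}

public static class AtomicUpdate
{
    // Optimistic read-modify-write: if another writer changed the blob between
    // our read and our conditional write, re-read and try again.
    // Returns the number of attempts that were needed.
    public static int Apply(InMemoryBlobStore store, string name,
        Func<string, string> update, int maxAttempts = 5)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var (content, etag) = store.Read(name);
            if (store.TryWrite(name, update(content), etag))
                return attempt;
        }
        throw new InvalidOperationException("Gave up after too many concurrent modifications.");
    }
}
```

In real storage the ETag is assigned by the service, not by the client; the version counter here only simulates that.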

The Azure Blob Storage REST API supports conditional headers. The .NET Storage Client supports them as well, in the form of BlobRequestOptions that can be passed to its methods.

That's the theory. In practice, things get really confusing and tend to waste your day. Let's examine a single method, OpenRead, which opens a stream for reading a blob's contents. Look at this snippet:

var options = new BlobRequestOptions
{
    AccessCondition = AccessCondition.IfMatch(cachedTag)
};
using (var stream = _blob.OpenRead(options))
{
    // read the blob contents; cachedTag is an ETag remembered from an earlier read
}

What would you expect the outcome to be? The documentation says nothing specific about how OpenRead treats the BlobRequestOptions passed to it.

Here's how it works on my machine:

  • If the blob exists in Azure Blob Storage:
    • IfUnmodifiedSince results in an exception, which makes sense.
    • IfModifiedSince is ignored (which might be a step away from what the RFC defines).
  • If the blob (or the container) does not exist:
    • IfNoneMatch with a non-existent ETag results in 404 (Not Found).
    • IfMatch with a non-existent ETag results in 412 (Precondition Failed), which is a step away from the RFC: RFC 2616 says If-Match must be ignored when the request would otherwise end in anything other than a 2xx or 412 status, so a plain 404 would be expected here.

Since the .NET documentation does not help much, we can do some debugging and figure out which REST operations are actually performed underneath. This reveals that OpenRead, among other things, calls the Get Block List operation. Its documentation says:

This operation also supports the use of conditional headers to read the blob only if a specified condition is met. For more information, see Specifying Conditional Headers for Blob Service Operations.

However, if we look at the list of Operations Supporting Conditional Headers, Get Block List is not even listed there.

So we've got a few potential problems here:

  • something could be completely wrong with my machine, producing consistently misleading results;
  • REST API documentation for Windows Azure Blob Storage might be a bit outdated and confusing;
  • Azure Dev Storage might produce really weird results depending on the type of the header passed;
  • .NET documentation for the StorageClient does not say a word about how methods are in fact supposed to work.

And that's just a single method; there are more. I started posting questions on the MSDN forum, but quickly gave up, since the next step would have been debugging into the server-side API implementation :)

Update: it gets even more fun. Here's how a simple unit test suite for a single method (a wrapper around blob reading) looks on the development fabric:

Now, if we switch the credentials to the real Windows Azure fabric:

As you can see, Windows Azure Dev Storage and production storage behave differently. This should be accounted for while developing and deploying applications, for example by applying proper retry policies and delays that give production storage time to process operations such as recreating a container with the same name.
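One way to follow that advice is to wrap storage calls in a small retry helper with growing delays. This is a minimal sketch under assumptions: `Retry.Execute` is a hypothetical helper, deciding which exceptions count as transient is left to the caller (for instance, a create that fails while a container with the same name is still being deleted), and the delay simply doubles after each failed attempt.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs `operation`, retrying up to maxAttempts times in total while
    // `isTransient` says the failure is worth retrying. The delay doubles
    // after every failure; the last failure (or a non-transient one) propagates.
    public static T Execute<T>(Func<T> operation, int maxAttempts,
        TimeSpan initialDelay, Func<Exception, bool> isTransient)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        var delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Give the storage service some time before the next attempt.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

Keeping `isTransient` narrow matters: a 412 from a failed precondition is a real answer rather than a transient error, and repeating the same request will not change it.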

Hopefully Microsoft will clear up the situation. Meanwhile, I recommend debugging and double-checking every single method. Or, as L. M. Bujold put it:

Check your assumptions. In fact, check your assumptions at the door.
