Software Design Blog

Journey of Rinat Abdullin

Windows Azure Stuck in Initializing - Try Different VM Size

Sometimes a Windows Azure deployment goes wrong and stays stuck in the Initializing state forever, no matter what you do. Redeploying leads to the same result.

Long story short: when this happens, first check for the common problems, then contact Windows Azure Support. They will ask for at least a few hours to investigate. While waiting, try deploying with a different VM size. This might save you a few hours of lost profits from system downtime.

The problem (this is just my educated guess) could be rooted in the fact that the Azure OS is still a Windows-derived VM managed by a Hyper-V environment. There are a lot of things that can go wrong. The probability of any of them is quite low, yet given the law of large numbers you just might get really unlucky and be stuck with a corrupted VM. Redeploying with a different VM size forces Windows Azure to pick a different VM.

I wish I had been aware of this possibility yesterday, before wasting the entire evening debugging a perfectly valid deployment in a rush. This had to be resolved ASAP just to bring the production service back online.

BTW, volatility is the nature of the cloud, so get used to it. It is to be expected, although it might feel a bit different from the more reliable and controlled on-premises data-center environments with highly redundant hardware and manually controlled configuration changes.

If you are building for the cloud, handling of error conditions should be built into the systems at their core. They should be self-healing, error-resolving and gracefully degrading little SOBs that are capable of withstanding as much beating on their own as possible. This is much easier to do than it sounds and is definitely easier than building a complex N-tier system designed to run only in a tight on-premises cluster.
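To make the "self-healing and gracefully degrading" idea a bit more concrete, here is a minimal Python sketch of one common building block: retry a flaky operation with exponential backoff, and fall back to a degraded answer if it keeps failing. This is only an illustration, not code from the system described in this post. The name call_with_retry and all of its parameters are made up for the example, and it naively treats every exception as transient, which a real system should not do.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying failures with exponential backoff.

    If every attempt fails, return `fallback()` instead of raising,
    so the caller degrades gracefully rather than crashing.
    All names here are hypothetical and exist only for this sketch.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Assumption: every exception is treated as transient.
            # Wait before the next attempt, doubling the delay each time;
            # no wait is needed after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # All attempts failed: serve the degraded result.
    return fallback()
```

A caller could wrap a remote read like this and pass a fallback that returns the last cached value, so a short outage shows slightly stale data instead of an error page. The sleep parameter is injectable only so the backoff can be exercised without actually waiting.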

The good thing is that a system designed for the cloud will live perfectly well in an on-premises environment, while requiring much cheaper hardware and providing better scalability and SLAs at lower cost.