Thursday, December 15, 2016

Azure Virtual Machine - Architecture


Microsoft Azure is built on Microsoft’s definition of commodity infrastructure. The most intriguing part of Azure is the cloud operating system at its heart. In its early days, Azure ran on a fork of Windows as its underlying platform, then named the Red Dog operating system and Red Dog hypervisor. In fact, if you go into the history of Azure, the project that became Azure was originally named Project Red Dog. David Cutler was the brain behind designing and developing the various Red Dog core components, and it was he who gave it this name. In his own words: “The premise of Red Dog (RD) is being able to share a single compute node across several properties. This enables better utilization of compute resources and the flexibility to move capacity as properties are added, deleted, and need more or less compute power. This in turn drives down capital and operational expenses.”


It was actually a custom version of Windows, and the driving reason for the customization was that Hyper-V at the time didn’t have the features Azure needed (particularly support for booting from VHD). If you look at the main components of its architecture, you can count four pillars:

  1. Fabric Controller
  2. Storage
  3. Integrated Development Tools and Emulated Execution Environment
  4. OS and Hypervisor
Those were the initial (early 2006) days of Azure. As it matured, running a fork of an OS was not ideal (in terms of cost and complexity), so the Azure team talked to the Windows team and efforts were made to use Windows itself. As time passed, Windows eventually caught up, and now Azure runs on Windows.



Azure Fabric Controller

Among these, one component that has contributed immensely to Azure's success is the Fabric Controller. The Fabric Controller owns all the resources in the entire cloud and runs on a subset of nodes in a durable cluster. It manages the placement, provisioning, updating, patching, capacity, load balancing, and scale-out of nodes in the cloud, all without any operational intervention.

The Fabric Controller, which is still the backbone of Azure compute, is the kernel of the Microsoft Azure cloud operating system. It regulates the creation, provisioning, de-provisioning and supervision of all the virtual machines and their back-end physical servers. In other words, it provisions, stores, delivers, monitors and commands the virtual machines (VMs) and physical servers that make up Azure. One added benefit is that it also detects and responds to both software and hardware failures automatically.


Patch Management

When we try to understand the mechanism Microsoft follows for patch management, the common misconception is that it keeps patching all the nodes individually, just as we do in our own environments. Things in the cloud are a little different: Azure hosts are image-based (hosts boot from VHD) and follow image-based deployment. So instead of having patches delivered to each host, Azure rolls out a new VHD of the host operating system. In other words, Microsoft does not go and patch every node; the host image is updated in one place, and because the update is orchestrated, that single image can be used to update the whole environment.
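To make the image-based model concrete, here is a minimal sketch (entirely hypothetical types, not Azure's actual code) in which "updating" a host is just repointing the VHD it boots from:

```python
# Hypothetical sketch of image-based host updates -- not Azure's real code.
# Updating a host means repointing it at a new host-OS VHD rather than
# patching packages on the running OS.
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    boot_vhd: str                      # the host-OS image this node boots from

def update_host(host: Host, new_vhd: str) -> str:
    """Swap the boot image and return the old one so rollback stays trivial."""
    previous = host.boot_vhd
    host.boot_vhd = new_vhd            # one orchestrated change; a reboot picks it up
    return previous

node = Host("node-42", boot_vhd="hostos-2016-11.vhd")
old_vhd = update_host(node, "hostos-2016-12.vhd")
print(f"{node.name} now boots {node.boot_vhd}; rollback target: {old_vhd}")
```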

This offers a major advantage in host maintenance, as the volume itself can be replaced, enabling quick rollback. Host updates roll out every few weeks (4-6 weeks), and they are well tested before being rolled out broadly to the data centers. It is Microsoft's responsibility to ensure that each rollout is tested before the data center servers are updated. To do so, they start with a few fabric controller stamps that can be thought of as a pilot cluster, and once those are through, they gradually push the update to the production (data center) hosts. The underlying mechanism behind this is the Update Domain (UD). When you create VMs and put them in an availability set, they get bucketed into update domains (by default you get 5, with provisions to increase this to 20), so all the VMs in the availability set are distributed evenly among these UDs. Patching then takes place in batches, and Microsoft ensures that only a single update domain goes down for patching at a time. You can call this a staged rollout. To understand this in more detail, let's see how the Fabric Controller manages the partitioning.
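Before moving on to partitioning, here is what requesting those domain counts looks like from the user's side: a minimal sketch using the Azure SDK for Python, where the subscription, resource group, names and region are placeholders of mine, and the package and parameter names reflect the current SDK rather than anything stated in this article.

```python
# Sketch only: creating an availability set with explicit update/fault domain
# counts via azure-identity + azure-mgmt-compute. Names and values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"                    # placeholder
compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

avset = compute.availability_sets.create_or_update(
    "demo-rg",                                           # placeholder resource group
    "web-avset",                                         # placeholder name
    {
        "location": "eastus",
        "platform_update_domain_count": 20,              # default is 5, max 20
        "platform_fault_domain_count": 3,
        "sku": {"name": "Aligned"},                      # for managed disks
    },
)
print(avset.platform_update_domain_count, avset.platform_fault_domain_count)
```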


Partitioning

Azure's Fabric Controller has two types of partitions: Update Domains (UDs) and Fault Domains (FDs). These two are responsible not only for high availability but also for the resiliency of the infrastructure; with them in place, Azure has the ability to recover from failures and continue to function. It's not about avoiding failures, but about responding to failures in a way that avoids downtime or data loss.

Update Domain: An Update Domain is used to upgrade a service’s role instances in groups. Azure deploys service instances into multiple update domains. For an in-place update, the FC brings down all the instances in one update domain, updates them, and then restarts them before moving to the next update domain. This approach prevents the entire service from being unavailable during the update process.

Fault Domain: A Fault Domain defines a potential point of hardware or network failure. For any role with more than one instance, the FC ensures that the instances are distributed across multiple fault domains, in order to prevent isolated hardware failures from disrupting service. All exposure to server and cluster failure in Azure is governed by fault domains.
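To make the two partition types concrete, here is a small, purely illustrative sketch (not the Fabric Controller's actual logic) that spreads role instances across UDs and FDs and then walks one update domain at a time:

```python
# Hypothetical sketch: distribute instances across update/fault domains and
# update one UD at a time, so the rest of the service keeps running.
UPDATE_DOMAINS = 5      # Azure's default per availability set
FAULT_DOMAINS = 3

def place_instances(instances):
    """Round-robin placement so no single UD or FD holds every instance."""
    return {
        name: {"ud": i % UPDATE_DOMAINS, "fd": i % FAULT_DOMAINS}
        for i, name in enumerate(instances)
    }

def rolling_update(placement):
    """Bring down, update, and restart one update domain before the next."""
    for ud in range(UPDATE_DOMAINS):
        batch = [vm for vm, loc in placement.items() if loc["ud"] == ud]
        if batch:
            print(f"UD {ud}: stop -> update -> restart {batch}")

placement = place_instances([f"web-{n}" for n in range(6)])
rolling_update(placement)
```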


Azure Compute Stamp

In Azure, things get divided into stamps, where each stamp has one fabric controller, and that fabric controller is responsible for managing the VMs inside its stamp. There are only two types of stamps: a stamp is either a compute stamp or a storage stamp. The fabric controller itself is also not a single instance; it is distributed. Based on the available information, Azure keeps 5 replicas of the fabric controller and uses a synchronous mechanism to replicate state. In this setup, there is one primary, which the control plane talks to. It is the responsibility of this primary to act on an instruction (for example, provision a VM) and also to let the other replicas know about it. Only when at least 3 of them acknowledge that the operation is going to happen does the operation take place (this is called a quorum-based approach).
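A rough sketch of that quorum-based flow (hypothetical code, with a random stand-in for replication failures) would look something like this:

```python
# Hypothetical sketch: the primary replica commits a state change only after a
# quorum (3 of 5 replicas, counting itself) has acknowledged it.
import random

REPLICAS = 5
QUORUM = 3

def replicate(operation: str) -> bool:
    acks = 1                                   # the primary accepts its own write
    for _ in range(REPLICAS - 1):
        if random.random() > 0.1:              # stand-in for a replication RPC
            acks += 1
    committed = acks >= QUORUM
    print(f"{operation}: {acks}/{REPLICAS} acks -> {'commit' if committed else 'abort'}")
    return committed

replicate("provision VM contoso-web-01")
```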

VM Availability

While discussing Azure Virtual Machine (VM) resiliency with customers, they typically assume it is comparable to their on-prem VM architecture and, as such, expect on-prem features in Azure. That is not the case, so I wanted to put this together to provide more clarity on the VM construct in Azure and to show how VM availability in Azure is typically more resilient than most on-prem configurations.
"Talking about Azure Virtual Machines there are three major components (Compute, Storage, Networking) which constitute Azure VM. So, when we talk about Virtual machine in Azure we must take two dependencies into consideration. Windows Azure Compute (to run the VM’s), and Windows Azure Storage (to persist the state of those VM’s). What this means is that you don’t have a single SLA, instead you actually have two SLA’s. And as such, they need to be aggregated since a failure in either, could render your service temporarily unavailable."
In this article, let's focus our discussion on the Compute (VM) and Storage components.
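To see why the two SLAs have to be aggregated, here is a back-of-the-envelope sketch; the percentages are illustrative placeholders, not the actual Azure SLA figures:

```python
# Two dependent SLAs multiply: the service is up only when both compute and
# storage are up. The figures below are assumptions for illustration.
compute_sla = 0.9995    # assumed compute availability
storage_sla = 0.999     # assumed storage availability

combined = compute_sla * storage_sla
print(f"Combined availability: {combined:.4%}")   # roughly 99.85%
```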

Azure Storage: You can check my other article where I have talked about this in great detail: an Azure Storage Stamp is a cluster of servers hosted in an Azure datacenter. These stamps follow a layered architecture with built-in redundancy to provide high availability. Multiple (most of the time 3) replicas of each file, referred to as an Extent, are maintained on different servers partitioned between Update Domains and Fault Domains. Each write operation is performed synchronously (as long as we are talking about intra-stamp replication), and control is returned only after all 3 copies have completed the write, making the write operation strongly consistent.
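As a purely illustrative sketch of that intra-stamp write path (not Azure Storage's real code), control returns to the caller only after every extent replica acknowledges the write:

```python
# Hypothetical sketch: a write is acknowledged to the client only after all
# three extent replicas have durably accepted it (strong consistency).
REPLICA_SERVERS = ["extent-node-a", "extent-node-b", "extent-node-c"]

def append_to_replica(server: str, data: bytes) -> bool:
    print(f"appended {len(data)} bytes on {server}")     # stand-in for a real append
    return True

def write_block(data: bytes) -> bool:
    acks = [append_to_replica(server, data) for server in REPLICA_SERVERS]
    return all(acks)                                     # success only if every copy landed

print("write committed:", write_block(b"hello azure storage"))
```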

Virtual Machine:


Microsoft Azure provides a means to detect the health of virtual machines running on the platform and to perform auto-recovery of those virtual machines should they ever fail. This process of auto-recovery is referred to as “Service Healing”, as it is a means of “healing” your service instances. Virtual machines and the hypervisor physical hosts are monitored and managed by the Fabric Controller, which has the ability to detect failures.

It can perform this detection in two modes: reactive and proactive. If the FC detects a failure in reactive mode (missing heartbeats) or proactive mode (known situations leading to a failure) from a VM or a hypervisor host, it initiates recovery by redeploying the VM on a healthy host (the same host or another host), marking the failed resource as unhealthy, and removing it from rotation for further diagnosis. This process is also known as Self-Healing or Auto Recovery.
The above diagram shows the different layers of the system where faults can occur and the health checks that Azure performs to detect them.
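A simplified sketch of the reactive path (hypothetical data structures, not the Fabric Controller's implementation) could look like this:

```python
# Hypothetical sketch: a VM that misses too many heartbeats is redeployed on a
# healthy host, and the failed host is pulled from rotation for diagnosis.
MISSED_HEARTBEAT_LIMIT = 3

def service_heal(vms, healthy_hosts, out_of_rotation):
    for vm in vms:
        if vm["missed_heartbeats"] >= MISSED_HEARTBEAT_LIMIT:
            failed_host = vm["host"]
            vm["host"] = healthy_hosts[0]          # redeploy on a healthy host
            vm["missed_heartbeats"] = 0
            out_of_rotation.append(failed_host)    # keep the bad host out for diagnosis
            print(f"{vm['name']}: healed, {failed_host} -> {vm['host']}")

vms = [{"name": "app-vm-1", "host": "host-07", "missed_heartbeats": 4}]
removed = []
service_heal(vms, healthy_hosts=["host-12"], out_of_rotation=removed)
print("hosts out of rotation:", removed)
```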

*The auto-recovery mechanism is enabled and available on virtual machines across all the different VM sizes and offerings, across all Azure regions and datacenters.