How to build a highly available PaaS/IaaS platform on Microsoft Azure?

With cloud hegemony now, most of our customers want truly resilient infrastructures, even on cloud platforms. They need to be assured, as they were on premises, that their business services are up and running. On top of that, they want to be able to scale, because this is the cloud promise: start small and grow fast!

In this blog post, I’ll cover a simple scenario with 3 web sites and a SQL Server database. The SQL Server database is pretty big: 3 TB.

So what is an HA platform, and which Azure solutions are behind it?

Geographically replicated

The services must remain available even if a datacenter goes down, so they have to be deployed in multiple locations. Using PaaS services helps a lot because this is really easy to set up: for storage, you have options to replicate data across multiple datacenters, and even to have read-only replicas. For our SQL Server we’ll use the Always On solution.
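With a 3 TB database, it helps to estimate how long the initial copy to the second location would take before any ongoing replication starts. The small Python sketch below does that arithmetic. The link speeds and the `efficiency` parameter are assumptions for illustration, not figures from this scenario; only the 3 TB size comes from it.

```python
def transfer_hours(size_tb, link_mbps, efficiency=1.0):
    """Hours needed to copy size_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the link that is really usable.
    """
    bits = size_tb * 1e12 * 8                  # decimal terabytes to bits
    usable_bps = link_mbps * 1e6 * efficiency  # megabits/s to bits/s
    return bits / usable_bps / 3600


# Hypothetical link speeds, for comparison only.
for mbps in (200, 1000, 10000):
    print(f"{mbps:>6} Mbit/s: {transfer_hours(3, mbps):.1f} h")
```

At a fully used 1 Gbit/s link this gives a little under 7 hours, which is worth keeping in mind when choosing the connectivity between the two locations.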

BRP (Business Recovery Plan) vs BCP (Business Continuity Plan)

Each option has its advantages, but when we speak about High Availability, only BCP gets our attention. BCP means that in case of a crash, nothing should happen: the services keep running. Let’s take a deeper look: in addition to every single service being replicated and everything being backed up, it means that sessions must be redirected and data synchronized between the two locations.

Data Synchronization

One of the most expensive things in your on-premises datacenter is data storage. When we speak about active/active replicated storage, my customers go blank and say: “Oh, maybe it’s not needed after all. Can’t we use some magic to do this?” Azure comes with the handy File Storage :°)

Cross replicated network

Yes, the network is important as well: our services must be accessible from everywhere, including from on premises, 24/7. Here again Azure offers solutions: network peering, VPN Gateway and/or ExpressRoute. The choice depends on your needs in terms of flexibility and activity.

Web Sites access

This may be the most important part: customer access to our business web site. A combination of Application Gateway and Traffic Manager can guarantee that the traffic will be load balanced and that sessions will be managed correctly between our two geographically separated web sites.
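To make the routing idea concrete, here is a minimal Python sketch of what performance-based routing with endpoint monitoring boils down to: among the endpoints that pass their health probe, pick the one with the lowest latency for the caller. It is only a model of the behavior, not the Traffic Manager API. The endpoint names and latency figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str          # hypothetical endpoint name
    healthy: bool      # result of the last health probe
    latency_ms: float  # assumed latency seen from the client's location


def pick_endpoint(endpoints):
    """Return the healthy endpoint with the lowest latency, or None."""
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.latency_ms)


endpoints = [
    Endpoint("appgw-region-a", healthy=True, latency_ms=18.0),
    Endpoint("appgw-region-b", healthy=True, latency_ms=42.0),
]
print(pick_endpoint(endpoints).name)  # appgw-region-a

# Region A fails its probe: traffic moves to region B.
endpoints[0].healthy = False
print(pick_endpoint(endpoints).name)  # appgw-region-b
```

Both regions serve traffic while they are healthy (active/active), and an unhealthy region simply drops out of the candidate list.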

Let’s draw a diagram to explain it!

Some bullet points to explain this infrastructure:

  • Both networks are accessible through ExpressRoute. Everything is routable from inside the customer network, so if we lose one link: no failure. If we lose one datacenter: no failure.
  • Both regions are connected by an Ultra Performance VPN in order to replicate data with our SQL Server Always On.
  • If the VPN between the two regions becomes inactive, the region hosting the active SQL Server node remains in service, because it is then the only web site reported as available by Traffic Manager endpoint monitoring.
  • If a SQL Server VM fails and it is the primary node, every request is redirected to the second node, which has become active.
  • SQL Server VMs are built on an Availability Group and use SSD storage in order to guarantee the 99.95% SLA.
  • Web Apps are accessed through Traffic Manager, which enables load balancing between the two Azure regions. We’ll choose the Performance routing method in order to guarantee active/active routing.
  • The Application Gateway is here for SSL offloading as well as for maintaining active sessions to the web site.
  • Web Apps are hosted in an App Service Environment. Thanks to that, we can isolate the Web Apps in a DMZ-like network, and the Web Apps can still access the SQL Server Always On.
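The two failure scenarios in the list (loss of the inter-region VPN, loss of the primary SQL Server node) can be walked through with a small Python model. This is a simplified sketch, not how Always On or Traffic Manager are implemented: the `Region` class, the region names and the rule "a site is available when it can reach the primary node" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str                 # hypothetical region label
    sql_node_up: bool = True  # state of the SQL Server VM in this region


def reachable_sites(regions, primary, vpn_up):
    """Names of regions whose web site can still reach the SQL primary.

    A site is considered available when the primary is local to it,
    or when the inter-region VPN is up.
    """
    if not primary.sql_node_up:
        return []
    return [r.name for r in regions if r is primary or vpn_up]


def fail_over(regions, primary):
    """Keep the current primary if it is up, else promote the first live node."""
    if primary.sql_node_up:
        return primary
    for r in regions:
        if r.sql_node_up:
            return r
    return None


a, b = Region("region-a"), Region("region-b")
regions = [a, b]
primary = a

print(reachable_sites(regions, primary, vpn_up=True))   # ['region-a', 'region-b']
print(reachable_sites(regions, primary, vpn_up=False))  # ['region-a']

# The primary SQL Server VM fails: the second node takes over.
a.sql_node_up = False
primary = fail_over(regions, primary)
print(primary.name)                                     # region-b
print(reachable_sites(regions, primary, vpn_up=True))   # ['region-a', 'region-b']
```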

 

Thanks to that, our services have no single point of failure (SPOF).
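To get a feel for what removing the SPOF buys, here is a rough back-of-the-envelope calculation in Python. It takes the 99.95% SLA quoted for the SQL Server VMs and assumes, as a simplification, that the two regions fail independently and that failover is instantaneous. Real figures would also have to include every other component in the chain, so treat the result as an order of magnitude only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(availabilities):
    """Availability of redundant components: down only if all are down."""
    unavailability = 1.0
    for a in availabilities:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given availability ratio."""
    return (1.0 - availability) * MINUTES_PER_YEAR


single = 0.9995  # the 99.95% SLA mentioned above
dual = parallel_availability([single, single])  # assumes independent regions

print(f"one region:  {downtime_minutes_per_year(single):.1f} min/year")
print(f"two regions: {downtime_minutes_per_year(dual):.2f} min/year")
```

Under these assumptions, the expected downtime drops from a few hours per year to well under a minute.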