CloudHarmony, which is owned by Gartner, monitors the health status of providers by spinning up workload instances in the public cloud and constantly pinging them.
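CloudHarmony has not published its probe code, but a minimal sketch of that kind of availability monitor might look like the following. The endpoints, port and check interval are hypothetical stand-ins; the real service's targets and methodology are its own.

```python
import socket
import time

# Hypothetical test-instance endpoints, one per provider/region.
# These hostnames are illustrative only.
ENDPOINTS = {
    "aws-us-east": ("ec2-test.example.com", 443),
    "azure-east-us": ("azure-test.example.com", 443),
}

CHECK_INTERVAL = 30  # seconds between probes (assumed)

def is_reachable(host, port, timeout=5):
    """Basic network connectivity test: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

downtime = {name: 0.0 for name in ENDPOINTS}

while True:
    for name, (host, port) in ENDPOINTS.items():
        if not is_reachable(host, port):
            # Attribute the whole interval to downtime; a production
            # monitor would confirm from multiple vantage points first.
            downtime[name] += CHECK_INTERVAL
            print(f"{name} unreachable; total downtime {downtime[name]:.0f}s")
    time.sleep(CHECK_INTERVAL)
```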
Through 365 days of monitoring last year, CloudHarmony recorded 56 outages at AWS across four major services – virtual compute and storage, plus content delivery network and domain name service – for a total downtime of about two hours and 30 minutes.
By comparison, Microsoft Azure and Google Cloud Platform each had more than four times as much downtime. Azure experienced 71 outages totaling 10 hours and 49 minutes across the services CloudHarmony tracks, while Google’s cloud recorded 167 outages totaling 11 hours and 34 minutes.
CloudHarmony’s methodology does not provide a complete picture of cloud downtime, so its figures should be read as a partial view. CloudHarmony does not monitor all of a provider’s services; for example, it did not record what was perhaps AWS’s most notable outage of 2015, when the DynamoDB NoSQL database went offline for several hours. And because CloudHarmony spins up instances in single availability zones across multiple regions, an outage customers experience in an availability zone it is not monitoring goes unrecorded.
The big three cloud providers – AWS, Azure and Google – nonetheless performed relatively well compared to the rest of the market in this limited testing. In 2015, newcomer DigitalOcean (12.26 hours), long-time managed services vendor Rackspace (12.5 hours) and IBM’s SoftLayer (17 hours) each had more downtime than Microsoft and Google. One of the worst major cloud providers for downtime was CenturyLink: CloudHarmony recorded just 15 outages there, but they added up to 31.29 hours. One of the best was Joyent, which also had 15 outages, yet they totaled less than 34 minutes.
Google Vice President of Reliability Ben Treynor Sloss says the company has exceeded its target of 99.95% availability for its cloud (a target that allows for roughly 4.4 hours of downtime per year). He also points to a discrepancy between CloudHarmony’s data and what customers actually experience. “It doesn’t actually mirror what customers see or tell us,” he says. Many of the major outages covered in the news media are not reflected in CloudHarmony’s data, he notes – perhaps because CloudHarmony was not monitoring that specific service, or did not have a virtual machine in that portion of the cloud.
Jason Read, founder of CloudHarmony, says the data is not meant to provide a holistic view of cloud availability, but rather to serve as a basic network connectivity test across providers.
Treynor Sloss says there’s an opportunity for more comprehensive data collection across a wider swath of services – something Google and a consortium of providers are working on with the PerfKit cross-cloud monitoring tool.
Overall, though, cloud availability has improved in recent years. Yet analysts say that despite those improvements and the maturity of the public IaaS cloud, downtime is inevitable. “Outages are a simple fact of life for public cloud providers, even those who tailor their offerings for enterprise customers who are particularly concerned about availability,” says Charles King, principal analyst at Pund-IT.
There are many steps users can take to insulate themselves from cloud downtime, including spreading workloads across a provider’s data centers or even across multiple providers, as the sketch below illustrates. King says the risk of downtime is one of the leading reasons for adopting a hybrid cloud that combines on-premises workloads with those in the public cloud. Even so, the ability to elastically and dynamically spin up and down potentially massive amounts of public cloud resources makes downtime a trade-off many customers are willing to stomach.
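As an illustration of the multi-region approach, here is a minimal sketch that launches one copy of a workload in two AWS regions using boto3. The AMI IDs are placeholders (they are region-specific in practice), and a real deployment would also replicate data and fail traffic over between regions – this shows only the redundancy idea, not a production pattern.

```python
import boto3

# Placeholder, region-specific AMI IDs; substitute real image IDs.
REGIONS = {
    "us-east-1": "ami-00000000000000001",
    "us-west-2": "ami-00000000000000002",
}

def launch_redundant_instances(instance_type="t2.micro"):
    """Launch one copy of a workload in each region so an outage in a
    single region or availability zone does not take the service down."""
    instance_ids = {}
    for region, ami in REGIONS.items():
        ec2 = boto3.client("ec2", region_name=region)
        resp = ec2.run_instances(
            ImageId=ami,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
        )
        instance_ids[region] = resp["Instances"][0]["InstanceId"]
    return instance_ids

if __name__ == "__main__":
    print(launch_redundant_instances())
```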