Measuring the Costs and Mitigating the Risks of Cloud Downtime
Source: Jeff Kaplan
THINKstrategies’ Jeff Kaplan comments on the latest set of AWS outages and the new round of debate regarding the reliability of cloud services and the relative costs of downtime.
Today’s brief disruption of Amazon Web Services (AWS), coming on the heels of last week’s longer outage, may have been minor in comparison to the extended problems that occurred last year, but the two incidents will still spark a new round of debate regarding the reliability of Cloud services and the relative costs of downtime. They will also bring renewed attention to the preventive measures IT folks can take to handle Cloud service disruptions of any scale.
By coincidence, around the same time that AWS suffered its outage, the International Working Group on Cloud Computing Resiliency (IWGCR) released its first Availability Ranking of World Cloud Computing (ARWC) report which stated that,
        “The average unavailability of Cloud services is estimated to 10 hours per year or more. Average availability is estimated to 99.9% or less.”
The IWGCR’s estimation method relies on public press reports of Cloud incidents. Based on a total of 568 hours of downtime attributed to thirteen (13) publicly reported cloud service outages since 2007, the IWGCR estimates the economic impact to be more than $71.7 million.
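The arithmetic behind these figures is easy to check. The short Python sketch below is illustrative only and is not the IWGCR's method: it converts between annual downtime and availability, and divides the reported $71.7 million by the 568 reported hours to get an implied average cost per hour. The function names are hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def availability_pct(downtime_hours_per_year: float) -> float:
    """Availability (in percent) implied by a given number of downtime hours per year."""
    return 100.0 * (1.0 - downtime_hours_per_year / HOURS_PER_YEAR)


def downtime_hours(availability_percent: float) -> float:
    """Downtime hours per year allowed by a given availability percentage."""
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


def cost_per_hour(total_cost: float, total_hours: float) -> float:
    """Average cost per hour of downtime across all reported outages."""
    return total_cost / total_hours


if __name__ == "__main__":
    print(f"10 h/year of downtime -> {availability_pct(10):.3f}% availability")
    print(f"99.9% availability    -> {downtime_hours(99.9):.2f} h/year of downtime")
    print(f"Implied average cost  -> ${cost_per_hour(71_700_000, 568):,.0f} per hour")
```

Ten hours of downtime per year works out to roughly 99.89% availability, while a 99.9% target allows about 8.76 hours, which is consistent with the report's "10 hours or more, 99.9% or less" wording. The implied average of roughly $126,000 per hour is only a blended figure across thirteen very different outages.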
Making these kinds of calculations has always been a tenuous process. Before the idea of the Cloud became popular, the cost of network downtime was the subject of similar debate. As in those earlier arguments, the IWGCR’s method for calculating its cost-of-downtime figure is a bit arbitrary. It uses a series of reports about the financial impact of specific Cloud outages on particular businesses as the basis for estimating the overall impact on a broader user base, even though the report doesn’t take into account the total number of users affected by an outage.
Regardless of the random reference points utilized by the IWGCR to build its cost of downtime calculations, its conclusions do serve as an effective warning of the detrimental financial effect of Cloud service disruptions on business customers.
Newvem’s monitoring service has been tracking and reporting on the usage patterns of AWS customers in minute detail and has discovered a number of disturbing behaviors which are putting many businesses at risk when a service disruption occurs.
For instance, Newvem has found that only 60% of AWS users back up all of their Elastic Block Store (EBS) volumes. Put another way, 40% of AWS users have at least some Cloud data, applications and infrastructure that are not backed up and are vulnerable to a service outage. As a result, many AWS users spent more than 5 hours after the most recent outage trying to reconfigure their servers.
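A backup gap of this kind is straightforward to detect. The sketch below is a minimal illustration, not Newvem's implementation: it takes a list of volumes and a list of snapshots and reports which volumes have no snapshot at all. The function names and sample IDs are hypothetical. The input records are assumed to carry a `VolumeId` key, which is the shape returned by the EC2 `describe_volumes` and `describe_snapshots` calls, so real data could be fed in from there.

```python
from typing import Iterable, List, Mapping


def volumes_without_snapshots(
    volumes: Iterable[Mapping[str, str]],
    snapshots: Iterable[Mapping[str, str]],
) -> List[str]:
    """Return the sorted IDs of volumes that have no snapshot at all."""
    backed_up = {snap["VolumeId"] for snap in snapshots}
    return sorted(
        vol["VolumeId"] for vol in volumes if vol["VolumeId"] not in backed_up
    )


def backup_coverage_pct(
    volumes: Iterable[Mapping[str, str]],
    snapshots: Iterable[Mapping[str, str]],
) -> float:
    """Percentage of volumes that have at least one snapshot."""
    volume_list = list(volumes)
    if not volume_list:
        return 100.0  # nothing to back up, so nothing is exposed
    missing = volumes_without_snapshots(volume_list, snapshots)
    return 100.0 * (len(volume_list) - len(missing)) / len(volume_list)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample_volumes = [
        {"VolumeId": "vol-a"},
        {"VolumeId": "vol-b"},
        {"VolumeId": "vol-c"},
    ]
    sample_snapshots = [
        {"SnapshotId": "snap-1", "VolumeId": "vol-a"},
        {"SnapshotId": "snap-2", "VolumeId": "vol-a"},
        {"SnapshotId": "snap-3", "VolumeId": "vol-c"},
    ]
    print(volumes_without_snapshots(sample_volumes, sample_snapshots))
    print(f"{backup_coverage_pct(sample_volumes, sample_snapshots):.1f}% covered")
```

This check only asks whether a snapshot exists. A more realistic audit would also look at how old the most recent snapshot is, since a months-old backup offers little protection after an outage.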
Newvem has also identified many cases in which AWS users have not properly configured their EBS deployments and are not achieving maximum utilization levels. Ironically, 20% of heavy users have instances behind their Elastic Load Balancers (ELBs) that have the highest level of exposure to data loss and potential downtime, and cannot distribute traffic and data from these instances to other ELBs. Not surprisingly, 27% of first-time AWS customers are not configuring their ELBs effectively and are not able to achieve optimal availability and performance.
Newvem goes a step further by highlighting, in its dashboard, preventive measures and actions that can reduce exposure during an outage. Corrective actions come through how-to guides, tutorials or the introduction of an expert who can manually fix the problem. In the future, handoffs will be made to solution providers who can address the problem in an automated manner, while users will be able to see in their dashboard whether the problem has been attended to or solved.
I’ve always been a big believer that today’s Cloud services can offer far better availability and performance at a much lower cost than traditional data center systems and software if properly procured and configured. But, the truth is that most organizations lack the skills and experience to buy and deploy these services to attain their maximum potential.
That is what makes a new generation of Cloud analysis tools, like Newvem, essential to help organizations avoid and even resolve the potential pitfalls associated with today’s evolving Cloud services and garner greater value from these services. In fact, Newvem has become an AWS Technology Partner and will be able to help AWS users gain even greater insights about how to maximize the value of their Cloud services while mitigating the risks and needless costs.
I am also teaming with Newvem to bring attention to timely data from their CloudRadar service which can shed new light on how to capitalize on today’s Cloud services.