PopYard:Today's Tech.-Which is less expensive: Amazon or self-hosted?

Thu Nov 28 21:46:25 2024

Which is less expensive: Amazon or self-hosted?
Source: Charlie Oppenheimer

Amazon Web Services (AWS), as the trailblazing provider of Infrastructure as a Service (IaaS), has changed the dialog about computing infrastructure. Today, instead of simply assuming that you’ll be buying and operating your own servers, storage and networking, AWS is always an option to consider, and for many new businesses, it’s simply the default choice.
I’m a huge fan of cloud computing in general and AWS in particular. But I’ve long had an instinct that the economics of the choice between self-hosted and cloud provider had more texture to it than the patently attractive sounding “10 cents an hour,” particularly as a function of demand distribution. As a case in point, Zynga has made it known that for economic reasons, they now use their own infrastructure for baseline loads and use Amazon for peaks and variable loads surrounding new game introductions.

An analysis of the load profiles

To tease out a more nuanced view of the economics, I’ve built a detailed Excel model that analyzes the relative costs and sensitivities of AWS versus self-hosted in the context of different load profiles. By “load profiles,” I mean the distribution of demand over the day/month as well as relative needs for bandwidth versus compute resources. The load profile is the key factor influencing the economic choice because it determines what resources are required and how heavily these resources are utilized.
The model provides a simple way to analyze various load profiles and allows one to skew the load between bandwidth-heavy, compute-heavy or any combination. In addition, the model presents the cost of operating 100 percent on AWS, 100 percent self-hosted as well as all hybrid mixes in between.
In a subsequent post, I will share the model and describe how you can use it for scenarios of interest to you. But for this post, I will outline some of the conclusions that I’ve derived from looking at many different scenarios. In most cases, the analysis illustrates why intuition is right (for example, that a highly variable compute load is a slam dunk for AWS). In other cases, certain high-sensitivity factors become evident and drive the economic answer. There are also cases where a hybrid infrastructure is at least worthy of consideration.

To frame an example analysis, here is the daily distribution of a typical Internet application. In the model, traffic distribution is an input from which bandwidth requirements are computed. The distribution over the day reflects the behavior of the user base (in this case, one with a high U.S. business-hour activity peak). Computing load is assumed to follow traffic according to a linear relationship, i.e. higher traffic implies higher compute load.
Note that while labor costs are included in the model, I am leaving them out of this example for simplicity. Because labor is a mostly fixed cost for each alternative, it will tend not to impact the relative comparison of the two alternatives. Rather, it will impact where the actual break-even point lies. If you use the model to examine your own situation, then of course I would recommend including the labor costs on each side.

For this example, to compute costs for Amazon, I have assumed Standard Extra Large instances and ELB load balancer for the Northern California region. The model computes the number of instances required for each hour of the day. Whenever the economics dictate it, the model applies as many AWS Reserved Instances (capacity contracts with lower variable costs) as justified and fills in with on-demand instances as required. Charges for data are computed according to the progressive pricing schedule that Amazon publishes. To compute costs for self-hosting, I assume co-location with the peak number of Std-XL-equivalent servers required, each loaded to no more than 80 percent of capacity. The costs of hardware are amortized over 36 months. Power is assumed to be included with rackspace fees. Bandwidth is assumed to be obtained on a 95th percentile price basis.
Now let’s look at a sensitivity analysis. Notice in the above example, that a bit more than half of the total cost for each alternative is for bandwidth/data transfer charges ($35,144 for self-hosted at $8/Mbps and $36,900 for AWS). This is important because while Amazon pricing is fixed and published, 95th percentile pricing is highly variable and competitive

The chart above shows total costs as a function of co-location bandwidth pricing. AWS costs are independent of this and thus flat. What this chart shows is that self-hosting costs less for any bandwidth pricing under about $9.50 per Mbps/Month. And if you can negotiate a price as low as $4, you’d be saving more than 40 percent to self-host. I’ll leave discussion of the hybrid to another post.

This should provide a bit of a feel for how I’ve been conducting these analyses. Above is a visual summary of how different scenarios tend to shake out. The intuitive conclusion that the more spiky the load, the better the economics of the AWS on-demand solution is confirmed. And similarly, the flatter or less variable the load distribution, the more self-hosting appears to make sense. And if you’ve got a situation that uses a lot of bandwidth, you need to look more closely at potential self-hosted savings that could be feasible with negotiated bandwidth reductions.

}