Enterprise Hosts in Glass Houses Shouldn’t Throw SLAs

– Adam Stern, founder and CEO of Infinitely Virtual, says:

Signing on with the industry’s largest players – the companies with the vast networks, the far-flung data centers, the affections of Wall Street and the brand recognition that so many smaller organizations covet – is both logical and, sad to say, sometimes not so much.

It’s a trust-but-don’t-bother-to-verify situation, since it’s assumed the mega providers are too big to fail. Except they’re not. Here, however, it’s not a question of a provider going out of business, but of going dark and going down. So while their businesses may not fail, yours might. An ill-timed outage could do some serious damage.

In the previous century, no one ever got fired for buying IBM, but that was then. Suffice it to say, today’s strategic IT purchase decisions need to be informed by some level of risk aversion and caveat emptor, no matter how large the provider’s market cap.

Writing in Resources, the Stratacore blog (http://resources.stratacore.com/h/i/40014554-cloud-closed-a-rundown-of-2014-shutdowns), Lee Pallat chronicles some of 2014’s greatest hits – that is, misses – observing that “even the largest providers are susceptible to poor planning, poor engineering, unknown bugs, and malicious miscreants.”  Looking at 2015, he notes that “while the technical issues can be prevented, we expect to see an increase in hacker caused outages as the tools to initiate these cyberattacks become widespread and they become a preferred method of social protest and nation-state domination.”  To recap:

  • Dropbox file sharing fail, January 2014: A scripting glitch caused OS upgrades to be applied to active machines during routine maintenance. Restoring from backup took two days.
  • Basecamp DDoS attack, March 2014: Project management service Basecamp suffered a DDoS attack that took it offline for two hours.  (The attackers demanded money, but Basecamp declined to pony up.)
  • Adobe’s Creativity outage, May 2014: One million-plus paying users of Adobe’s Creative Cloud and some secondary services were offline for 28 hours, thanks to a glitch during database maintenance activity.
  • Evernote outage, June 2014: A DDoS attack took news aggregator Feedly and online note service Evernote offline for 10 hours. As with Basecamp, the perps demanded money, but neither service complied.
  • Xen bug triggers AWS reboots, September 2014: A previously unreported bug in the Xen hypervisor forced three days of rolling reboots as patches were applied to 10 percent of AWS’s servers.
  • Infinite Loop update brings down Azure, November 2014:  A bug, buried in code, froze the service for 12 hours, affecting Azure customers and various Microsoft properties, among them Office 365, Xbox Live and MSN.
  • Fewer fun and games during the holidays, December 2014:  With outages to the PlayStation Network and Xbox Live, brought on by hackers, the Grinch turned out to be an uninvited guest on Christmas morning.  (Sony can’t seem to catch a break.)

This is bad news for everyone – vendors and users, large organizations and small ones, solutions in the cloud and those outside of it.  For most users in most environments most of the time, things work pretty well.  But there’s a reason why solutions tend to work better within a certain stratum of this multi-level environment we call the cloud than at the very top and the very bottom.

Providers in the middle ground, if I may be so bold, are better at paying attention.  The middle ground is that turf occupied by providers who operate neither from their garage nor from skyscrapers that extend to a thousand points unknown.  The mantra of the middle ground is simple: this is, above all, a service business.  Personalization matters.

As the litany above suggests, big doesn’t mean there’s an instant, automatic safety net.  Big doesn’t necessarily mean doing it right.  Big doesn’t mean that the provider has been able to fill all the open technical and support positions it needs.  Cutting to the nub of things, users care most about who they’re going to speak with when an outage occurs – who’s going to listen and who will actually help. For most top-tier players, turnover is massive and continuous.  Try calling one of the larger providers and just see if you talk with the same rep twice.  Assuming you talk with anyone, that is.

It’s not a perfect analogy, but the cloud hosting business is like the car business, at least in this respect: the standard these days is fairly high.  Most cars perform admirably.  In the same vein, most virtualization providers offer solid architecture and proven products.  We’re all using similar technology, and redundancy is (almost) a given.

But as the rundown in Resources attests, the provider organizations atop the food chain still have outages.  It’s not so much a matter of being outage-free; hardware failure happens, and software failure happens as well, redundancy notwithstanding.  The question is, when you have an outage, who you gonna call?

When the big guys fall, they fall hard.  Just ask Seth Rogen.

Adam Stern is founder and CEO of Infinitely Virtual (www.infinitelyvirtual.com) in Los Angeles.

Twitter: @iv_cloudhosting