- June 10, 2021
- Posted by: Jason Aten
- Category: Featured
- Cloud-computing platform Fastly was behind the outage that took down a lot of major sites.
- Tech columnist Jason Aten explains how the internet is at the whim of several tech companies.
- These providers offer many benefits, but an outage at just one of them can take down much of the internet.
On Tuesday morning, a series of major websites, including Reddit, The New York Times, Twitter, GitHub, PayPal, Amazon, and even the White House, experienced a significant outage.
Fastly, a cloud-computing platform that provides a variety of services, including a content delivery network (CDN), image optimization, and protection against distributed denial-of-service (DDoS) attacks, was behind the problem.
“We experienced a global outage due to an undiscovered software bug that surfaced on June 8 when it was triggered by a valid customer configuration change,” Nick Rockwell, Fastly’s senior vice president of engineering and infrastructure, said in a statement posted on the company’s blog. “We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal.”
Fastly managed to fix the issue relatively quickly, but it isn’t the first time a widespread outage has been caused by a problem at just one company. More than that, it highlights just how dependent the internet we use every day is on technology and companies that most people aren’t aware of at all.
While the internet is supposed to be decentralized, the reality is that it’s all rather fragile. That’s the result of a handful of providers – some you’ve likely heard of, like Amazon Web Services, Google Cloud, and Microsoft, and others that most people haven’t, like Fastly, Cloudflare, and Akamai – establishing choke points. Those choke points provide many benefits and make the modern web function, but they also mean that just one domino falling can take everything down.
CDNs like Fastly are just one component of what makes the internet easily accessible to so many people.
When any of those parts fails, the effect can be widespread and catastrophic. Making things more challenging is the fact that there are only a handful of major CDNs providing such a critical piece of the puzzle.
Content delivery networks help websites place content closer to users. Instead of hosting a website or service in one location, on one server, that information is cached on multiple servers across the world. Fastly says it has servers in 58 cities, which can handle 130 terabytes of data every second.
Distributing data across those servers helps reduce the amount of time it takes for sites to load, increasing site performance and making for a much better user experience. It also helps balance traffic across networks, reducing overall congestion, and creates redundancy, which – at least in theory – should prevent massive outages when an individual data center is offline.
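To make the idea concrete, here is a minimal sketch of edge caching with failover. It is a toy model, not a description of how Fastly or any other CDN is actually built: the `Edge` class, the `fetch` function, the city names, and the distances are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A hypothetical edge server that caches content near users."""
    city: str
    distance_km: int              # made-up distance from the user
    healthy: bool = True
    cache: dict = field(default_factory=dict)

    def get(self, path, origin):
        # Serve from the local cache; on a miss, copy the content from the origin.
        if path not in self.cache:
            self.cache[path] = origin[path]
        return self.cache[path]


def fetch(path, edges, origin):
    """Return (city, content) from the nearest edge that is still online."""
    for edge in sorted(edges, key=lambda e: e.distance_km):
        if edge.healthy:
            return edge.city, edge.get(path, origin)
    raise RuntimeError("no healthy edge available")


# The origin is modeled as a plain dict mapping paths to content.
ORIGIN = {"/index.html": "<h1>hello</h1>"}
EDGES = [Edge("London", 50), Edge("Frankfurt", 600), Edge("New York", 5500)]

city, _ = fetch("/index.html", EDGES, ORIGIN)
print(city)                       # nearest edge answers

EDGES[0].healthy = False          # one data center goes offline
city, _ = fetch("/index.html", EDGES, ORIGIN)
print(city)                       # the next-nearest edge takes over
```

In this sketch, redundancy only helps when a failure is local. If every edge is marked unhealthy at the same moment, `fetch` has nowhere left to fall back to and raises an error.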
In the case of Tuesday’s outage, Fastly hasn’t said exactly what went wrong, beyond that a software bug was triggered when a customer changed its configuration settings.
If you can get past the idea that a single customer could bring down so much of the internet just by making a settings change, it makes sense: The widespread effect of the outage means it was more likely related to software that was pushed across the company’s entire network, as opposed to an isolated issue with one data center.
The internet that we think of as ubiquitous is more like a massive Jenga tower supported by only a couple of blocks.
If you take a block out at the bottom, the whole thing comes tumbling down.
That’s basically what happened two years ago, when large parts of the internet went down twice in a week because Cloudflare, another one of those critical infrastructure providers, experienced a series of outages.
Certainly, companies like Fastly and Cloudflare go to great lengths to protect and secure the cloud platforms they manage, but if all it takes is a simple software bug to bring down wide swaths of the internet, that makes them a highly valuable target for bad actors. It also highlights the risk to individuals and businesses who depend on the internet to just work.
The perfect example is Amazon: The ecommerce giant also runs the largest cloud-computing platform in the world, but its site experienced issues not because of its own servers, but because of an outage at a third party.
Someone who wanted to inflict real economic damage on Amazon could do so without ever attacking its network of data centers. That’s a far bigger problem than just fixing a software bug.