Where you are, and where you’re going to #
The internet is HUGE, reaching all over the globe and into outer space. It was designed to be a fault-tolerant network, where a whole route could disappear and traffic could still dynamically route around the gap to its final destination. Today, if a giant undersea cable is cut (thanks Russia), your connection just routes around it using another line and still gets you where you’re going. In theory, this means that unless your final destination happens to be in the exact area where the route went down, the internet is an always-on, always-functioning machine.
Fault tolerance gets baked into the infrastructure inside co-location facilities as well: networks with redundant off-site DNS, load balancers, primary and secondary servers and databases, and so on.
When you consider everything involved, it is easy to sit back with a misplaced sense of confidence that everything is going to stay up and running in some form or fashion.
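To make the "route around it" idea concrete, here is a minimal sketch in Python. It is a toy model only: real internet routing is handled by protocols such as BGP, not by a breadth-first search, and the site names and links below are made up for illustration.

```python
from collections import deque

# Hypothetical links between made-up sites; this is not a real topology.
LINKS = {
    ("nyc", "london"),
    ("nyc", "lisbon"),
    ("lisbon", "london"),
    ("london", "frankfurt"),
}


def find_route(links, src, dst):
    """Breadth-first search over undirected links.

    Returns a shortest path as a list of sites, or None if dst is unreachable.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        # sorted() keeps the result deterministic when several routes tie
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Normal day: the direct transatlantic link is used.
print(find_route(LINKS, "nyc", "frankfurt"))
# Cut the nyc-london line: traffic detours through lisbon.
print(find_route(LINKS - {("nyc", "london")}, "nyc", "frankfurt"))
# Cut the only link into frankfurt: no amount of rerouting helps.
print(find_route(LINKS - {("london", "frankfurt")}, "nyc", "frankfurt"))
```

The last case is the caveat from the paragraph above: rerouting only saves you when the destination itself still has a working way in.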
Enter today’s Amazon Web Services (AWS) issues #
… and just like that, a cascading DNS issue at Amazon sends half the Internet into a tailspin.
The problem the modern Internet has is not with redundancy in routing, but with redundancy in providers. Every major outage, from AWS to Azure, has shown one truth: consolidating onto a handful of big players may be great for efficiency and scalability, but it is horrible for redundancy.
It doesn’t matter how redundant the routes or the infrastructure are if everyone, by and large, is going to the same place. If you have a systemic issue at that place, everyone is hosed. What matters is not how distributed the system is overall, but how distributed the data and services that everyone wants are.
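One rough way to put a number on that concentration is to ask, for each provider, what share of services disappear if that provider alone fails. The sketch below assumes a hypothetical inventory in which each service lists the providers that can each serve it independently; the service and provider names are invented, not real dependency data.

```python
# Hypothetical inventory: service -> set of providers, any one of which
# is enough to keep the service up. All names are made up for illustration.
SERVICES = {
    "video-streaming": {"provider-a"},
    "payments": {"provider-a"},
    "chat": {"provider-a", "provider-b"},
    "doorbell-camera": {"provider-a"},
    "blog": {"provider-c"},
}


def services_down(services, failed_providers):
    """Return the names of services with no surviving provider, sorted."""
    failed = set(failed_providers)
    return sorted(
        name for name, providers in services.items() if providers <= failed
    )


def blast_radius(services):
    """For each provider, the fraction of services lost if it alone fails.

    Assumes the inventory is non-empty.
    """
    all_providers = set().union(*services.values())
    return {
        provider: len(services_down(services, {provider})) / len(services)
        for provider in sorted(all_providers)
    }


print(services_down(SERVICES, {"provider-a"}))
print(blast_radius(SERVICES))
```

In this made-up inventory, only the one service that also runs on a second provider survives the loss of `provider-a`, no matter how many redundant routes lead to it.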
Major Provider Outages #
Here are just some of the major outages in the last 5 years and their impacts:
- 2025 (today) - AWS Global Outage - Interruptions included Disney+, Reddit, Ring, Snapchat, Venmo, Ansa, AT&T, T-Mobile and Verizon.
- 2024 - CrowdStrike - Global crashes on Windows devices affecting airlines, banks, hospitals, and others.
- 2024 - Cloudflare - Outage interrupted HubSpot, Zoom and Shopify.
- 2022 - Southwest Airlines - Global travel disruption.
- 2021 - Akamai and Fastly - Separate incidents (a DNS service problem at Akamai, a software bug triggered by a configuration change at Fastly) that each took down large swaths of the internet.
- 2021 - Meta - DNS and BGP configuration errors took down Facebook, WhatsApp, Instagram and Oculus for more than 6 hours.
Dallas Infomart #
NetworkChuck did a recent video on the Dallas Infomart, a facility where an extremely large amount of data lives and passes through.
Consolidation also = Single Point of Failure #
If I take away nothing else from the AWS outage and some of the other major outages of the last several years, it’s that while the internet may be highly distributed, the valuable data and resources on it are highly concentrated. With the push into larger and larger AI data centers, I imagine this trend will continue.
There are several things we should all be taking into consideration as a result. Chief among them: what happens with a total cascading failure at one or more of these locations?
- What would happen if a coordinated cyber attack took out AWS and Azure?
- What would happen if a nuclear or other attack took out our top infrastructure?
- What would happen if a major service, such as DNS, or a major operating system, such as Windows 11, was exploited?
As time has gone on, more and more of what makes up our society has moved to the Internet. To some degree, this makes the Internet itself a potential single point of failure for civilization.
Every time a major outage such as today’s at AWS occurs, I’m both awed by how much of the world it affects and thankful that, at least for now, these have all been stupid, reversible configuration issues.
EDIT: Oct. 22 2025 #
The Code Report comes in hot with a decent high-level technical breakdown of the AWS event: