
Amazon thinks it knows what caused last week's AWS outage

AWS re:Invent 2021 sign
(Image credit: Future / Mike Moore)

Amazon has revealed its findings on the cause of the recent AWS outage that affected websites and users around the world.

A wide range of Amazon services such as Prime Video, Alexa and Ring, alongside high-profile customers such as Facebook and Disney Plus, all saw downtime or significant slowdowns due to an issue in an AWS US region that lasted for many hours.

The company has now completed its investigation into the outage, which it says was down to an unexpected series of events initially aimed at boosting its services.

AWS outage

“An automated activity to scale capacity of one of the AWS services hosted in the main AWS network triggered an unexpected behavior from a large number of clients inside the internal network,” AWS wrote in a blog post.

"This resulted in a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks."

The company says it has now taken "several actions" to prevent a recurrence of this event, including further evaluation of such scaling activities and deploying additional network configuration.

AWS has also pledged to overhaul how it tracks and provides information on outages going forward, noting that, "We understand that events like this are more impactful and frustrating when information about what’s happening isn’t readily available."

The company says it will now deploy "several enhancements" to its Support Services to ensure it is able to quickly communicate any future issues to customers, with an upgrade set to deploy in early 2022.

"Finally, we want to apologize for the impact this event caused for our customers," the blog concluded. "While we are proud of our track record of availability, we know how critical our services are to our customers, their applications and end users, and their businesses. We know this event impacted many customers in significant ways. We will do everything we can to learn from this event and use it to improve our availability even further."

Mike Moore is News & Features Editor across both TechRadar Pro and ITProPortal. He has worked as a B2B and B2C tech journalist for nearly a decade, including at one of the UK's leading national newspapers, and when he's not keeping track of all the latest enterprise and workplace trends, can most likely be found watching, following or taking part in some kind of sport.