Archive for June, 2008

Amazon outages: Plausible causes

Tuesday, June 10th, 2008

by Supranamaya Ranjan

The intermittent outages at Amazon on Friday (June 6) and Monday (June 9), together with the absence of any official word from Amazon about the cause, are shaping up into an exciting Internet potboiler of sorts. While we still don’t know what caused these outages, we can rule out the following:

  • Using NarusInsight Secure Suite (NSS), we have so far found no evidence of a large-scale network-initiated attack that could have led to these outages.

We did detect a Denial-of-Service (DoS) attack against the Internet Movie Database (IMDB), which is owned by Amazon. The attack lasted about 2 hours starting at 9:52 am PST, which coincided with the downtime of Amazon and its affiliate sites. The attack volume averaged 3 Mbits/sec, and it was a sophisticated layer-7 DoS attack in which the attacker opened 500+ HTTP sessions in an attempt to exhaust CPU resources or saturate the bandwidth around IMDB. However, the attack volume seen by NSS probes does not seem large enough to cause an outage of this size at IMDB or Amazon. At this point, the attack looks coincidental with the outage rather than its cause.

  • Similarly, NSS shows that Amazon’s prefixes weren’t hijacked by anyone, so we can eliminate Prefix Hijacking as the cause (see the Prefix Hijacking of YouTube that happened earlier this year).
  • A traceroute to Amazon’s prefixes doesn’t reveal any malicious Autonomous Systems (ASes) trying to re-route the traffic in order to steal it, in what is known as a Path Hijacking attack. Note that in such an attack, the attacker injects himself into the BGP AS_PATH.
  • Amazon’s DNS entries also point to the right IP addresses for its web servers, so we can eliminate DNS cache poisoning and other DNS-related attacks.
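To make the difference between the two BGP checks above concrete, here is a minimal Python sketch. It is not how NSS works internally; it only illustrates the reasoning. The function name, the AS numbers (taken from the range reserved for documentation) and the prefixes (documentation address space) are all assumptions for illustration. An announcement whose origin AS is not the expected owner looks like a prefix hijack, while an announcement with the right origin but an unfamiliar AS somewhere in the path looks like a path hijack.

```python
import ipaddress


def classify_announcement(prefix, as_path, expected_prefix,
                          expected_origin, trusted_transit):
    """Label one observed BGP announcement (illustrative helper, not a real API).

    as_path is listed from the observer toward the origin, so the last
    element is the AS that originated the prefix.
    """
    announced = ipaddress.ip_network(prefix)
    expected = ipaddress.ip_network(expected_prefix)

    # Announcements that cover none of the monitored address space are ignored.
    if not announced.overlaps(expected):
        return "unrelated"

    # Wrong origin AS: someone else claims to own the address space.
    if as_path[-1] != expected_origin:
        return "possible prefix hijack"

    # Right origin, but an AS we do not recognise sits in the middle of the path.
    unknown = [asn for asn in as_path[:-1] if asn not in trusted_transit]
    if unknown:
        return "possible path hijack"

    return "ok"


# Hypothetical values: documentation prefix and documentation AS numbers.
MONITORED = "198.51.100.0/24"
OWNER_AS = 64496
TRANSIT = {64500}

normal = classify_announcement("198.51.100.0/24", [64500, 64496],
                               MONITORED, OWNER_AS, TRANSIT)
more_specific = classify_announcement("198.51.100.0/25", [64500, 64511],
                                      MONITORED, OWNER_AS, TRANSIT)
detour = classify_announcement("198.51.100.0/24", [64500, 64510, 64496],
                               MONITORED, OWNER_AS, TRANSIT)
```

A real monitor would need a maintained list of legitimate origins and upstreams; an unfamiliar transit AS is often just a routing change, so the labels say "possible" on purpose.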

The most plausible cause appears to be errors in their Content Distribution Network (CDN) or in their load balancing. In normal operation, a CDN is supposed to direct each user to the “best” web server, the one that can serve the content fastest. “Best” can mean either the server closest to the user (by fewest hops or lowest round-trip time) or the one with the smallest current workload.
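As a rough illustration of what “best” means here, the Python sketch below picks a server by any one of the three metrics mentioned: hop count, round-trip time or current load. The `Server` type, the `pick_best` function and all of the numbers are made up for the example; real CDNs combine many more signals, and nothing here describes Amazon’s actual setup.

```python
from dataclasses import dataclass


@dataclass
class Server:
    ip: str
    hops: int        # network hops between user and server
    rtt_ms: float    # measured round-trip time in milliseconds
    load: float      # fraction of capacity in use, 0.0 to 1.0


def pick_best(servers, metric="rtt_ms"):
    """Return the server with the smallest value of the chosen metric."""
    if metric not in ("hops", "rtt_ms", "load"):
        raise ValueError(f"unknown metric: {metric}")
    return min(servers, key=lambda s: getattr(s, metric))


# Hypothetical candidates using documentation IP addresses.
candidates = [
    Server("192.0.2.1", hops=9, rtt_ms=80.0, load=0.30),
    Server("192.0.2.2", hops=12, rtt_ms=35.0, load=0.90),
    Server("192.0.2.3", hops=7, rtt_ms=60.0, load=0.55),
]

closest_by_hops = pick_best(candidates, "hops")
fastest_by_rtt = pick_best(candidates, "rtt_ms")
least_loaded = pick_best(candidates, "load")
```

Note that the three metrics can disagree, as they do in this made-up data. A selector that keeps returning a heavily loaded server because it looks good on one metric would produce the kind of symptom described in this post.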

However, yesterday (June 9th), resolving amazon.com via DNS always returned me the slowest web server’s IP address (http://72.21.210.11), which either took forever to load in my browser or, when it did come back, rendered the page with no images. On the other hand, the other two IP addresses, which most likely point to different web servers (http://72.21.206.5/ and http://72.21.203.1/), loaded much faster and normally.
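The comparison above was done by hand in a browser. A similar comparison can be sketched in Python by timing how long a TCP connection to port 80 takes for each address. This is only an approximation, since connect time does not capture slow page rendering or missing images, and the function names below are my own, not part of any tool mentioned in this post.

```python
import socket
import time


def connect_time(ip, port=80, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return None


def rank_servers(timings):
    """Sort (ip, seconds) pairs fastest first; unreachable servers go last."""
    return sorted(timings.items(),
                  key=lambda kv: (kv[1] is None, kv[1] or 0.0))


def probe(ips):
    """Time every address once and return the ranked results."""
    return rank_servers({ip: connect_time(ip) for ip in ips})
```

Calling `probe(["72.21.210.11", "72.21.206.5", "72.21.203.1"])` would rank the three addresses from this post. A single sample per address is noisy, so several runs would be needed before drawing conclusions.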

So the most plausible cause of these outages seems to be one of the following: (i) a CDN problem in which users are being returned sub-optimal web server IP addresses; (ii) an internal load-balancing issue in their data centers that is affecting the assembly of response pages from fragments such as images, text and dynamic queries; or (iii) Amazon re-architecting its data centers to accommodate new service offerings.

What the truth is, only time or Amazon can tell.

Amazon down today, took IMDB down along with it

Friday, June 6th, 2008

by Supranamaya Ranjan

Amazon suffered an outage today starting at 10:30 am PST. For a few hours the main page of Amazon seemed inaccessible and users got the error message ‘HTTP/1.1 Service Not Available’. There are reports, though, that users have been able to access the site again since 1:30 pm PST.

Using NarusInsight Secure Suite, we are continuing to investigate whether this outage was a result of a network-initiated attack against Amazon. Preliminary analysis doesn’t suggest any Distributed Denial-of-Service (DDoS) attack or any other foul play against the main web site.

Contrary to emerging reports that sites using Amazon Web Services (AWS) seem to be running well, we’ve seen that IMDB (Internet Movie DataBase) does appear to have been affected by the outage. My preliminary analysis using NarusInsight Secure Suite shows that at least one of the IP addresses used to host IMDB was under a sustained denial-of-service attack. When I try to load the IMDB page via a direct connection to the web server under attack (http://72.21.206.70/), the images don’t load at all. It becomes interesting when you realize that IMDB seems to be hosted on AWS, since this IP address is registered as belonging to Amazon.

Stay tuned: we will continue with the forensics and be back with more details.

[Update on the IMDB attack at 3:30 pm PST June 06 2008]

This attack coincided with the downtime for Amazon, beginning at 10:30 PDT and continuing for about 1 hour and 10 minutes. The attack itself was interesting in that the attacker seemed to open multiple connections to IMDB’s web server (port 80) while incrementing the source port for every new connection. The attack’s average rate was 3 Mbits/sec, certainly not large enough to cause a complete meltdown but probably enough to delay legitimate users. However, there might have been other attacks launched on IMDB at the same time that weren’t in the path of our probes. If anyone else has heard anything about IMDB’s outage, please comment.
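The pattern described above, one source opening many connections to port 80 with a source port that goes up by one each time, can be looked for in flow records with a few lines of Python. This is a sketch of the idea only, not the NSS detection logic. The record layout, the function name and both thresholds are assumptions, and the addresses are documentation addresses.

```python
from collections import defaultdict


def flag_sequential_port_sources(flows, dst_port=80,
                                 min_connections=500,
                                 min_sequential_ratio=0.9):
    """Return (src_ip, dst_ip) pairs that open many connections to dst_port
    with source ports that mostly increase by exactly one.

    flows is an iterable of (src_ip, src_port, dst_ip, dst_port) tuples
    in arrival order (an assumed record layout).
    """
    ports_by_pair = defaultdict(list)
    for src_ip, src_port, dst_ip, port in flows:
        if port == dst_port:
            ports_by_pair[(src_ip, dst_ip)].append(src_port)

    flagged = []
    for pair, ports in ports_by_pair.items():
        # Need at least two connections to compare consecutive ports.
        if len(ports) < max(min_connections, 2):
            continue
        steps = sum(1 for a, b in zip(ports, ports[1:]) if b - a == 1)
        if steps / (len(ports) - 1) >= min_sequential_ratio:
            flagged.append(pair)
    return flagged


# Hypothetical traffic: one noisy source and one ordinary visitor.
attacker = [("198.51.100.7", 1024 + i, "192.0.2.10", 80) for i in range(600)]
visitor = [("203.0.113.5", p, "192.0.2.10", 80) for p in (40000, 51234, 33333)]
suspects = flag_sequential_port_sources(attacker + visitor)
```

Sequential source ports alone are not proof of malice, since some operating systems hand out ephemeral ports in order. The connection count is what makes the pattern suspicious, which is why both conditions must hold before a pair is flagged.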

[Update on the Amazon outage at 10:30 am PST June 09 2008]

It seems that Amazon.com suffered another outage on the morning of Monday, June 9th. This time the access problem is intermittent. I was unable to access one of the web server IP addresses directly at all (http://72.21.210.11), while the other two (http://72.21.206.5/ and http://72.21.203.1/) display content normally.