What Really Caused Skype’s Worldwide Outage?
UPDATE 3-1-2011: Skype eventually posted a detailed explanation here, which satisfied most of my previous doubts. This post has been kept as a historical reference.
Millions of people around the world use Skype to make cheap phone calls over the Internet, so when the service went down for two days last week, it made headlines everywhere. The outage was reported to have started on Wednesday, 22nd December at 9am PST (Thursday, 23rd December at 4am here in Sydney).
I am also a Skype user and both my Windows client and Android client on my smart phone reported my status as being offline, with none of my regular contacts visible. It went back to normal for me when I logged in on Friday morning.
Skype’s Official Explanation
On Skype’s blog, their official announcement stated:
“Under normal circumstances, there are a large number of supernodes available. Unfortunately, today, many of them were taken offline by a problem affecting some versions of Skype. As Skype relies on being able to maintain contact with supernodes, it may appear offline for some of you.”
What is a Supernode?
For those who don’t know, a supernode is any computer running the Skype client that has been automatically appointed to act as a hub, helping other nearby Skype users find each other. This is needed because some of Skype’s functionality is decentralised, using a Peer-to-Peer (P2P) topology, so having supernodes improves the efficiency of network communication. Computers with broadband connections that are not behind firewalls are likely to be chosen to act as supernodes.
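To make the idea concrete, here is a toy sketch of a supernode acting as a directory that lets clients find each other. It is only an illustration: the `Supernode` class, its methods and the addresses are all made up for this example, and nothing here reflects Skype’s actual (unpublished) protocol.

```python
# Toy model only: the class, method names and addresses are hypothetical
# and are not taken from Skype's real protocol.
class Supernode:
    def __init__(self, name):
        self.name = name
        self.directory = {}  # username -> network address

    def register(self, username, address):
        """Record where a nearby user can currently be reached."""
        self.directory[username] = address

    def lookup(self, username):
        """Return the user's address, or None if this supernode doesn't know them."""
        return self.directory.get(username)


hub = Supernode("hub-1")
hub.register("alice", "203.0.113.5:40000")
hub.register("bob", "198.51.100.7:40001")

# Alice finds Bob by asking the supernode instead of a central server.
bob_address = hub.lookup("bob")
```

In this simplified picture, if `hub` crashes, every user who registered only with it becomes invisible to everyone else.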
Why I don’t believe Skype’s Official Explanation
I do not believe Skype’s explanation for the following reasons:
- As far as I know, all users were affected around the world, not just “some”
- If the problem was limited to a single defective version, you would only expect some contacts to be unreachable, not all of them
- If the problem only affected some versions of Skype, all the other clients running the good versions would still be able to operate as supernodes, allowing Skype to function
- Why would the problem hit so many systems at the same time? Surely, they are not implying that millions of people around the world suddenly switched to defective versions! Something must have triggered it
My Alternative Theory
If I were to guess at the cause, it looks more like a carefully planned Denial-of-Service attack. A malicious person may have found a weakness in Skype’s protocol or software, obtained a list of the IP addresses of all supernodes that were on the network at the time, and then sent those supernodes specially crafted data to cause a software failure.
Perhaps their protocol has little or no redundancy to protect against supernode malfunction, i.e. all the Skype users known to a single supernode are lost to the rest of the network if that supernode fails. This would not happen if the system were designed so that each user is registered with two or more supernodes.
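The difference redundancy makes can be shown with a small sketch. This is a simplified model of my own, not Skype’s design: the `reachable_users` helper, the user names and the supernode names (`sn1`, `sn2`, `sn3`) are all invented for illustration. It assumes a user stays reachable as long as at least one of the supernodes they registered with is still running.

```python
# Hypothetical model: a user is reachable if at least one of the
# supernodes they registered with has not failed.
def reachable_users(registrations, failed):
    """Return the set of users with at least one surviving supernode."""
    failed = set(failed)
    return {user for user, nodes in registrations.items() if set(nodes) - failed}


# Each user registered with exactly one supernode (no redundancy).
single = {"alice": ["sn1"], "bob": ["sn1"], "carol": ["sn2"]}

# Each user registered with two supernodes.
redundant = {
    "alice": ["sn1", "sn2"],
    "bob": ["sn1", "sn3"],
    "carol": ["sn2", "sn3"],
}

failed = {"sn1"}
single_ok = reachable_users(single, failed)        # only carol survives
redundant_ok = reachable_users(redundant, failed)  # everyone survives
```

With a single registration, losing `sn1` takes alice and bob offline. With two registrations each, all three users stay reachable, and it takes the loss of both `sn1` and `sn2` to knock alice off the network.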
The motivation would be to cause Skype embarrassment at Christmas time, when many people are making long-distance phone calls to relatives, friends and business associates. The perpetrator could be a business rival, online extortionist, or simply some troublemaker with nothing better to do with their time. It is known that Skype is trying to attract large corporations as clients, and an outage of this scale would really cause people to lose confidence in the reliability of their platform.
This is just speculation on my part. I have no direct knowledge of the Skype protocol, and I have no evidence of foul play, but I sure as hell do not believe Skype’s explanation.