Gibson Index

2013-03-03: CloudFlare Service Outage

Level One Event

The recent CloudFlare service outage was caused only tangentially by a DDoS attack. While CloudFlare was attempting to mitigate the DDoS, a mistaken routing rule caused all of its routers to stop responding properly. The outage affected a large number of customers, but it was resolved quickly and analyzed thoroughly to find ways of preventing a recurrence.

CloudFlare says that the 30-minute outage began as a well-intentioned effort to head off a DDoS attack against one of its customers' DNS servers. The errant rule was intended to detect the oversized packets that seemed characteristic of the observed attack. Instead, it went haywire and caused all of CloudFlare's routers to stop communicating properly. In effect, the normal failover behavior may have worsened the incident, because the errant rule spread to the other nodes and consumed all of their available memory.

