Foursquare Explains 11 Hours Downtime

     

engine

9:23 am on Oct 6, 2010 (gmt 0)

Foursquare Explains 11 Hours Downtime [blog.foursquare.com]
Yesterday, we experienced a very long downtime. All told, we were down for about 11 hours, which is unacceptably long. It sucked for everyone (including our team – we all check in every day, too). We know how frustrating this was for all of you because many of you told us how much you’ve come to rely on foursquare when you’re out and about. For the 32 of us working here, that’s quite humbling. We’re really sorry.

This blog post is a bit technical. It has the details of what happened, and what we’re doing to make sure it doesn’t happen again in the future.

What happened
The vast bulk of the data we store is from user check-in histories. Our databases are structured so that this data is spread evenly across multiple database “shards”, each of which can only store so many check-ins. Starting around 11:00am EST yesterday, we noticed that one of these shards was performing poorly because a disproportionate share of check-ins was being written to it. For the next hour and a half, until about 12:30pm, we tried various measures to ensure a proper load balance. None of these measures worked. As a next step, we introduced a new shard, intending to move some of the data from the overloaded shard to this new one.
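
To make the "disproportionate share" problem concrete, here is a rough Python sketch of how write skew across shards could be spotted. It is not Foursquare's code: the modulo placement rule, the function names (shard_for, overloaded_shards) and the 1.5x threshold are all assumptions made up for illustration.

```python
from collections import Counter


def shard_for(user_id: int, num_shards: int) -> int:
    """Hypothetical placement rule: pick a shard by user ID modulo shard count."""
    return user_id % num_shards


def overloaded_shards(checkin_user_ids, num_shards, threshold=1.5):
    """Return {shard: write_count} for shards receiving more than
    `threshold` times the mean number of check-in writes per shard."""
    counts = Counter(shard_for(uid, num_shards) for uid in checkin_user_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    mean = total / num_shards  # expected writes per shard if perfectly even
    return {shard: n for shard, n in counts.items() if n > threshold * mean}


if __name__ == "__main__":
    # Six check-ins land on shard 0, one each on shards 1 to 3.
    writes = [0, 4, 8, 12, 16, 20, 1, 2, 3]
    print(overloaded_shards(writes, num_shards=4))
```

With the sample list, shard 0 takes six of the nine writes and is the only one flagged. A check like this only detects the imbalance; moving data to a new shard, as the post describes, is a separate and riskier step.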

rogerd

7:34 pm on Oct 20, 2010 (gmt 0)

Funny, the world continued to function normally during this outage...
 
