Tweetdeck: Facebook is currently suffering a major outage which is impacting TweetDeck FB columns too. We suggest removing FB accounts until fixed.
[twitter.com...]
Facebook: Facebook may be slow or unavailable for some people because of site issues. We're working to fix this quickly.
[twitter.com...]
Where's Facebook's version of the Fail Whale?
BREAKING NEWS Facebook down. Worker productivity rises. US climbs out of recession.
Facebook likely disappointed millions of bored office workers again on Thursday afternoon with a widespread outage and latency, a day after an outage shut down the site for hours...
According to AlertSite, a Website performance service and vendor, Facebook only had 38 percent availability with 60-second response times.
Meanwhile, until service was restored, frustrated Facebook users overwhelmingly turned to micro-blogging site Twitter to tweet their unhappiness -- a slight irony due to the fact that Twitter itself was the recipient of a massive cross-site scripting attack that bombarded users with pop-ups, rainbow tweets and #*$!ography just two days prior.
[crn.com...]
The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed. The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
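For anyone trying to picture the feedback loop described in that statement, here is a rough toy model in Python. It is only a sketch of the general pattern, not Facebook's actual code: the validity rule, the class names, and the request count are all made up for illustration. It counts how many times the "persistent store" gets queried when the bad value is only in the cache versus when the store itself holds the bad value.

```python
from dataclasses import dataclass


def is_valid(value):
    # Hypothetical validity rule (an assumption): a positive integer is valid.
    return isinstance(value, int) and value > 0


@dataclass
class PersistentStore:
    # Stand-in for the database cluster; counts every query it receives.
    value: object
    queries: int = 0

    def read(self):
        self.queries += 1
        return self.value


@dataclass
class Cache:
    value: object = None


def client_read(cache, store):
    """One client request: trust the cache if valid, otherwise 'repair' it."""
    if is_valid(cache.value):
        return cache.value
    fresh = store.read()   # the "fix": go back to the persistent store
    cache.value = fresh    # write back whatever the store returned
    return fresh


def simulate(store_value, cached_value, requests):
    """Return how many store queries a run of client requests generates."""
    store = PersistentStore(store_value)
    cache = Cache(cached_value)
    for _ in range(requests):
        client_read(cache, store)
    return store.queries


# Transient cache problem: the store is fine, so one query repairs the cache.
transient = simulate(store_value=30, cached_value=-1, requests=1000)

# Invalid persistent copy: the repair never sticks, so every request hits the store.
persistent = simulate(store_value=-1, cached_value=-1, requests=1000)

print("store queries, bad cache only:", transient)
print("store queries, bad persistent value:", persistent)
```

In this toy version the first case costs a single store query and the second costs one query per request, which is the shape of the stampede the statement describes. A real system would presumably need some form of backoff or a limit on repair attempts, but the quoted statement does not say what Facebook changed, so that part is left out.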
One of Facebook's senior engineers, Robert Johnson, apologised to everyone who couldn't log on.
In a statement on his blog he said: "The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition.
"An automated system [to fix the problem] ended up causing more damage than it fixed."