
Forum Moderators: phranque


Theplanet's data centre in Houston catches fire

ThePlanet's main data centre in Houston catches fire, 1000s of servers down

     
2:38 am on June 1, 2008 (gmt 0)

Full Member

10+ Year Member

joined:Apr 28, 2005
posts: 221
votes: 0


Hundreds if not thousands of servers are down due to a fire at one of theplanet's main data centres. The main announcement is at their forums here [forums.theplanet.com...] Apparently, no data was lost or damaged, but I have the feeling that it's big and they are not sure!

The larger dedicated server provider in the world, I am sure millions of 404s every second from 100s of thousands of sites, this will undoubtedly be a test to see how quickly they can respond to the situation and get servers back online. it has been few hours so fare!

[newsblaze.com...]

[edited by: Brett_Tabke at 11:34 am (utc) on June 1, 2008]
[edit reason] added link [/edit]

2:51 am on June 1, 2008 (gmt 0)

Full Member

10+ Year Member

joined:Apr 28, 2005
posts: 221
votes: 0


I meant to say: The largest dedicated server provider in the world. I am sure there are millions of 404s every second from 100s of thousands of sites. This will undoubtedly be a test to see how quickly they can respond to the situation and get servers back online. It has been a few hours so far!

5:21 am on June 1, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 16, 2003
posts:525
votes: 0


Yes, I am affected by this and my server has been down all day... More information follows:

(complete text available at theplanet forums linked in the original post above)


This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room.
...
This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday.
...

[edited by: MLHmptn at 5:23 am (utc) on June 1, 2008]

[edited by: phranque at 9:57 am (utc) on June 1, 2008]
[edit reason] edited for fair use [/edit]

6:49 am on June 1, 2008 (gmt 0)

Full Member

10+ Year Member

joined:Apr 28, 2005
posts: 221
votes: 0


You know what, this must be the day I read the equivalent of the entire Charles Dickens collection of forum posts and emails over the incident. What else is there to do when my whole network is down?
Imagine you just paid an Alexa top-1000 site the fee for a mail blast sending your message to their 100,000+ members: message sent, thousands of traffic hits, but thousands of 404s!
In their defence, they have been updating everyone in four different ways: forums, control panel (Orbit), email and chat server. They have really been professional and quick to calm the 9,000 server owners' fears!

Oh, everyone, if you are affected, don't forget to pause your PPC ads such as AdWords.
warra, warra day!

11:57 am on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 20, 2004
posts:2377
votes: 0


Yikes... I know these kinds of disasters are completely random and don't happen often, but it kinda makes me think I should have a redundancy server with a separate provider... Perhaps in a different part of the country...

12:28 pm on June 1, 2008 (gmt 0)

New User

10+ Year Member

joined:Feb 8, 2005
posts:18
votes: 0


@dusky: I doubt your users will get 404s - they will just see a timeout. And I wonder how many of them will think "wow - that site must be popular now, since it's down"...
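
To make the 404-versus-timeout distinction concrete, here is a minimal Python sketch. It assumes a hypothetical helper named `classify_fetch`; the `opener` parameter is only there so the function can be exercised without a live network. A 404 means a server answered and said the page is missing, whereas a powered-off server never answers at all.

```python
import socket
import urllib.error
import urllib.request


def classify_fetch(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return a short label describing what a visitor would experience.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # The server is up and answered normally.
            return f"http {response.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return f"http {err.code}"
    except (socket.timeout, TimeoutError):
        # Nothing answered within the time limit.
        return "timeout"
    except urllib.error.URLError as err:
        # urllib sometimes wraps a timeout inside URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        # DNS failure, refused connection, and similar.
        return "unreachable"
```

With the servers physically powered off, one would expect "timeout" or "unreachable" here rather than "http 404".
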
12:35 pm on June 1, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 12, 2004
posts:550
votes: 0


My server is one of those affected, and it hosts the site that most of my income comes from. I am very satisfied with the service they provide, but my next server will not be with them. This is to avoid having all my websites down at the same time when a disaster such as this happens again.

12:39 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 25, 2005
posts:3492
votes: 3


This will also show who has planned for such a disaster and has an emergency plan in place. This outage could take some time to fix.

I have a plan in place right now but it is not adequate as I don't have complete access to the backup so...

I have set my mind to moving my main domain to another registrar and buying a cheap hosting package to set up an emergency plan where all I will need to do is change name servers on the domain.

12:47 pm on June 1, 2008 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room.

Something, to me, does not sound right about that statement. Electrical gear can short, that's well known. Nothing in the electrical equipment room should explode at reasonable temperatures though. An explosion strong enough to knock down three walls, to my knowledge, only comes from a well established fire providing furnace-like conditions.

Luckily I am not a customer any more, however if I were I'd want to know why the increased load to the cooling system which must have preceded this for quite some time went ignored.

1:28 pm on June 1, 2008 (gmt 0)

Full Member

10+ Year Member

joined:Dec 9, 2002
posts:234
votes: 0


I was unlucky/foolish enough to have both DNS servers on the same machine, so my sites are totally down (not timing out) and email is bounced back instantly (not queued).
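
A quick sanity check for this kind of single point of failure can be sketched in Python. The helper name `nameservers_share_network` and the /24 threshold are assumptions chosen for illustration; two addresses in different /24 blocks can still sit in the same building, so treat a False result as "not obviously bad" rather than "safe".

```python
import ipaddress


def nameservers_share_network(addresses, prefix_len=24):
    """Return True if any two name server IPv4 addresses fall in the same network.

    Hypothetical helper; prefix_len=24 is an arbitrary rule of thumb.
    """
    seen = set()
    for address in addresses:
        # strict=False masks off the host bits, giving the enclosing network.
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        if network in seen:
            return True
        seen.add(network)
    return False
```

For example, `nameservers_share_network(["192.0.2.10", "192.0.2.11"])` flags the pair, while two addresses from unrelated documentation ranges such as `192.0.2.10` and `198.51.100.10` do not.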

We're lucky this happened over a Saturday night though!

1:59 pm on June 1, 2008 (gmt 0)

Senior Member from DE 

WebmasterWorld Senior Member 10+ Year Member

joined:May 25, 2002
posts:926
votes: 0


Pure luck, I am in the H2 datacenter with EVERYTHING :-/ That was close...!

2:08 pm on June 1, 2008 (gmt 0)

Junior Member

10+ Year Member

joined:July 10, 2005
posts: 105
votes: 0


Wonder if this has anything to do with the "Rasberry Ants" which have "invaded" Houston and love electronics. If you haven't heard of them just do a search for them, you'll see what I mean....
2:14 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


An explosion strong enough to knock down three walls, to my knowledge, only comes from a well established fire providing furnace-like conditions.

... or a Halon fire-suppression system dumping a large tank into an enclosed, un-vented space. :)

I've seen that, and it's impressive!

I wish EV1 and their customers the best of luck with a speedy recovery!

Jim

2:50 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 8, 2002
posts:2335
votes: 0


Wow! Just read this and found my sites are up. Whew!
2:58 pm on June 1, 2008 (gmt 0)

Preferred Member from US 

5+ Year Member

joined:Sept 21, 2006
posts:358
votes: 0


Both of our E-Commerce websites are down, and have been since 5PM yesterday. Completely unacceptable.
3:00 pm on June 1, 2008 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:May 6, 2004
posts:650
votes: 0


If anyone is thinking "why not just move to another host", it isn't as simple as that. I've been working to establish an alternate host for a client with an osCommerce store. It isn't a huge store, with only about 2000 items and a couple of sales a day. OTOH, the sales are high ticket - about a grand each - and have a decent markup. He is not on a dedicated server. (Not quite making enough yet)

That being said, I wanted to establish an alternate host to be able to test some programming/db changes we want to make. It's turned out to be a lot more involved than I thought. I'm running into stuff like trying to get Zend Optimizer configured, getting into the nitty gritty of the existing osCommerce site, modding the config files and a lot of other boring stuff.

Once we get the alternate host working, we should be able to just switch the DNS pointers and reload a backup. However, getting to that point is pretty involved. My guess is that trying to do something like this for dedicated servers might be even more involved.

I'm just glad that I'm not under pressure to get the second site working with the first one being down.

chris

3:24 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 13, 2005
posts:1077
votes: 0


I wish EV1 and their remaining customers the best of luck with a speedy recovery!
3:26 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 16, 2004
posts:854
votes: 0


Try more like millions of sites are down because of this and then you will be in the right ballpark, including us :(

We have initiated our own DR plan and already have our mirror up and running on the other side of the country, just waiting on our downtime timeline to expire before we switch DNS over.

Sadly, many people have DNS through them, which means this is not an option, as the core ev5 ev6 DNS system is out as well, affecting not only onsite servers but thousands of offsite servers as DNS expires globally.

3:30 pm on June 1, 2008 (gmt 0)

Full Member

10+ Year Member

joined:Apr 28, 2005
posts: 221
votes: 0


On the plus side, apart from 000s of dollars worth of losses, Google and other search engines will know about the incident and would ignore the timeouts encountered by their bots. Surely, there must be a minimum of 1+ million sites down right now. Google's "Theplanet update" or "Houston Update" in memory of any lost data and dollars would be a nice fit.

4:06 pm on June 1, 2008 (gmt 0)

Preferred Member from US 

5+ Year Member

joined:Sept 21, 2006
posts:358
votes: 0


Let's hope so Dusky. I assume the same. GOOG, Y!, and MSN know about the outage and *should* not hold it against us.
4:14 pm on June 1, 2008 (gmt 0)

New User

5+ Year Member

joined:May 31, 2008
posts:18
votes: 0


yikes. Good luck to those who are affected by this.
4:36 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:683
votes: 0


An explosion strong enough to knock down three walls, to my knowledge, only comes from a well established fire providing furnace-like conditions.

If the insulation on the cables had been smoldering and outgassing for a while, and then the fumes were lit by a short circuit, you could see a pretty significant "event".

Another good candidate for the explosion: batteries. If the fire was in their UPS room, depending on the chemical makeup of the batteries they were using, and the quantity, (most likely thousands of pounds worth of batteries), you could get anything from a small "whump" to something just short of Hiroshima. Lead acid batteries (like the one in your car) can detonate pretty spectacularly under the right conditions, and are very common in large data centers and telco switches (cheap and efficient), and if you've ever seen a li-ion laptop battery have a thermal runaway reaction [theinquirer.net], you'd prolly never rest a laptop on your crotch again.

Anyway, good luck to those affected. Hopefully they'll have backup servers online quickly, and nobody loses too much data.

edit add: youtube video [youtube.com] of laptop li-ion in thermal runaway

[edited by: grelmar at 4:54 pm (utc) on June 1, 2008]

4:39 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


You can vanish for 2 or 3 days with virtually no loss of position, even 7 days seems to have very little effect.

Beyond 10 to 14 days there are effects, but they seem to recover quite quickly.

Beyond 21 days... no idea.

4:52 pm on June 1, 2008 (gmt 0)

Preferred Member from US 

5+ Year Member

joined:Sept 21, 2006
posts:358
votes: 0


if you've ever seen a li-ion laptop battery have a thermal runaway reaction, you'd prolly never rest a laptop on your crotch again.

Thanks to that picture, I will never rest a laptop there again... something about my manhood going up in flames. :)

5:32 pm on June 1, 2008 (gmt 0)

Preferred Member from US 

5+ Year Member

joined:Sept 21, 2006
posts:358
votes: 0


To keep you up-to-date, here is the latest information about the outage in our H1 data center.

We expect to be able to provide initial power to parts of the H1 data center beginning at 5:00 p.m. CDT. At that time, we will begin testing and validating network and power systems, turning on air-conditioning systems and monitoring environmental conditions. We expect this testing to last approximately four hours.

Following this testing, we will begin to power-on customer servers in phases. These are approximate times, and as we know more, we will keep you apprised of the situation.

We will update you again around 2:30 p.m. this afternoon.

[forums.theplanet.com...]

6:39 pm on June 1, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 20, 2003
posts:558
votes: 0


I have many many sites hosted in H1. I also have many in the unaffected datacenters. I must say this isn't common with ThePlanet, but it sure hurts - a lot.

Bill

6:47 pm on June 1, 2008 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14622
votes: 88


Both of our E-Commerce websites are down, and have been since 5PM yesterday. Completely unacceptable.

Never blame the host. Most big hosts do a really good job at infrastructure, but there's always the unexpected, including natural disasters, no matter how well you plan and design your data center.

Now you know you need a contingency plan.

It doesn't cost much to have a lesser-powered backup server on standby with a carbon copy of your site, ready to go live at a moment's notice. It can even be kept up to date using a cron job with rsync (on Linux), so you've only lost an hour's worth of data (orders) worst case. I also keep a daily backup locally, so if all of my best-laid plans fail simultaneously I can still upload to a new server elsewhere from scratch.
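
As a rough illustration of the hourly sync idea, the Python sketch below assembles an rsync command and a matching crontab line. The function names, the host `backup.example.com` and the paths are all made up for the example; the flags used (`-a`, `-z`, `--delete`, `-e ssh`) are standard rsync options. This only mirrors files, so a database holding orders would need its own dump or replication step before the copy.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_dir):
    """Build an rsync command that mirrors source_dir to a standby server over SSH.

    Hypothetical helper; host and directory names are supplied by the caller.
    """
    # A trailing slash on the source tells rsync to copy the directory's contents.
    source = source_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-a",        # archive mode: keep permissions, times, symlinks
        "-z",        # compress data in transit
        "--delete",  # remove files on the standby that are gone from the source
        "-e", "ssh", # run the transfer over SSH
        source,
        f"{remote_host}:{remote_dir}",
    ]


def cron_line(command, minute=0):
    """Format a crontab entry that runs the command once an hour."""
    return f"{minute} * * * * {shlex.join(command)}"
```

Calling `cron_line(build_rsync_command("/var/www/site", "backup.example.com", "/var/www/site"))` produces a line that could be pasted into a crontab, assuming key-based SSH login to the standby is already set up.
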

Besides, there are many other reasons to do this such as complete hardware failure, getting hacked, having major routers go down outside the data center, etc.

A few years ago a main router at Level 3 blew out in the Los Angeles region and cut off a large chunk of the state. It took 6 hours to restore because there was only 1 backup part, and it was a 3 hour drive away (and back). For some odd reason there was nobody where the part was located who could drive it to LA. Sounds stupid, I know, but that's how big business operates.

So having a backup server in another part of the country is generally a good idea.

While you're at it, put your server on a 3rd party dynamic DNS service so you can move your site from server to server in mere seconds.

7:49 pm on June 1, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 20, 2003
posts:558
votes: 0


incrediBILL

How does dynamic DNS work? This is always my issue - I need to make a move but propagation makes it difficult.

8:28 pm on June 1, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:June 20, 2003
posts:558
votes: 0


FYI

One of my servers just came back online. It was down due to some sort of routing issue - so I think they are working from the edge inward.

8:45 pm on June 1, 2008 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 25, 2004
posts:2156
votes: 0


At the very least, if you have multiple income-earning sites, it is a good idea to spread them (and their DNS) between at least two independent hosting companies in locations geographically far apart. That need not cost any extra money or admin, just the same split between two or more sites.

In these IntarWeb days that's not so hard to do!

If one server or host or DNS server dies or makes some horrible technical SNAFU or backbone carriers are having another tiff, etc, you only lose a portion of your income.

And of course if you have smart DNS and front-ends and so on like (say) the BBC or Google do, then almost immediately all traffic can be diverted away from the failed parts and (almost) no users at all see any loss of service for more than at most a few minutes as determined by your DNS timeouts.
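
Why the DNS timeout (TTL) bounds how long users keep hitting the dead server can be shown with a toy model in Python. `CachingResolver` is a made-up class for illustration, not a real resolver: it simply remembers an answer until the TTL attached to it runs out, which is roughly what resolvers between you and your visitors do.

```python
class CachingResolver:
    """Toy resolver cache: remembers an answer until its TTL runs out.

    Hypothetical model for illustration only.
    """

    def __init__(self, authoritative):
        self.authoritative = authoritative  # dict: name -> (address, ttl_seconds)
        self.cache = {}                     # dict: name -> (address, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            # Still fresh: the old answer is served even if the zone has changed.
            return cached[0]
        # Expired or never seen: ask the authoritative data and cache the answer.
        address, ttl = self.authoritative[name]
        self.cache[name] = (address, now + ttl)
        return address
```

In this model, if the record carries a TTL of 300 seconds and you repoint it right after a visitor's resolver cached it, that visitor keeps getting the old address until the 300 seconds are up. That is the reason for keeping TTLs short on records you may need to move in an emergency.
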

I have my relatively insignificant sites and DNS spread over 6 sites in 5 countries in the US, UK and AsiaPac, and am usually not too panicked when something fails (like my entire host in Atlanta losing power and UPS for many hours recently, or the disc failing in my main UK co-lo server). I can have cheaper hardware in each location too, since I'm relying on multiple machines to reduce the impact of any one failure. I can be ... ahem ... highly-strung, and this approach has probably been my biggest single investment in putting off cardiac failure! B^>

Rgds

Damon

This 88-message thread spans 3 pages.