Forum Library, Charter, Moderators: phranque & physics

Webmaster General Forum

This 88 message thread spans 3 pages.
Theplanet's data centre in Houston catches fire
ThePlanet's main data centre in Houston catches fire, 1000s of servers down
dusky




msg:3663980
 2:38 am on Jun 1, 2008 (gmt 0)

Hundreds if not thousands of servers are down due to a fire at one of theplanet's main data centres. The main announcement is at their forums here [forums.theplanet.com...] Apparently, no data was lost or damaged, but I have the feeling that it's big and they are not sure!

The larger dedicated server provider in the world, I am sure millions of 404s every second from 100s of thousands of sites, this will undoubtedly be a test to see how quickly they can respond to the situation and get servers back online. it has been few hours so fare!

[newsblaze.com...]

[edited by: Brett_Tabke at 11:34 am (utc) on June 1, 2008]
[edit reason] added link [/edit]

 

dusky




msg:3663987
 2:51 am on Jun 1, 2008 (gmt 0)

I meant to say: The largest dedicated server provider in the world. I am sure there are millions of 404s every second from 100s of thousands of sites; this will undoubtedly be a test to see how quickly they can respond to the situation and get servers back online. It has been a few hours so far!

MLHmptn




msg:3664041
 5:21 am on Jun 1, 2008 (gmt 0)

Yes, I am affected by this and my server has been down all day...More information as follows:

(complete text available at theplanet forums linked in the original post above)


This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room.
...
This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday.
...

[edited by: MLHmptn at 5:23 am (utc) on June 1, 2008]

[edited by: phranque at 9:57 am (utc) on June 1, 2008]
[edit reason] edited for fair use [/edit]

dusky




msg:3664077
 6:49 am on Jun 1, 2008 (gmt 0)

You know what, this must be the day I read the equivalent of the entire Charles Dickens collection of forum posts and emails over the incident, what else to do when my whole network is down.
Imagine you just paid an Alexa top 1000 site the fee for a mail blast to send your message to their 100,000+ members: message sent, thousands of traffic hits, but thousands of 404s!
In their defence, they have been updating everyone in four different ways: forums, control panel (Orbit), email and chat server. They have really been professional and quick to calm the 9,000 server owners' fears!

Oh, everyone, if you are affected, don't forget to pause your PPC ads such as AdWords etc.
warra, warra day!

maximillianos




msg:3664163
 11:57 am on Jun 1, 2008 (gmt 0)

Yikes... I know these kinds of disasters are completely random and don't happen often, but it kinda makes me think I should have a redundant server with a separate provider... Perhaps in a different part of the country...

nonanet




msg:3664172
 12:28 pm on Jun 1, 2008 (gmt 0)

@dusky: I doubt your users will get 404s - they will just see a timeout. And I wonder how many of them will think "wow - that site must be popular now, since it's down"...
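The difference between a 404 and a timeout can be checked from a script. The following is only a minimal sketch in Python, not anything posted in the thread: the function name `classify_fetch` and the injectable `opener` parameter are assumptions made for illustration. A 404 means a web server answered, while a dark server produces no HTTP answer at all.

```python
import socket
import urllib.error
import urllib.request


def classify_fetch(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return a short label describing how a request to `url` ended.

    `opener` is a hypothetical hook (defaulting to urlopen) so the
    function can be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return f"http {response.status}"
    except urllib.error.HTTPError as err:
        # A web server was reachable and answered with an error status,
        # for example 404. This must be caught before URLError.
        return f"http {err.code}"
    except urllib.error.URLError as err:
        # No HTTP answer at all: connect timeout, refusal, or DNS failure.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable ({err.reason})"
    except (socket.timeout, TimeoutError):
        # Timeouts while reading can surface without a URLError wrapper.
        return "timeout"
```

Calling `classify_fetch("http://www.example.com/")` against a server that has lost power would be expected to report a timeout or an unreachable host rather than an HTTP status.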

thegreatpretender




msg:3664177
 12:35 pm on Jun 1, 2008 (gmt 0)

My server is one of those affected, and it hosts the site that most of my income comes from. I am very satisfied with the service they provide, but my next server will not be with them. This is to avoid having all my websites down at the same time when a disaster such as this happens again.

bwnbwn




msg:3664178
 12:39 pm on Jun 1, 2008 (gmt 0)

This will also show who has planned for such a disaster and has an emergency plan in place. This outage could take some time to fix.

I have a plan in place right now, but it is not adequate as I don't have complete access to the backup, so...

I have set my mind to moving my main domain to another registrar and buying a cheap hosting package, to set up an emergency plan where all I will need to do is change the name servers on the domain.

vincevincevince




msg:3664180
 12:47 pm on Jun 1, 2008 (gmt 0)

This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room.

Something, to me, does not sound right about that statement. Electrical gear can short, that's well known. Nothing in the electrical equipment room should explode at reasonable temperatures though. An explosion strong enough to knock down three walls, to my knowledge, only comes from a well established fire providing furnace-like conditions.

Luckily I am not a customer any more; however, if I were, I'd want to know why the increased load on the cooling system, which must have preceded this for quite some time, went ignored.

stormy




msg:3664199
 1:28 pm on Jun 1, 2008 (gmt 0)

I was unlucky/foolish enough to have both DNS servers on the same machine, so my sites are totally down (not timing out) and email is bounced back instantly (not queued).
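A rough way to spot this kind of single point of failure is to see whether all of a domain's nameservers resolve into the same network. This is a minimal sketch under loud assumptions: the function names, the injectable `resolve` parameter, and the /24 grouping are illustrative choices, and sharing a /24 is only a heuristic for "same machine or same room", not proof.

```python
import ipaddress
import socket


def nameservers_by_network(nameserver_hosts, resolve=socket.gethostbyname,
                           prefix_len=24):
    """Group nameserver hostnames by the IPv4 network they resolve into.

    `resolve` is a hypothetical hook (defaulting to gethostbyname) so the
    grouping can be checked without real DNS lookups.
    """
    groups = {}
    for host in nameserver_hosts:
        address = ipaddress.ip_address(resolve(host))
        # strict=False lets a host address stand in for its network.
        network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
        groups.setdefault(str(network), []).append(host)
    return groups


def looks_like_single_point_of_failure(nameserver_hosts,
                                       resolve=socket.gethostbyname):
    """True when every nameserver lands in one network (heuristic only)."""
    return len(nameservers_by_network(nameserver_hosts, resolve)) < 2
```

If both nameservers land in one group, a single outage can take down DNS and, with it, mail queuing and any chance of repointing the sites elsewhere.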

We're lucky this happened over a Saturday night though!

pontifex




msg:3664205
 1:59 pm on Jun 1, 2008 (gmt 0)

pure luck, i am in the H2 datacenter with EVERYTHING :-/ that was close...!

innocbystr




msg:3664207
 2:08 pm on Jun 1, 2008 (gmt 0)

Wonder if this has anything to do with the "Rasberry Ants" which have "invaded" Houston and love electronics. If you haven't heard of them just do a search for them, you'll see what I mean....

jdMorgan




msg:3664209
 2:14 pm on Jun 1, 2008 (gmt 0)

An explosion strong enough to knock down three walls, to my knowledge, only comes from a well established fire providing furnace-like conditions.

... or a Halon fire-suppression system dumping a large tank into an enclosed, un-vented space. :)

I've seen that, and it's impressive!

I wish EV1 and their customers the best of luck with a speedy recovery!

Jim

Clark




msg:3664220
 2:50 pm on Jun 1, 2008 (gmt 0)

Wow! Just read this and found my sites are up. Whew!

pbradish




msg:3664229
 2:58 pm on Jun 1, 2008 (gmt 0)

Both of our E-Commerce websites are down, and have been since 5PM yesterday. Completely unacceptable.

cmendla




msg:3664230
 3:00 pm on Jun 1, 2008 (gmt 0)

If anyone is thinking "why not just move to another host", it isn't as simple as that. I've been working to establish an alternate host for a client with an osCommerce store. It isn't a huge store, with only about 2000 items and a couple of sales a day. OTOH, the sales are high ticket, about a grand each, and have a decent markup. He is not on a dedicated server. (Not quite making enough yet.)

That being said, I wanted to establish an alternate host to be able to test some programming/db changes we want to make. It's turned out to be a lot more involved than I thought. I'm running into stuff like trying to get Zend Optimizer configured, getting into the nitty gritty of the existing osCommerce site and modding the config files and a lot of other boring stuff.

Once we get the alternate host working, we should be able to just switch the DNS pointers and reload a backup. However, getting to that point is pretty involved. My guess is that trying to do something like this for dedicated servers might be even more involved.

I'm just glad that I'm not under pressure to get the second site working with the first one being down.

chris

carguy84




msg:3664254
 3:24 pm on Jun 1, 2008 (gmt 0)

I wish EV1 and their remaining customers the best of luck with a speedy recovery!

drall




msg:3664255
 3:26 pm on Jun 1, 2008 (gmt 0)

Try more like millions of sites are down because of this and then you will be in the right ballpark, including us:(

We have initiated our own DR plan and already have our mirror up and running on the other side of the country, just waiting on our downtime timeline to expire before we switch DNS out.

Sadly, many people have DNS through them, which means this is not an option, as the core ev5 ev6 DNS system is out as well, affecting not only onsite servers but thousands of offsite servers as DNS expires globally.

dusky




msg:3664258
 3:30 pm on Jun 1, 2008 (gmt 0)

On the plus side, apart from 000s of dollars worth of losses, Google and other search engines will know about the incident and would ignore the timeouts encountered by their bots. Surely, there must be a minimum of 1+ million sites down right now. Google's "Theplanet update" or "Houston Update" in memory of any lost data and dollars would be a nice fit.

pbradish




msg:3664272
 4:06 pm on Jun 1, 2008 (gmt 0)

Let's hope so Dusky. I assume the same. GOOG, Y!, and MSN know about the outage and *should* not hold it against us.

amnesia440




msg:3664274
 4:14 pm on Jun 1, 2008 (gmt 0)

yikes. Good luck to those who are affected by this.

grelmar




msg:3664279
 4:36 pm on Jun 1, 2008 (gmt 0)

An explosion strong enough to knock down three walls, to my knowledge, only comes from a well established fire providing furnace-like conditions.

If the insulation on the cables had been smoldering and outgassing for a while, and then the fumes were lit by a short circuit, you could see a pretty significant "event".

Another good candidate for the explosion: batteries. If the fire was in their UPS room, depending on the chemical makeup of the batteries they were using, and the quantity (most likely thousands of pounds worth of batteries), you could get anything from a small "whump" to something just short of Hiroshima. Lead acid batteries (like the one in your car) can detonate pretty spectacularly under the right conditions, and are very common in large data centers and telco switches (cheap and efficient), and if you've ever seen a li-ion laptop battery have a thermal runaway reaction [theinquirer.net], you'd prolly never rest a laptop on your crotch again.

Anyway, good luck to those affected. Hopefully they'll have backup servers online quickly, and nobody loses too much data.

edit add: youtube video [youtube.com] of a laptop li-ion battery in thermal runaway

[edited by: grelmar at 4:54 pm (utc) on June 1, 2008]

g1smd




msg:3664281
 4:39 pm on Jun 1, 2008 (gmt 0)

You can vanish for 2 or 3 days with virtually no loss of position, even 7 days seems to have very little effect.

Beyond 10 to 14 days there are effects, but they seem to recover quite quickly.

Beyond 21 days... no idea.

pbradish




msg:3664286
 4:52 pm on Jun 1, 2008 (gmt 0)

if you've ever seen a li-ion laptop battery have a thermal runaway reaction, you'd prolly never rest a laptop on your crotch again.

Thanks to that picture, I will never rest a laptop there again... something about my manhood going up in flames. :)

pbradish




msg:3664299
 5:32 pm on Jun 1, 2008 (gmt 0)

To keep you up-to-date, here is the latest information about the outage in our H1 data center.

We expect to be able to provide initial power to parts of the H1 data center beginning at 5:00 p.m. CDT. At that time, we will begin testing and validating network and power systems, turning on air-conditioning systems and monitoring environmental conditions. We expect this testing to last approximately four hours.

Following this testing, we will begin to power-on customer servers in phases. These are approximate times, and as we know more, we will keep you apprised of the situation.

We will update you again around 2:30 p.m. this afternoon.

[forums.theplanet.com...]

bsterz




msg:3664336
 6:39 pm on Jun 1, 2008 (gmt 0)

I have many many sites hosted in H1. I also have many in the unaffected datacenters. I must say this isn't common with ThePlanet, but it sure hurts - a lot.

Bill

incrediBILL




msg:3664342
 6:47 pm on Jun 1, 2008 (gmt 0)

Both of our E-Commerce websites are down, and have been since 5PM yesterday. Completely unacceptable.

Never blame the host. Most big hosts do a really good job at infrastructure, but there's always the unexpected, including natural disasters, no matter how well you plan and design your data center.

Now you know you need a contingency plan.

It doesn't cost much to have a lesser powered backup server on standby with a carbon copy of your site ready to go live at a moment's notice. It can even be kept up-to-date using a cron job with rsync (on Linux) so you've only lost an hour's worth of data (orders) worst case. I also keep a daily backup locally, so that if all of my best laid plans fail simultaneously, I can still upload to a new server elsewhere from scratch.
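The hourly rsync job mentioned above could look something like the sketch below. It is only an illustration under stated assumptions: the host name, the paths and the function names are hypothetical, SSH key login to the standby box is assumed to be set up already, and only the standard rsync options `-a`, `-z`, `-e` and `--delete` are used. Orders usually live in a database, so a real job would also need a database dump before the copy, which is not shown.

```python
import subprocess


def build_rsync_command(source_dir, backup_host, backup_dir,
                        delete_removed=True):
    """Build the rsync argument list for mirroring a site over SSH."""
    # A trailing slash makes rsync copy the directory contents rather
    # than nesting the directory itself inside the destination.
    source = source_dir.rstrip("/") + "/"
    command = ["rsync", "-az", "-e", "ssh"]
    if delete_removed:
        # Remove files on the standby that no longer exist on the live site.
        command.append("--delete")
    command += [source, f"{backup_host}:{backup_dir}"]
    return command


def mirror_to_standby(source_dir, backup_host, backup_dir):
    """Run rsync once and return its exit code (0 means success)."""
    result = subprocess.run(
        build_rsync_command(source_dir, backup_host, backup_dir),
        check=False,
    )
    return result.returncode


# Hypothetical crontab entry to run a script that calls mirror_to_standby()
# at the top of every hour:
#   0 * * * * /usr/bin/python3 /home/me/mirror_site.py
```

Running it hourly is what bounds the loss at roughly an hour of changes; a shorter interval narrows that window at the cost of more transfer.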

Besides, there are many other reasons to do this such as complete hardware failure, getting hacked, having major routers go down outside the data center, etc.

A few years ago a main router at Level 3 blew out in the Los Angeles region and cut off a large chunk of the state. It took 6 hours to restore because there was only 1 backup part and it was a 3 hour drive away (and back), and for some odd reason there was nobody where the part was located who could drive it to LA. Sounds stupid, I know, but that's how big business operates.

So having a backup server in another part of the country is generally a good idea.

While you're at it, put your server on a 3rd party dynamic DNS service so you can move your site from server to server in mere seconds.

bsterz




msg:3664368
 7:49 pm on Jun 1, 2008 (gmt 0)

incrediBILL

How does dynamic DNS work? This is always my issue - I need to make a move but propagation makes it difficult.

bsterz




msg:3664390
 8:28 pm on Jun 1, 2008 (gmt 0)

FYI

One of my servers just came back online. It was down due to some sort of routing issue - so I think they are working from the edge inward.

DamonHD




msg:3664399
 8:45 pm on Jun 1, 2008 (gmt 0)

At the very least, if you have multiple income-earning sites, it is a good idea to spread them (and their DNS) between at least two independent hosting companies in locations geographically far apart. That need not cost any extra money or admin, just the same split between two or more sites.

In these IntarWeb days that's not so hard to do!

If one server or host or DNS server dies or makes some horrible technical SNAFU or backbone carriers are having another tiff, etc, you only lose a portion of your income.

And of course if you have smart DNS and front-ends and so on like (say) the BBC or Google do, then almost immediately all traffic can be diverted away from the failed parts and (almost) no users at all see any loss of service for more than at most a few minutes as determined by your DNS timeouts.
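The failover idea can be sketched roughly as follows. This is a minimal illustration, not how the BBC or Google actually do it: the function names are hypothetical, the health check is a plain TCP connect, and the step that publishes the chosen address depends on the DNS provider and is not shown. The outage estimate also assumes resolvers honour the record's TTL, which not all do.

```python
import socket


def is_reachable(address, port=80, timeout=3.0):
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_active_address(candidates, health_check=is_reachable):
    """Return the first candidate that passes the health check, or None.

    `candidates` is ordered by preference: primary first, standby after.
    """
    for address in candidates:
        if health_check(address):
            return address
    return None


def worst_case_outage_seconds(check_interval, record_ttl):
    """Upper bound on how long visitors may keep hitting a dead server.

    A failure can go unnoticed for up to one check interval, and
    resolvers may serve the old record for up to one TTL after the change.
    """
    return check_interval + record_ttl
```

With a check every 60 seconds and a 300 second TTL, the bound works out to 360 seconds, which matches the "at most a few minutes" figure above.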

I have my relatively insignificant sites and DNS spread over 6 sites in 5 countries in the US, UK and AsiaPac, and am usually not too panicked when something fails (like my entire host in Atlanta losing power and UPS for many hours recently, or the disc failing in my main UK co-lo server). I can have cheaper hardware in each location too, since I'm relying on multiple machines to reduce the impact of any one failure. I can be ... ahem ... highly-strung, and this approach has probably been my biggest single investment in putting off cardiac failure! B^>

Rgds

Damon


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved