Theplanet's data centre in Houston catches fire

Planet's main data centre in Houston catches fire, 1000s of servers down

         

dusky

2:38 am on Jun 1, 2008 (gmt 0)

10+ Year Member



Hundreds if not thousands of servers are down due to a fire at one of theplanet's main data centres. The main announcement is at their forums here [forums.theplanet.com...] Apparently, no data was lost or damaged, but I have the feeling that it's big and they are not sure!

The largest dedicated server provider in the world; I am sure there are millions of 404s every second from 100s of thousands of sites. This will undoubtedly be a test to see how quickly they can respond to the situation and get servers back online. It has been a few hours so far!

[newsblaze.com...]

[edited by: Brett_Tabke at 11:34 am (utc) on June 1, 2008]
[edit reason] added link [/edit]

robzilla

9:12 pm on Jun 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looks like the ThePlanet forums can't take the load. Last time I checked, there were over 2500 people watching that thread. Can't get to the Newsblaze article either. Not a good day for the WWW :-P

creeking

10:19 pm on Jun 1, 2008 (gmt 0)

10+ Year Member



I was unlucky/foolish enough to have both DNS servers on the same machine, so my sites are totally down (not timing out) and email is bounced back instantly (not queued).

Consider adding a free DNS nameserver to your nameserver list (at the domain registrar). Sign up for free DNS and point it to the correct IP, so when things come back online, the nameserver will work the way you want it to.

T_Miller

10:22 pm on Jun 1, 2008 (gmt 0)

10+ Year Member



My main ecommerce site is back online now (6:25 PM EST), but I've got about 40 other sites (personal, hobby stuff, clients, etc.) that are still down...

They seem to be getting a handle on it...

pontifex

10:48 pm on Jun 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was happy too early... I missed the fact that the nameserver entries on our Linux boxes still had legacy EV1 entries. Now most of our invoice mails have failed. It took me about 3 hours to filter, rephrase and resend them. What a mess!

It also seems that outgoing traffic had serious packet loss to some parts of the world. Mail delivery connections to Norway took over 10 seconds, so our mail delivery timed out.

All in all: nothing serious, just annoying for us, but some additional fail-over setup seems to be wise!
P!

swa66

10:49 pm on Jun 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually I'm seriously considering using this incident as a reason to become a new customer of the Planet. They now have more experience with BCP/DRP than the competition. And they will have continued attention from upper MGMT to get this right for many years to come.

pbradish

12:14 am on Jun 2, 2008 (gmt 0)

10+ Year Member



Never blame the host, most big hosts do a really good job at infrastructure but there's always the unexpected, including natural disasters, no matter how well you plan and design your data center.

Now you know you need a contingency plan.

It doesn't cost much to have a lesser powered backup server on standby with a carbon copy of your site ready to go live at a moment's notice. It can even be kept up-to-date using a cron job with rsync (on Linux) so you've only lost an hour's worth of data (orders) worst case. I also keep a daily backup locally, so that if all of my best laid plans fail simultaneously, I can still upload to a new server elsewhere from scratch.
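As a rough illustration of the cron plus rsync idea above, here is a minimal Python sketch that builds and runs the mirror command. The paths, the standby host name and the script location are hypothetical placeholders, and it assumes SSH key access to the standby box is already set up. It only mirrors files; a database would need its own dump step.

```python
import shlex
import subprocess


def build_rsync_command(source_dir, backup_host, dest_dir, dry_run=False):
    """Return the rsync argument list that mirrors source_dir to the standby."""
    # -a keeps permissions and timestamps, -z compresses in transit,
    # --delete removes files on the standby that no longer exist locally.
    cmd = ["rsync", "-az", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    # Trailing slash: copy the directory's contents, not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{backup_host}:{dest_dir}")
    return cmd


def sync_to_standby(source_dir, backup_host, dest_dir):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_rsync_command(source_dir, backup_host, dest_dir), check=True)


if __name__ == "__main__":
    # Hypothetical values; this only prints the command so it can be inspected.
    print(shlex.join(build_rsync_command(
        "/var/www/shop", "backup@standby.example.net", "/var/www/shop")))

# Assumed crontab entry to run a script calling sync_to_standby() every hour:
#   0 * * * * /usr/bin/python3 /usr/local/bin/sync_standby.py
```

Running it with dry_run=True first is a cheap way to see what --delete would remove before trusting it with a live site.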

Besides, there are many other reasons to do this such as complete hardware failure, getting hacked, having major routers go down outside the data center, etc.

A few years ago a main router at Level 3 blew out in the Los Angeles region and cut off a large chunk of the state. It took 6 hours to restore because there was only 1 backup part, it was a 3 hour drive away (and back), and for some odd reason there was nobody where the part was located who could drive it to LA. Sounds stupid, I know, but that's how big business operates.

So having a backup server in another part of the country is generally a good idea.

While you're at it, put your server on a 3rd party dynamic DNS service so you can move your site from server to server in mere seconds.

All great points IncrediBILL. I'm taking this all in as a learning experience. We were being very naive to not have a rock-solid disaster plan in place.

incrediBILL

2:09 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



it is a good idea to spread them (and their DNS) between at least two independent hosting companies in locations geographically far apart.

Most people don't realize it, but the DNS server is the single point of weakness in most data centers, and it was pointed out quite dramatically by a botnet attack against a security company a few years back. The botnet couldn't bring the site down with all their security, so they stopped attacking the site itself and attacked the host's DNS servers with a flood of traffic instead, and brought EVERYONE being hosted there down, including the intended target.

So more than one DNS server from more than one source is a really good idea.
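One quick way to sanity-check that advice is to look at where your nameservers actually sit. The sketch below groups nameserver IPs by network and reports any network holding more than one of them. The host names and addresses are made-up documentation values, and treating "same /24" as "same provider or data center" is only a rough assumption, not a real ownership lookup.

```python
import ipaddress


def shared_networks(nameserver_ips, prefix_len=24):
    """Group nameservers by the IPv4 network they sit in.

    nameserver_ips maps a nameserver name to its IPv4 address.
    Returns only the networks that hold more than one nameserver,
    i.e. the likely single points of failure.
    """
    groups = {}
    for name, ip in nameserver_ips.items():
        # strict=False lets us pass a host address and get its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups.setdefault(net, []).append(name)
    return {str(net): sorted(names)
            for net, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical nameserver list using reserved documentation addresses.
    print(shared_networks({
        "ns1.example.com": "192.0.2.10",
        "ns2.example.com": "192.0.2.11",
        "ns3.example.org": "198.51.100.5",
    }))
```

An empty result does not prove the servers are independent, but a non-empty one is a strong hint that two nameservers would go down together.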

mcneely

2:25 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In so many ways, I'm sorry for The Planet, but in a few other ways I'm not. Seems over the past few years, The Planet has gone from #7 to #3 on our list of major spam providers in the U.S.

There's always going to be a worst case, and it appears that The Planet has provided the many with one.

Ours has always been to colo or lease across the country. With switching at a premium these days, it's still felt that it's worth it to stay up, "always". You can never have too much redundancy, especially when dealing with network wide outages, or even outages across multiple networks.
One might go offline in Virginia, and at the second it does, L.A. picks it up without so much as even a blink.

Providing yourself with a backup, or some other kind of redundancy plan is a very good idea. Don't throw it all into just one DC or Network if you can help it. Spread your data out a bit.

Hawkgirl

2:55 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If banging your head on the keyboard repeatedly could help speed things up, I would personally have that place up and running again already. I swear I'm always affected by this kind of thing.

I have been on both sides of the planning spectrum - from a ridiculously complicated backup plan involving such steps as keeping three backup servers with as near-time data as possible and then hand-transporting them separately to three previously-chosen locations to get things up and running again (I swear) to NO plan (like, uh, right now with my personal site). Somewhere in the middle would probably be a good idea.

I'm still not up, and I am about to end up whimpering incoherently in the fetal position near my computer.

Keyboard, meet forehead.

BananaFish

3:11 am on Jun 2, 2008 (gmt 0)

10+ Year Member



Crazy Rasberry Ants... and you wonder why risk management is so important.

dusky

3:14 am on Jun 2, 2008 (gmt 0)

10+ Year Member



"I would personally have that place up and running again already"
Why go for the harder task? You could've just had a redundant server as backup somewhere else, as expected from experts, and from the sound of it, you have demonstrated a notion of that.
Don't take this personally, we are all in the same boat, err datacenter I mean!

dusky

3:36 am on Jun 2, 2008 (gmt 0)

10+ Year Member



Considering an estimate of ±1 million sites down for 36+ hours, collateral damage can run into billions of dollars: lost earnings, opportunity costs, man-hour wages, insurance and compensation, temporary loss of SE ranking, loss of confidence and repeat custom, wasted paid ad space, etc. All that and no one to blame but ourselves in a way; you can't blame it on TP, as it is almost an "Act of God", as it were.
One thing gained, a good lesson!

Andem

5:39 am on Jun 2, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



I now have a list of things that will change after this. One of them is having a backup server ready to go at the drop of a hat. The next is setting up a 3rd and 4th DNS server located elsewhere in the country. Actually, they're being set up right now.

I'm disappointed ThePlanet has decided to let things happen like this. There was most likely overloading in the original RackShack/EV1Servers datacentre in Houston. I feel they've been giving little attention to the legacy customers and datacentre.

incrediBILL

5:59 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm disappointed ThePlanet has decided to let things happen like this.

You think someone just woke up and said "Hey! Wouldn't it be fun to blow up a transformer today?"

I kind of doubt that.

Andem

7:00 am on Jun 2, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



You think someone just woke up and said "Hey! Wouldn't it be fun to blow up a transformer today?"

No, of course not.

I've just been let down since EV1 ceased to exist. If there was proper maintenance, I'm sure such an explosion on that large of a scale would never have happened.

wheelie34

9:15 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



At 10:12 Monday UK time, my server on the 69.57.xx.xx rack has come back online (and my phone has stopped ringing)

rise2it

9:49 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



sites hosted by another host (someone said it's part of this mess) are still down...5:47am EST Monday, June 2nd.

No biggie for me, luckily...nothing but a few hobby sites there.

Hope they get things back up quickly for you guys that are losing money over this...at least they appear to be trying.

[edited by: engine at 12:03 pm (utc) on June 3, 2008]

martinibuster

10:02 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It doesn't cost much to have a lesser powered backup server on standby with a carbon copy of your site ready to go live at a moment's notice.

Good advice.

wheelie34

10:11 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When it hit the fan on Saturday night, I started to move MAJOR sites to another server. DNS these days can propagate within minutes or a few hours. Yesterday (SUNDAY) I changed DNS for around 60 sites, and they were all live and getting emails after around 3 hours.

Reset the DNS, then add the site to the new server and upload files, DBs etc. while DNS is busy changing. It's quite quick these days, DEPENDING on the ISP: I have one customer who is STILL down, and a ping returns the OLD server (supanet UK). Anyone on BT should see the changes within an hour, usually.
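The "ping still returns the OLD server" check can be scripted. This is a minimal sketch, assuming you know the old and new server IPs; it only reports what the resolver on the machine running it currently returns, so it says nothing about what a customer's ISP has cached. The function name and the example addresses are illustrative.

```python
import socket


def propagation_status(hostname, old_ip, new_ip, resolve=socket.gethostbyname):
    """Report which server the local resolver currently returns for hostname.

    Returns "new", "old", "other" or "unresolved". The resolve argument
    defaults to the system resolver and can be swapped out for testing.
    """
    try:
        current = resolve(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return "unresolved"
    if current == new_ip:
        return "new"
    if current == old_ip:
        return "old"
    return "other"
```

For example, propagation_status("www.example.com", "192.0.2.10", "198.51.100.20") would return "old" for as long as the local resolver keeps serving the cached record for the original box.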

YMMV

edit: my reply was to martinibuster who asked about DNS propagation times, looks like he edited his post

thegreatpretender

10:37 am on Jun 2, 2008 (gmt 0)

10+ Year Member



It doesn't cost much to have a lesser powered backup server on standby with a carbon copy of your site ready to go live at a moment's notice.

So how do you deal with the DNS propagation? I have a complete backup of my db and files and have another server out of theplanet and I was thinking of switching the DNS when I first learned what has happened, but I was also worried that by the time it propagates the planet might have fixed the problem already and have to switch back again. I thought it will just prolong the downtime of the site. So just decided to wait and do some other things.

swa66

11:11 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DNS: every single record in DNS has a TTL value (time to live).
The shorter you set it, the faster the updates will propagate.
The longer you set it, the longer it will be cached by nameservers not coming back to refresh the answer, and hence the lighter the load is.

It's up to you to decide ... and balance your needs between the ability to move fast or the lighter load on the DNS servers.

Having more DNS servers can relieve your server load if it ever becomes a problem, but do note this performance is also visible to end users as a cached answer gets to them significantly faster than a query that needs to be forwarded.

Moving your authoritative nameservers about takes as long as the TTL values the TLD owner has set on your NS and (optionally) glue records. That's out of your control, so you need to have them spread out enough not to run into trouble with losing all of them.

Should you lose the master server: make sure to promote one of the remaining slaves to master and reconfigure the other slaves. If you don't, the slaves will eventually stop handing out information as it grows stale.
All those timings are configured in the SOA DNS record. But individual records can have their own TTL.
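To put rough numbers on that trade-off, here is a small Python sketch. The function names are made up for illustration, and the model is simplified: one caching resolver that refreshes as soon as the TTL runs out, and a slave that stops answering once the SOA expire interval has passed without a successful refresh.

```python
import math

SECONDS_PER_DAY = 86400


def worst_case_stale_seconds(record_ttl, seconds_since_cached=0):
    """Longest a resolver may keep serving the old answer after a change.

    A resolver that cached the record just before the change
    (seconds_since_cached=0) holds it for the full TTL.
    """
    return max(record_ttl - seconds_since_cached, 0)


def refreshes_per_day(record_ttl):
    """Refresh queries per day from one resolver that re-asks at every expiry."""
    return math.ceil(SECONDS_PER_DAY / record_ttl)


def secondary_still_answers(seconds_since_last_refresh, soa_expire):
    """True while a slave may keep serving the zone without reaching the master."""
    return seconds_since_last_refresh < soa_expire


if __name__ == "__main__":
    for ttl in (300, 3600, 86400):
        print(ttl, worst_case_stale_seconds(ttl), refreshes_per_day(ttl))
```

With a 300 second TTL a move is visible everywhere within 5 minutes, at the cost of up to 288 refreshes per resolver per day; with a one day TTL it is a single refresh, but a switch can take a full day to reach everyone.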

cabowabo

11:28 am on Jun 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was with EV1 and had a lot of dedicated servers with them and last year I upgraded to new servers, all hosted at The Planet vs. the EV1 space. I must say I was VERY disappointed in the level of support and knowledge by the techs @ The Planet vs. EV1.

I feel very fortunate not to be affected, but I am still concerned why a Halon system wasn't present, or if it failed.

razoras

3:39 pm on Jun 2, 2008 (gmt 0)

10+ Year Member



This is the same The Planet that took hours to respond to a trouble ticket about an apparent hard drive failure, only to discover that our server had been producing an audible alarm for who knows how long. I'm not particularly surprised something caught fire somewhere.

juliocrd

4:33 pm on Jun 2, 2008 (gmt 0)

10+ Year Member



Hi, this is Julio from Lima, Peru. We are also struggling with the planet fire problem; our website has been down more than 48 hours now, in addition to some other personal ones as well. I read all the posts and they were very helpful for creating a backup plan. Just a question: which other server provider, like the planet, has an equally good service? Greetings and I hope everything gets fixed soooon.

Bye, Julio.

himalayaswater

5:23 pm on Jun 2, 2008 (gmt 0)

10+ Year Member



There are many good server providers just like TP. Forum policy does not allow us to drop URLs; just google it or PM me and I will be happy to send a list of a few good providers.

bsterz

5:42 pm on Jun 2, 2008 (gmt 0)

10+ Year Member



All of my servers are back up but one. I'm actually pretty happy with how they responded. I have needed some help here and there and they have been very helpful today. For instance they rooted right into my server from a chat to confirm that I had made a config tweak correctly. It's terrible that the situation happened, but I really feel like they are trying very hard to be helpful and to get it fixed - whatever that entails.

I have to say I feel better about TP today than I did a week ago - crazy huh?

Bill

phranque

8:35 am on Jun 3, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], julio!

the Webmaster General forum Charter [webmasterworld.com] specifically prohibits discussion about hosting providers.

night707

10:44 am on Jun 3, 2008 (gmt 0)

10+ Year Member Top Contributors Of The Month



There are many easy ways to ensure that a site will never be down.

rise2it

11:40 am on Jun 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lowesthosting.com itself is still down

sites hosted there are still down

I'm really surprised the mainstream media (USAToday tech section, etc.) isn't picking up on this.

innocbystr

1:22 pm on Jun 3, 2008 (gmt 0)

10+ Year Member



I'm really surprised the mainstream media (USAToday tech section, etc.) isn't picking up on this.

I'm surprised at this also. I get daily newsletters from a couple of tech publications and there's no mention of it. The only place I've seen it mentioned at all is in forums. You'd think with about a million sites down at one time it would have received some attention.
