DMOZ outage and its lesson.

DMOZ, hardware, software, backup

         

mojomike

4:51 pm on Nov 18, 2006 (gmt 0)


If there is one thing I have in my life, it is back-up plans like no other.

What brought me to post on this forum was the DMOZ outage, which is still going on. What can we learn from this?
A) Backing up is still the most valuable thing you can do.
B) Nothing beats spreading the load over multiple servers (note that the forums did not go down).

Now, I have not read what really brought DMOZ down, but it's been said they had a hardware failure, and then data issues.

Well, hardware failure means losing the entire server (for the sake of argument). If you have learned anything at WebmasterWorld, it is that when we design web sites, databases and other goodies, we future-proof them so that we can move them to another server in the blink of an eye.

Given that some people on this forum run a site with 1 million daily page loads from a single server, they are most likely thinking about this issue right now and asking themselves: what would happen if my server crashed?

As in the past, I have always had back-up plans, and this is what I implemented for myself.

I hired a programmer to write a simple database backup program that e-mails me the file every other day (to my Gmail account).
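
For anyone who wants to roll their own, here is a minimal sketch of what such a backup mailer might look like in Python. The actual program is not shown in this thread, so everything here is an assumption: the dump file name, the addresses, and the SMTP host, port and credentials are placeholders, and the dump itself is assumed to be produced by whatever export tool your database provides.

```python
import gzip
import shutil
import smtplib
from datetime import datetime, timezone
from email.message import EmailMessage
from pathlib import Path


def compress_dump(dump_path: Path, out_dir: Path) -> Path:
    """Gzip a database dump into out_dir, with a UTC timestamp in the name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{dump_path.stem}-{stamp}.sql.gz"
    with open(dump_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def build_backup_email(archive: Path, sender: str, recipient: str) -> EmailMessage:
    """Build a message with the compressed dump attached."""
    msg = EmailMessage()
    msg["Subject"] = f"Database backup {archive.name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Automated database backup attached.")
    msg.add_attachment(
        archive.read_bytes(),
        maintype="application",
        subtype="gzip",
        filename=archive.name,
    )
    return msg


def send_backup(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send the message over SMTP with SSL (host, port and login are placeholders)."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)
```

Running it every other day would be left to the server's scheduler (cron or similar). Keep in mind that mail providers cap attachment sizes, so this only suits small databases.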

My web site is simple, but I backed it up on 3 media: CD, floppy disks, and DVD. Each stack also has a simple FTP program and a password/login instruction sheet.

I found a nice cheap web host, paid 1 year in advance, and have everything working there. I update that copy once a month.

Now, I know everything would be slow on the new host, but at least I would fail over to something and my clients would not see any problems.

All I am asking is that we all start thinking about backup plans. Sometimes the cheapest implementation can save you a lot of grief.

Mike

lammert

3:03 pm on Nov 19, 2006 (gmt 0)


The DMOZ crash was a little bit more complicated than just a crash of a stable system. The crash happened while developers were in the process of changing the software on the editors.dmoz.org server (ironically, to make it less of a single point of failure). These are always critical situations. A data backup is only usable when the software that imports the data assumes exactly the same data structures and relations as are present in the saved backup.
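
One common way to guard against that mismatch is to stamp every backup with a schema version and refuse to import it into software that expects a different one. The sketch below is only an illustration of the idea and has nothing to do with how DMOZ actually stores its data; the JSON layout and the names `dump_backup`, `load_backup` and `SchemaMismatch` are made up for this example.

```python
import json


class SchemaMismatch(Exception):
    """Raised when a backup was written for a different schema version."""


def dump_backup(rows: list, schema_version: int) -> str:
    """Serialize the rows together with the schema version they were written under."""
    return json.dumps({"schema_version": schema_version, "rows": rows})


def load_backup(payload: str, expected_version: int) -> list:
    """Return the rows only if the backup matches the version the software expects."""
    doc = json.loads(payload)
    found = doc["schema_version"]
    if found != expected_version:
        raise SchemaMismatch(
            f"backup has schema version {found}, software expects {expected_version}"
        )
    return doc["rows"]
```

A mismatch then fails loudly at import time instead of quietly loading data into structures that no longer fit it.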

During an update process in a production environment (something that apparently happened on editors.dmoz.org), backup structures and program structures may go out of sync. When a crash happens in the middle of this process, it becomes difficult to restore a working software system together with a data set that matches that software.

So not only should this crash be a lesson for us to have solid backup procedures (and periodic restore tests), it is also a warning that we should separate development and production environments whenever possible.
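
To make the restore-test part concrete, here is a rough sketch using SQLite from Python's standard library, purely as a stand-in since it needs no server. It assumes you record the table row counts when the backup is made; the function names are hypothetical. The test restores the backup into a scratch database, runs SQLite's integrity check, and compares row counts against the recorded ones.

```python
import sqlite3


def make_backup(source: sqlite3.Connection, backup_path: str) -> None:
    """Copy the live database into a backup file."""
    dest = sqlite3.connect(backup_path)
    try:
        source.backup(dest)
    finally:
        dest.close()


def table_counts(conn: sqlite3.Connection) -> dict:
    """Return {table name: row count} for all user tables."""
    names = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        if not row[0].startswith("sqlite_")
    ]
    return {n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0] for n in names}


def restore_test(backup_path: str, expected_counts: dict) -> bool:
    """Restore the backup into a scratch database and verify it."""
    backup = sqlite3.connect(backup_path)
    scratch = sqlite3.connect(":memory:")
    try:
        backup.backup(scratch)  # the actual restore step
        intact = scratch.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        return intact and table_counts(scratch) == expected_counts
    finally:
        backup.close()
        scratch.close()
```

With another database engine the restore step would use that engine's own import tool, but the principle is the same: a backup only counts once it has been restored somewhere and checked.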