Well, I have messed up unimportant sites where it did not matter that I did the testing directly in the live environment. I have, however, fixed live sites (... that did matter) which have been messed up by someone else (who was subsequently fired).
Scary? Can be. The amount of "damage" heavily depends on what's been messed up.
Syntax error in a script? Pfft.
Massive DB update that went wrong? Eep!
Loss of order or customer information? Gaaah!
If you have great backups (which are up-to-date and which you know can be restored in a jiffy) it's not that big of a deal. If you don't ... it may very well be fatal.
A system is only as good as the amount of time and effort it takes to restore it when things go wrong.
The fix I talked about earlier ... that someone else messed up. Well, this happened to be an untested DB update. The guy should've tested and re-tested the update in a test environment before running it. To make matters worse, this happened on a quite busy ecommerce platform. Hundreds of thousands of orders were messed up. About two thirds of these could immediately (within 5 minutes) be restored from last night's backup. The others should have been just as easy to restore if the guy had only backed up the DB right before running the stupid untested update. So ... "feel free to take the rest of the day off, while I am still digesting what just happened" ... and then off I went to save 35000 orders. I wouldn't trust anyone else with the fix. It had to be done right. And it had to be done manually. And it had to be done quickly. It took me about 5 hours. 5 long, stressful, horrific hours of frenetic manual DB fixing. In the end, all orders but 2 were restored by running handfuls of manual queries and digesting thousands of lines of server logs ... Horrible!
So ...
1) Don't mess it up in the first place
2) Know that if you ever do mess things up (which you will), you have sufficient, reliable, tested backups which will let you sleep comfortably
3) Have disaster recovery plans in place
Do that, and things will be just peachy!
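For what it's worth, here is a rough sketch of the "back up right before you run the update" habit, not what was actually used on that platform. It assumes a SQLite database purely so the example is self-contained; the function name `guarded_update`, its parameters, and the "expected row count" check are all made up for illustration. On MySQL or PostgreSQL you would take the snapshot with that system's own dump tooling instead.

```python
import sqlite3


def guarded_update(db_path, backup_path, sql, params, expected_rows):
    """Back up the database, then apply one UPDATE inside a transaction.

    The change is rolled back if it touches a different number of rows
    than expected. Returns True if the update was committed.
    (Hypothetical helper, SQLite only.)
    """
    conn = sqlite3.connect(db_path)
    try:
        # Step 1: snapshot the database right before touching it.
        backup = sqlite3.connect(backup_path)
        try:
            conn.backup(backup)
        finally:
            backup.close()

        # Step 2: run the update; sqlite3 opens a transaction implicitly
        # before data-modifying statements, so nothing is permanent yet.
        cursor = conn.execute(sql, params)

        # Step 3: check how many rows were hit before committing.
        if cursor.rowcount != expected_rows:
            conn.rollback()
            return False
        conn.commit()
        return True
    finally:
        conn.close()
```

You would call it as something like `guarded_update("shop.db", "shop.pre-update.db", "UPDATE orders SET status = ? WHERE id = ?", ("shipped", 3), 1)`. If the statement hits more rows than you said it should (a forgotten WHERE clause, say), nothing is committed, and either way you have a snapshot taken moments before the change rather than last night's.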
Ever messed up a live site?
I don't just mess em up. I knock em offline.
Friday evening and I'm walking out the door a minute later. Computer free all weekend. No way to check or verify that the change was correct. Why check it though? I mean how hard is it to change nameservers? That's how complacent I was back then and it's come back to bite me more than once.
Never quite with that many teeth though.
The worst blunder in my career so far resulted in a loss of approx $150K. Though I like to boast, sorry, I can't take total credit for that one, it was a group effort...
I know one anecdote where a fellow programmer burned down a whole fleet of sites by deploying a script with one typo - a small 'k' instead of a capital 'K'. Fun!
So, while I was the one at the keyboard, I wasn't the one who said to make the change, nor had I worked there long enough to understand the system well enough to know what all to look for in verifying the change was correct.
Though, that didn't stop my boss from blaming me...
...gee, thanks, I've been here a week, how was I to know that there were other sites of ours consuming this web service, and your "fix" was going to break them.