|server stalled. need to find out why|
I am using Redhat Linux Fedora 8.
Over the weekend the server crashed, and when I came in this morning none of the URLs were resolving to the server.
Dunno exactly what was wrong, but one of the engineers at the hosting place 'rebooted apache', which seemed to fix the problem.
I want to find out what caused the crash, and also when it happened.
I have been looking through the log files without really knowing what I am looking for...
Where should I look? What strategy should I pursue to find out what killed the server?
Looking at the log files, you should at least be able to determine _when_ Apache went down (when did the logs stop showing requests?). What exactly happened won't necessarily be in the Apache access log (maybe you get some hints from the error log), but it may show up in other system logs.
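To make the "when did the logs stop showing requests" step concrete, here is a rough Python sketch that scans an access log for the longest pause between two consecutive requests. It assumes the usual common/combined log format with timestamps like `[12/Jan/2008:11:29:58 +0000]` and English month abbreviations; `parse_time` and `largest_gap` are names made up for this example.

```python
import re
from datetime import datetime

# Timestamp as written in the common/combined access log format,
# e.g. [12/Jan/2008:11:29:58 +0000] (assumed format, check your LogFormat).
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_time(line):
    """Return the request time of one access-log line, or None if absent."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def largest_gap(lines):
    """Return (last_request_before, first_request_after) for the longest
    pause between consecutive requests, or None with fewer than two hits."""
    best = None
    previous = None
    for line in lines:
        current = parse_time(line)
        if current is None:
            continue  # skip lines without a recognisable timestamp
        if previous is not None:
            if best is None or current - previous > best[1] - best[0]:
                best = (previous, current)
        previous = current
    return best
```

You would feed it the open log file, e.g. `largest_gap(open("/var/log/httpd/access_log"))`. That path is only a guess at the Fedora default, so check your own configuration, and remember that rotated logs (`access_log.1` and so on) may hold the interesting part.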
Since the engineer said he "rebooted" (aka restarted) Apache, it sounds like Apache was still running but not taking any requests. Whether you find a reason or not, I'd set up a monitoring script to check for outages (like, make a request every 5 minutes, log its status and how long the response took), and if it happens again, you'll have to do further investigation, preferably with someone from your host.
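A minimal sketch of such a monitoring script, in Python 3 with only the standard library. It makes one request, measures how long it took, and appends a line to a log file; the five-minute schedule is left to cron. The function names, the log line layout, and the command-line arguments are all assumptions for illustration, not a finished tool.

```python
import sys
import time
import urllib.error
import urllib.request
from datetime import datetime


def check(url, timeout=10):
    """Request url once and return (status, seconds_taken)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = str(response.status)
    except urllib.error.HTTPError as err:
        status = str(err.code)  # server answered, but with e.g. 500 or 503
    except (urllib.error.URLError, OSError, ValueError) as err:
        status = "ERROR " + type(err).__name__  # no usable answer at all
    return status, time.monotonic() - started


def format_entry(when, url, status, seconds):
    """Build one log line: timestamp, URL, status, response time."""
    stamp = when.isoformat(timespec="seconds")
    return "%s %s status=%s time=%.2fs" % (stamp, url, status, seconds)


def run_once(url, logfile):
    """Do one check and append the result to logfile."""
    status, seconds = check(url)
    with open(logfile, "a") as log:
        log.write(format_entry(datetime.now(), url, status, seconds) + "\n")


# Only runs when called as: python3 monitor.py <url> <logfile>
if __name__ == "__main__" and len(sys.argv) == 3:
    run_once(sys.argv[1], sys.argv[2])
```

Saved as, say, `monitor.py` (a made-up name), a crontab entry along the lines of `*/5 * * * * python3 /path/to/monitor.py http://www.example.com/ /path/to/monitor.log` would run it every five minutes. It is best run from a different machine than the web server, so the check still works when the server itself is in trouble.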
OK, so I looked through the access logs, and accesses stop at around 11:30 on Saturday and don't resume till Monday (when Apache was restarted).
It seems to me that the DNS server was the problem, since, as you say, Apache was running but could not be reached through either the URL or the IP (?)
Anyway, are there any resources you recommend on this (somewhat arcane) area? My Linux Admin book has one (1) page on 'troubleshooting apache' ...
>>>It seems to me that the dns server was the problem,
Nope, not if restarting apache fixed the problem. Since that fixed it, then apache was the problem.
I would look back at the logs a little bit and see what was running.
It's possibly a screwed up process. Or you could have just been hit with an attack of some sort.
I am a near-complete beginner at this...
I don't even know which logs to look at - can you point me to somewhere that will help me get to grips with server logs in general?
Look at the error log, around 11:30 on Saturday. Also, it might help to have some system monitoring running (like Nagios) to see if something else misbehaved (high load etc.). Usually, if Apache keeps running (it needed a "restart", not a "start") but won't take connections, you might get something from the error log; otherwise you really have to search.
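If the error log is long, a small filter helps to pull out just the entries around the time the requests stopped. This is a sketch that assumes the error log timestamp style of Apache 2.x, e.g. `[Sat Jan 12 11:29:58 2008]` (newer versions add microseconds, which the pattern skips); `error_time` and `lines_between` are illustrative names.

```python
import re
from datetime import datetime

# Leading timestamp of an error-log line, e.g. [Sat Jan 12 11:29:58 2008].
# Optional fractional seconds are matched but ignored.
ERROR_TIME = re.compile(
    r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2})(?:\.\d+)? (\d{4})\]"
)


def error_time(line):
    """Return the timestamp of one error-log line, or None if it has none."""
    match = ERROR_TIME.match(line)
    if match is None:
        return None
    text = match.group(1) + " " + match.group(2)
    return datetime.strptime(text, "%a %b %d %H:%M:%S %Y")


def lines_between(lines, start, end):
    """Return the log lines whose timestamp lies within [start, end]."""
    selected = []
    for line in lines:
        when = error_time(line)
        if when is not None and start <= when <= end:
            selected.append(line.rstrip("\n"))
    return selected
```

For the outage discussed here you would call it with a window around Saturday 11:30, for example `lines_between(open("/var/log/httpd/error_log"), datetime(2008, 1, 12, 11, 0), datetime(2008, 1, 12, 12, 0))`. Both the path and the date are placeholders; substitute your own.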
Do you have scripts on the server? Did you write or check them yourself, or do other people upload scripts (shared hosting)? It might (!) be that every process Apache is allowed to use is busy, because someone messed up a PHP script and left some "sleep($tilljudgementday)" code in it ... call that often enough and every process will be tied up. I know, it's not too common, but I had it once on a server where other people created the websites and I was just doing some administrative work ... took me quite some time to figure that one out.
Apart from general failures, if it's a script and you have a lot of scripts, it might take some effort to investigate ... if an Apache process segfaults, it won't tell you what the last request was (usually something triggering some bug), so it can be hard to track down.
But then again: if it doesn't happen again, save your time. It might just have been some obscure hiccup that happens every ten years on a Saturday at 11:30 ;)
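Both scenarios above (all processes busy, or a child process segfaulting) tend to leave a recognisable line in the error log, so a quick scan for them is cheap. The sketch below looks for two message fragments as Apache 2.2 words them, to the best of my knowledge; other versions may phrase them differently, so treat the strings and the names `SYMPTOMS` and `find_symptoms` as assumptions.

```python
# Message fragments to look for (wording as in Apache 2.2, as far as I know;
# adjust for your version).
SYMPTOMS = {
    # logged when every allowed worker process is busy
    "workers_exhausted": "server reached MaxClients setting",
    # logged by the parent when a child process dies from a segfault
    "child_crashed": "exit signal Segmentation fault",
}


def find_symptoms(lines):
    """Return a dict mapping each symptom name to the matching log lines."""
    found = {name: [] for name in SYMPTOMS}
    for line in lines:
        for name, text in SYMPTOMS.items():
            if text in line:
                found[name].append(line.rstrip("\n"))
    return found
```

An empty result does not prove the server was fine; it only means neither of these two particular messages was logged.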