The server has been rebooted, and apache, mysql and php have all been reinstalled clean, yet the (new) apache log fills up with segfaults from child processes, and the mysql tables show constant index errors.
Could it be a hardware issue (and how would I confirm it?), or is there an obvious software component to check out? The server is only 18 months old and in a colo. After a bumpy first 9 months (it seemed to be a mysql config issue), it has run faultlessly, and now this.
I seem to be watching my site going down the toilet.
Can anyone draw a direct connection between the /var partition filling up and the segfaults I am now seeing?
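One way to settle whether /var is really full is to check the filesystem that holds it. A minimal Python 3 sketch (the function name `partition_usage` is illustrative, not part of any tool mentioned in this thread) that computes the percentage the same way df does, excluding blocks reserved for root:

```python
import shutil


def partition_usage(path="/var"):
    """Return (percent_used, free_bytes) for the filesystem that holds `path`.

    percent_used follows df: used / (used + available), so blocks reserved
    for root are left out of the total.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    denominator = usage.used + usage.free
    percent_used = 100.0 * usage.used / denominator if denominator else 0.0
    return percent_used, usage.free
```

Calling `partition_usage("/var")` gives the current percentage used and the bytes still available to ordinary users. It only shows the state right now, not whether /var filled up earlier.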
In any case, every table was checked and/or repaired *before* bringing the site back up, for the very reason that you mentioned.
I am now far more familiar with mysql repair options than any sane person has any right to be.
But it might be easier to do a clean OS install. Sounds like you have recent practice reloading your software and data. ;)
As for the OS reinstall, I agree, but I have to wait for my colo hosts, as I have neither a static IP nor enough experience to launch into it myself.
I still have a lingering feeling that there is some obvious component to check out.
PS: the apache error log shows about 8 hours of uptime before the first child segfault, almost exactly, each of the 3 times.
Do you keep a swap file in /var?
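On Linux the active swap areas are listed in /proc/swaps, so a swap file living under /var is easy to spot. A small parsing sketch, assuming the usual layout of one header line followed by Filename, Type, Size, Used and Priority columns (sizes in KB); `swap_areas` and `swap_under` are hypothetical helper names:

```python
def swap_areas(text):
    """Parse /proc/swaps content into (filename, type, size_kb, used_kb) tuples."""
    areas = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        areas.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return areas


def swap_under(prefix, text):
    """Return only the swap areas whose path lies under the directory `prefix`."""
    root = prefix.rstrip("/") + "/"
    return [area for area in swap_areas(text) if area[0].startswith(root)]
```

For example: `with open("/proc/swaps") as f: print(swap_under("/var", f.read()))`. An empty list means no active swap lives under /var.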
The only time I've seen ps crash has been when there's a kernel issue or the CPU is overheating.
Hardware or software? That is the question.
Unfortunately, now that I need a fast response from my colo host, I'm not getting it, so thanks for the responses here.
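If overheating is a suspect, newer Linux kernels expose temperatures under /sys/class/thermal/thermal_zone*/, with a `temp` file in millidegrees Celsius and a `type` file naming the sensor. The 2.4 kernel that Red Hat 8 shipped has no /sys at all, so there lm_sensors is the usual route. A sketch that reads those files, assuming that sysfs layout (`thermal_zone_temperatures` is a hypothetical name, and not every zone is necessarily the CPU):

```python
import glob
import os


def thermal_zone_temperatures(root="/sys/class/thermal"):
    """Return {"zoneN:type": degrees_C} for every readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())  # millidegrees Celsius
        except (OSError, ValueError):
            continue  # zone without a readable temperature
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
        except OSError:
            name = "unknown"
        temps[os.path.basename(zone) + ":" + name] = millideg / 1000.0
    return temps
```

Taking a few readings under load and comparing them with the CPU's rated maximum is more telling than a single idle reading.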
One thing you might try is doing an rpm verify to check for random corruption. It's unlikely to find anything in /var, but if some system library is corrupted it should turn up.
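`rpm -Va` prints one line per file that fails verification: a column of test flags (for example S for size, M for mode, 5 for the content digest, T for modification time; a dot means that test passed), an optional attribute letter such as c for a config file, and then the path. Files that have vanished show `missing` instead of flags. Config files change legitimately, so for random corruption the interesting lines are non-config files whose digest differs. A sketch that filters the output that way, assuming that format (`parse_rpm_verify` and `suspicious` are hypothetical names):

```python
ATTRIBUTES = ("c", "d", "g", "l", "r")  # config, doc, ghost, license, readme


def parse_rpm_verify(output):
    """Parse `rpm -Va` output into (flags, attribute, path) tuples."""
    entries = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        flags = parts[0]
        # An optional single-letter file attribute sits between flags and path.
        if len(parts) >= 3 and parts[1] in ATTRIBUTES:
            attr, path = parts[1], " ".join(parts[2:])
        else:
            attr, path = "", " ".join(parts[1:])
        entries.append((flags, attr, path))
    return entries


def suspicious(entries):
    """Non-config files that are missing or whose content digest changed."""
    return [path for flags, attr, path in entries
            if attr != "c" and (flags == "missing" or "5" in flags)]
```

The text can come from `subprocess.run(["rpm", "-Va"], capture_output=True, text=True).stdout`; a changed digest on a system library is worth a closer look.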
Personally I'd try the strace. If there's some consistent file or library being accessed before the segv it will show up.
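With `strace -f -o trace.txt -p <pid>` attached to the parent httpd, strace follows the children it forks afterwards and prefixes every line with the PID. A sketch that, for each SIGSEGV, lists the last few files that PID opened; it assumes strace's usual text format (`open(...)`/`openat(...)` calls and a `--- SIGSEGV ... ---` signal line), and `files_before_segv` is a hypothetical name:

```python
import re
from collections import defaultdict, deque

# With -f and -o, each strace line starts with the PID of the traced process.
LINE_RE = re.compile(r"^(\d+)\s+(.*)$")
# Matches open("path", ...) and openat(DIRFD, "path", ...), capturing the path.
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,]+,\s*)?"([^"]+)"')


def files_before_segv(lines, window=5):
    """For each SIGSEGV, return (pid, last `window` files that pid opened)."""
    recent = defaultdict(lambda: deque(maxlen=window))
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        pid, rest = int(m.group(1)), m.group(2)
        if "--- SIGSEGV" in rest:
            hits.append((pid, list(recent[pid])))
            continue
        opened = OPEN_RE.search(rest)
        if opened:
            recent[pid].append(opened.group(1))
    return hits
```

If the same library or data file shows up before every crash, that points at software; a different path each time fits flaky hardware better.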
One thing you might try is doing an rpm verify
rpm -Va was run earlier this morning with no obvious suspects. My colo host also mailed me that he will get to the colo on Thursday and upgrade the OS (hurrah!).
After much research, my prime suspect is the RH8 apache-prefork-mpm/php combo, which php themselves state is not to be used in a production environment [uk.php.net]. Since restarting apache seemed to give 8 hours' grace (shades of Windows, huh?), I ran apachectl graceful, but this did not help much:
[Wed Dec 08 02:14:56 2004] [notice] Graceful restart requested, doing restart
...
[Wed Dec 08 02:14:58 2004] [notice] Apache/2.0.40 (Red Hat Linux) configured -- resuming normal operations
[Wed Dec 08 02:55:04 2004] [notice] child pid 1995 exit signal Segmentation fault (11)
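The excerpt above can be checked mechanically: the graceful restart completed at 02:14:58 and the first child segfault came at 02:55:04, about 40 minutes later, far short of the 8 hours seen after full restarts. A sketch that computes that gap for every start in an Apache 2.0 error log, assuming the `[Wed Dec 08 02:14:58 2004]` timestamp format shown and an English/C locale for `strptime` (`time_to_first_segfault` is a hypothetical name):

```python
import re
from datetime import datetime

# Apache 2.0 error-log lines start with a timestamp like "[Wed Dec 08 02:14:58 2004]".
STAMP_RE = re.compile(r"^\[(\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4})\]")


def time_to_first_segfault(lines):
    """Return a timedelta per (re)start: start -> first child segfault after it."""
    intervals = []
    started = None
    for line in lines:
        m = STAMP_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        if "resuming normal operations" in line:
            started = stamp  # a start or graceful restart has completed
        elif "Segmentation fault" in line and started is not None:
            intervals.append(stamp - started)
            started = None  # count only the first segfault after each start
    return intervals
```

For example `with open("/var/log/httpd/error_log") as f: print(time_to_first_segfault(f))`, where that path is the usual Red Hat location and may differ on other setups.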
Once again, the responses from yourselves have helped enormously, so thank you. With little sleep, no food, no response from the ISP and a Google PageRank [checkpagerank.com] of 0 for the site, I was getting mighty desperate (desparate?).
Lance (the main man from the ISP) is at the colo and using memchecker from a CD to test the memory (2 x 512 MB sticks of Crucial-supplied ECC memory). Sure enough, it is failing, at ~912 MB! One stick bad, one OK. The memory is under guarantee, so that's OK, but 18 months of hell to find out...
One more question: the mem-checker reported that ECC was switched off. This could just be a side effect of the testing routine, but does anyone know of a Linux utility that reports ECC status?
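As a rough sanity check on "one stick bad": with 2 x 512 MB installed, an error around 912 MB falls in the second stick, about 400 MB into it, but only if the chipset maps the sticks one after the other. Many chipsets interleave or remap memory, so the tester's address is a hint rather than proof; swapping the sticks and re-testing is the reliable confirmation. The arithmetic, under that linear-layout assumption (`dimm_for_address` is a hypothetical helper):

```python
def dimm_for_address(addr_mb, dimm_sizes_mb):
    """Return (dimm_index, offset_mb) for addr_mb, assuming the DIMMs are
    mapped linearly and without interleaving: DIMM 0 first, then DIMM 1, ..."""
    base = 0
    for index, size in enumerate(dimm_sizes_mb):
        if base <= addr_mb < base + size:
            return index, addr_mb - base
        base += size
    raise ValueError(f"address {addr_mb} MB is beyond the installed memory")
```

With the sizes from this thread, `dimm_for_address(912, [512, 512])` returns `(1, 400)`, i.e. the second stick.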
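One common way to see what the BIOS reports is `dmidecode` (run as root): its Physical Memory Array entry includes an `Error Correction Type` line, for example `None` or `Single-bit ECC`. That reflects the SMBIOS tables, which usually mirror the BIOS setting but do not prove the chipset is actually correcting errors; newer kernels with the EDAC drivers also expose corrected and uncorrected error counts under /sys/devices/system/edac/. A sketch that pulls the field out of dmidecode output, assuming that line format (`ecc_types` and `ecc_reported` are hypothetical names):

```python
def ecc_types(dmidecode_output):
    """Return every 'Error Correction Type' value found in dmidecode output."""
    values = []
    for line in dmidecode_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Error Correction Type":
            values.append(value.strip())
    return values


def ecc_reported(dmidecode_output):
    """True if any memory array claims some form of ECC."""
    return any("ECC" in value for value in ecc_types(dmidecode_output))
```

If this reports `None` while ECC sticks are fitted, the BIOS setting is the first thing to check on the next colo visit.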