I'm at a point where I need to set up some kind of performance and uptime monitoring.
I'm looking after:
- multiple Linux servers
- multiple web services and sites on each
- MySQL databases, soon to be in a master/slave setup
- redundancy and load balancing
- a scary amount of traffic
Here are my must-have and nice-to-have features:
must-have:
- immediate alerts when things go kaboom
- checks that essential services are running, like crond, httpd and mysqld
- CPU, memory, disk space and bandwidth usage
- ability to run functional tests using custom PHP scripts
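
To make the "essential services" item concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration, not something I actually run: it assumes Linux with /proc mounted, and the function names (running_process_names, missing_services) are made up for this post.

```python
import os

# Service names taken from the must-have list; adjust per distro
# (for example, some systems call these cron / apache2).
REQUIRED = {"crond", "httpd", "mysqld"}


def running_process_names(proc_root="/proc"):
    """Collect the command names of running processes (Linux only)."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                names.add(fh.read().strip())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return names


def missing_services(required, running):
    """Return the required service names that are not running, sorted."""
    return sorted(set(required) - set(running))


if __name__ == "__main__":
    missing = missing_services(REQUIRED, running_process_names())
    if missing:
        print("ALERT: not running:", ", ".join(missing))
```

Something like this could run from cron every minute, although that obviously cannot catch crond itself dying, which is part of why I want a proper tool watching from outside the box.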
nice-to-have:
- graphs
- history log / archive
- warnings when things are getting close to capacity
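
By "close to capacity" I mean something like the sketch below: a warning level before the critical one. The 80% and 95% thresholds and the function names are just assumptions to show the idea, not numbers I have settled on.

```python
import shutil


def classify_usage(used_fraction, warn_at=0.80, crit_at=0.95):
    """Map a usage fraction (0.0 to 1.0) to 'ok', 'warning' or 'critical'."""
    if used_fraction >= crit_at:
        return "critical"
    if used_fraction >= warn_at:
        return "warning"
    return "ok"


def disk_status(path="/", warn_at=0.80, crit_at=0.95):
    """Classify how full the filesystem containing `path` is."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used / usage.total, warn_at, crit_at)
```

Calling disk_status("/var") would then return "warning" well before the disk is actually full, which is the heads-up I'm after.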
For example, I would want to know immediately:
1) if crond stops for any reason
2) if a PHP error happens on Line 322 of call-function.php
3) if my disk(s) get full
4) if there's a tornado that destroys the server room and everything in it
What do y'all use for this?