I need the "top" command on steroids
This is for anyone who's familiar with the "top" command in *nix. While I'm working, one of my computers has 6 SSH terminals open to 6 web servers, each one running "top" with a 10-second refresh delay. This way, I can keep an eye on server load as well as the top processes currently running.
But this is not enough.
At the moment, I have 12 servers that matter. Also, these sessions run on a peripheral computer -- it's not as though I'm continuously staring at the server loads. So while I can glance over to see if any of these servers is particularly busy, there's nothing to grab my attention if one of them spikes for a while.
Is there a piece of software that would let me SSH into a dozen or more web servers and run the "top" command, but with a better GUI? Specifically, I'd really like to see the processor load color-coded: if it has been quite high for a while (say, 90%+ for 2 minutes), turn yellow. If it has been higher than that for longer (say, 95%+ for 10 minutes), turn red.
That would fit my needs much better than what I'm doing now.
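To make the rule concrete, here is a rough Python sketch of the color logic I have in mind. It assumes one CPU reading per server every 10 seconds, the same as my "top" delay. The class name, the "green" default, and the way the readings are collected are all made up for illustration; only the thresholds come from the description above.

```python
from collections import deque

# Thresholds from the rule above: (CPU percent, seconds it must be sustained).
YELLOW = (90.0, 120)
RED = (95.0, 600)


class LoadTracker:
    """Keeps recent CPU readings for one server and derives a color.

    Hypothetical helper, not part of any existing tool.
    """

    def __init__(self, interval=10):
        self.interval = interval  # seconds between readings
        # Keep just enough readings to cover the longest window (10 minutes).
        self.samples = deque(maxlen=RED[1] // interval)

    def add(self, cpu_percent):
        self.samples.append(cpu_percent)

    def _sustained(self, threshold, seconds):
        # True only if every reading in the last `seconds` is at or above threshold.
        needed = seconds // self.interval
        if len(self.samples) < needed:
            return False
        recent = list(self.samples)[-needed:]
        return all(s >= threshold for s in recent)

    def color(self):
        if self._sustained(*RED):
            return "red"
        if self._sustained(*YELLOW):
            return "yellow"
        return "green"  # assumed default when neither rule fires


if __name__ == "__main__":
    tracker = LoadTracker()
    for _ in range(12):       # 12 readings x 10 s = 2 minutes at 92%
        tracker.add(92.0)
    print(tracker.color())    # yellow
```

A single low reading resets the color, which may or may not be what you want; averaging over the window would be a gentler alternative.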
Has anyone found such a solution? Are there other / better solutions I haven't thought of?
I must admit, this is one hard thing to do a Google search for. Try it: search for unix top with gui and nothing related to what I'm thinking of comes up. Perhaps I'm not searching properly ... any suggestions?
In a way, this almost sounds like "NOC" software.
It's not exactly what you are asking for, but the free Munin package may fit your purpose. It is installed on a webserver and queries its slave computers every 5 minutes or every minute about their status. It records not only processor load, but also disk space, temperatures, MySQL queries etc. You can write your own client modules in a scripting language, combine multiple machines in one graph etc.
The data is stored in round-robin databases which are used to generate graphs of the last day, the last week, the last month, and the last year. It makes it easy to spot trends and predict when in the future your machine will be overloaded or your disk will be full.
Warning and error levels can be set; when a value crosses one of them, the name of that item is shown in a different color.
The setup via a webserver makes it possible to collect data, even when your desktop computer is switched off.
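As a rough illustration of a custom client module with warning and error levels, here is a minimal sketch of a Munin plugin in Python. Munin already ships a load plugin, so this is only to show the shape: the plugin prints its graph description when called with the argument "config" and prints the current value otherwise. The thresholds 4 and 8 are arbitrary assumptions, and the script assumes a Unix host where os.getloadavg() is available.

```python
#!/usr/bin/env python3
import os
import sys

# Hypothetical thresholds; tune them to your own servers.
WARNING = 4
CRITICAL = 8


def config_lines():
    """Lines printed when Munin calls the plugin with 'config'."""
    return [
        "graph_title Load average (1 min)",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
        f"load.warning {WARNING}",    # item is flagged above this value
        f"load.critical {CRITICAL}",  # item is flagged as an error above this
    ]


def value_lines():
    """Lines printed on a normal fetch: the current 1-minute load average."""
    one_minute = os.getloadavg()[0]
    return [f"load.value {one_minute:.2f}"]


def main(argv):
    lines = config_lines() if argv[1:] == ["config"] else value_lines()
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

The field name "load" and the graph title are my own choices; any field name works as long as the config and value lines use the same one.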
Webmin gives a status overview on its management page; other "panel" software might have similar features. You could use something like Greasemonkey to refresh/scrape that data. The overview includes:
Time on system
Kernel and CPU
CPU load averages
Local disk space
I dug a bit deeper and noticed that the data comes from this Perl script, which you might be able to adapt for your own needs: /usr/local/webmin-1.510/proc/index_cpu.cgi (perl5.10.0)
Edit: try Lammert's idea - I completely forgot about Munin.