I cache the text downloaded from each site for various purposes, which also lets me compare the raw HTML of the old copy against the new one. The catch is that if your last download was already a bad page, a new bad page won't raise any alarm either. As long as your baseline sample is good, this method rocks.
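For anyone who wants to try the old-vs-new comparison, here is a rough Python sketch using the standard library's `difflib`. The function names and the 0.5 threshold are my own placeholders, not anything tuned, and it only tells you how far the new page has drifted from whatever is in the cache, so a bad baseline still slips through.

```python
import difflib


def change_ratio(old_html: str, new_html: str) -> float:
    """Return 0.0 for identical pages, up to 1.0 for nothing in common."""
    # autojunk=False keeps difflib from ignoring very frequent characters,
    # which matters on long HTML documents.
    matcher = difflib.SequenceMatcher(None, old_html, new_html, autojunk=False)
    return 1.0 - matcher.ratio()


def looks_changed(old_html: str, new_html: str, threshold: float = 0.5) -> bool:
    """Flag the page for a manual look when it differs a lot from the cache.

    The threshold is an assumed starting point; adjust it for your own data.
    """
    return change_ratio(old_html, new_html) > threshold
```

Anything that gets flagged still deserves a look by eye before you act on it.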
However, nothing beats a raw look at what the heck is in your index every now and then.
Sometimes you find you've been gamed when you see the same web page 10 times under different domains and different emails, with the submitter trying to cover their tracks.
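One way to spot that kind of duplicate automatically is to hash each cached page and group the domains that share a hash. This is just a sketch under my own assumptions: `pages` is a hypothetical dict of domain to cached HTML, and the normalization is deliberately crude. It only catches copies that are identical apart from whitespace and letter case, so anyone who rewrites the text a bit will get past it.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash the page after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", html).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of domains whose cached pages share a fingerprint."""
    groups = defaultdict(list)
    for domain, html in pages.items():
        groups[fingerprint(html)].append(domain)
    # Only groups with more than one domain are suspicious.
    return [sorted(domains) for domains in groups.values() if len(domains) > 1]
```

For example, with the same page stored under `a.example` and `b.example` plus a different page under `c.example`, `find_duplicates` returns one group holding the first two domains.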
Google PageRank is an interesting idea, but it won't work for most sites in my niche. Many of them are brand new sites trying to get noticed, usually top quality but with no PR yet. Most are the best Flash sites you've ever seen, which also don't rank well. Unfortunately, the title will often just say "Index" or "Home", which is very sad but an opportunity for me! If they had PR they wouldn't need my help in the first place.
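If you want to find those "Index" and "Home" titles in bulk, something like the sketch below would do it with Python's built-in `html.parser`. The class and function names are made up for illustration, and the set of generic titles holds only the two examples mentioned above, so you would probably want to extend it.

```python
from html.parser import HTMLParser

# Only the two examples from the post; add more as you run into them.
GENERIC_TITLES = {"index", "home"}


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def has_generic_title(html: str) -> bool:
    """True when the page title is one of the known generic titles."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip().lower() in GENERIC_TITLES
```

A page with no title at all is not flagged here; whether that should count as generic is a judgment call.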