incrediBILL - 5:15 pm on Apr 1, 2009 (gmt 0)
I cache the text downloaded from each site for various purposes, so I can compare the raw HTML from the old crawl to the new one. The catch: if your last download was already of a bad page, nothing will alarm you when the new one is bad too. As long as your baseline sample is good, this method rocks. Still, nothing beats a raw look at what the heck is in your index every now and then. Sometimes you find you've been gamed when you see the same web page 10x under different domains with different emails, trying to cover their tracks.

Google PageRank is an interesting idea, but it won't work for most sites in my niche. Many of them are brand-new sites trying to get noticed, usually top quality with no PR. Most are the best Flash sites you've ever seen, which also don't rank well. Unfortunately, the title will often say "Index" or "Home". Very sad, but an opportunity for me! If they had PR they wouldn't need my help in the first place.
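For anyone who wants to try the cached-text comparison, here is a rough sketch, not my actual code. It assumes you already have the old and new text as strings. The function names, the 0.3 threshold, and the email-masking regex are all made up for illustration. The second half groups pages that are identical once emails are masked out, which is one way to spot the same page submitted under several domains.

```python
import difflib
import hashlib
import re
from collections import defaultdict


def change_ratio(old_text, new_text):
    """Return 0.0 for identical text, up to 1.0 for nothing in common."""
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
    return 1.0 - matcher.ratio()


def needs_review(old_text, new_text, threshold=0.3):
    """Flag a page when it differs from the cached baseline by more than threshold.

    The threshold is an arbitrary assumption; tune it on your own data.
    """
    return change_ratio(old_text, new_text) > threshold


def fingerprint(text):
    """Hash the text after lowercasing, masking emails, and collapsing whitespace."""
    masked = re.sub(r"[\w.+-]+@[\w.-]+", "<email>", text.lower())
    normalized = " ".join(masked.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Given {domain: text}, return sorted groups of domains sharing a fingerprint."""
    groups = defaultdict(list)
    for domain, text in pages.items():
        groups[fingerprint(text)].append(domain)
    return [sorted(domains) for domains in groups.values() if len(domains) > 1]
```

Remember the caveat above: this only tells you the page changed relative to the baseline, so a bad baseline will still slip through.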
Now let's say you save the screenshots as a small GIF. Would it not be possible to automate the comparison, detecting whether the content of the page has changed significantly since the last crawl? Like a visual checksum. If the page hasn't changed, then there's no reason to review it. Next! There's a better way: it's called Google PageRank.
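One way the "visual checksum" idea could be sketched is with an average hash, a simple perceptual hash. This is only an illustration under assumptions: it expects the screenshot to be already decoded and shrunk to a tiny grayscale grid (a list of rows of 0-255 values), which in practice would need an imaging library that is not shown here. The function names and the 10% cutoff are hypothetical.

```python
def average_hash(pixels):
    """Turn a small grayscale grid into bits: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must come from thumbnails of the same size")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def changed_significantly(old_pixels, new_pixels, max_fraction=0.1):
    """Report a change when more than max_fraction of the hash bits differ.

    max_fraction is an assumed cutoff, not a recommended value.
    """
    old_hash = average_hash(old_pixels)
    new_hash = average_hash(new_pixels)
    return hamming_distance(old_hash, new_hash) / len(old_hash) > max_fraction
```

Unlike a byte-for-byte checksum, this tolerates small brightness shifts between captures, at the cost of missing changes that leave the overall light and dark layout the same.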