So thank goodness for the really impressive (and impressively scary) compilation at Psychedelix.com: the "Database of robots, spiders & other user-agents [psychedelix.com]" pages. I don't know how they compile it all, but I'm glad they do, because they do a terrific job.
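If anyone wants to roll their own, the core of blocking identified crawlers is just matching the request's User-Agent header against a blocklist. Here's a minimal sketch assuming a Python/WSGI setup; the names in BLOCKED_AGENTS are made-up placeholders, not actual entries from the Psychedelix database:

BLOCKED_AGENTS = ["EmailSiphon", "WebZIP", "HTTrack"]  # hypothetical examples

def application(environ, start_response):
    # Compare the User-Agent header against the blocklist, case-insensitively.
    ua = environ.get("HTTP_USER_AGENT", "").lower()
    if any(bad.lower() in ua for bad in BLOCKED_AGENTS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Normal page here</body></html>"]

In real use you'd load the list from the database pages and match more carefully, but that's the whole trick for the identified ones.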
For instance, so far today I've blocked the following:
* 65 identified crawlers blocked requesting 357 pages.
* 116 stealth crawlers blocked requesting 980 pages.
The only reason the identified crawlers ask for fewer pages is that the new ones get stopped at the index page. Older crawlers that have visited my site before, and learned the page names prior to being blocked, still ask for everything they already know about before going away.
Stealth crawlers unfortunately get further before you can determine bot or human, so they tend to get more pages. Since they got a peek at the full site navigation, they'll keep asking for more pages too, even though they're being bounced by that point.
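Since stealth crawlers send a normal-looking user-agent, you have to go by behavior instead. One common heuristic (a sketch of the general idea, not exactly what I run) is a per-IP request-rate check: humans rarely pull dozens of pages in a few seconds. The window and threshold below are made-up values:

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # look-back window (assumed value)
MAX_REQUESTS = 15     # more hits than this in the window looks like a bot

hits = defaultdict(deque)  # ip -> timestamps of recent requests

def looks_like_stealth_bot(ip):
    # Record this hit, drop timestamps older than the window,
    # and flag the IP if what's left exceeds the threshold.
    now = time.time()
    q = hits[ip]
    q.append(now)
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_REQUESTS

That's also why they get a few pages in before the block kicks in: you need a handful of requests before the pattern gives them away.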
It's actually a slow day; yesterday was about 4K blocked pages total. But there are a few hours to go, and I have faith a couple of greedy pigs will hit my server before the day is over.