Forum Moderators: open
Good thing or bad? All seem to resolve to Amazon's cloud, and there's not a referrer in the bunch, which is reason enough for me. But I do wonder if I'll be cutting off some potential referrals from those who actually use Gmail.
As a result, I block all IP address ranges of the Amazon compute cloud service, and just hope that no important search company decides to use them for extra "spidering power" as-is, without requiring that Amazon configure valid rDNS for the term of the lease.
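For anyone wanting to do the same, the range blocking can be sketched in .htaccess along these lines (the CIDR ranges below are placeholders, not Amazon's actual allocations; you'd need to look up the EC2 blocks yourself and keep the list current):

```apache
# Block requests from cloud-hosted address ranges.
# 203.0.113.0/24 and 198.51.100.0/24 are documentation
# placeholders -- substitute the real EC2 CIDR blocks.
Order Allow,Deny
Allow from all
Deny from 203.0.113.0/24
Deny from 198.51.100.0/24
```

Blocked clients get a 403, which is usually enough to make a crawler move on.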
[edit] The lack of a referrer is typical for spiders, since they're working from a database that may contain hundreds of "referrers" for any given URL on your site. So I'm not sure how relevant the lack of a referrer was to your decision to block these requests, since (I assume that) the message in the user-agent string was fairly clear about the client being a crawler. [/edit]
Jim
[edited by: jdMorgan at 6:43 pm (utc) on Nov. 3, 2008]
AISearchBot (Email: aisearchbot@gmail.com; If your web site doesn't want to be crawled, please send us a email.)
It came to a couple of my sites, didn't request robots.txt and triggered at least four filters.
I didn't have to go to the trouble of emailing the owner.
...
3 magic words added to your deny list in your .htaccess that do wonders: bot, crawler, spider, applied after the whitelist routine has run.
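That whitelist-then-deny pattern can be sketched with mod_setenvif (the whitelisted names here are just illustrative examples; adjust them to the crawlers you actually want in):

```apache
# Flag any UA containing the 3 magic words...
SetEnvIfNoCase User-Agent (bot|crawler|spider) keep_out
# ...then un-flag the crawlers you trust (illustrative list).
SetEnvIfNoCase User-Agent Googlebot !keep_out
SetEnvIfNoCase User-Agent Slurp !keep_out
SetEnvIfNoCase User-Agent msnbot !keep_out
# Deny everything still flagged.
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

Anything calling itself a bot, crawler, or spider that isn't on the whitelist gets a 403.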