Forum Moderators: open
I am currently working on my .htaccess file in order to block all those spiders that don't request robots.txt...
Now I wondered: a normal spider or visitor will send a user agent with every request.
So is it safe to assume that people/spiders who send "-" in the user-agent field have something to hide, and can therefore be blocked?
And what about caches/proxies? What UA will the proxy send?
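For concreteness, this is the kind of rule I'm thinking of adding (just a sketch, not a finished recipe; [F] answers with 403 Forbidden, and the ^-?$ test catches both a missing User-Agent header and a literal "-"):

RewriteEngine On
# deny requests that arrive with an empty (or literal "-") user agent
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]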
Skirril
I don't see any logical reason why a "reputable" organization would need to hide its UA, its referrer, or what it intends to do with the data it gathers.
Too bad that after numerous attempts at emailing Lycos (who happen to leave both UA and referrer blank when fetching robots.txt, yet not when spidering pages), they still can't grasp the obvious.
But I have the following at the top of my .htaccess, before the long list of stuff blocked for various reasons and by various methods:
# make this accessible to everyone (except for hard blocks)
RewriteRule ^robots\.txt$ - [L]
# allow LookSmart (64.241.24[23].#) even without a UA
RewriteCond %{REMOTE_ADDR} ^64\.241\.24[23]\.
RewriteRule .* - [L]

This makes sure that (almost) everybody can read robots.txt, and allows one notoriously broken robot explicitly (I actually had that one blocked for a while with no negative consequences, but I'm just a nice person... ;)).
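In case it helps to see the ordering, here's a sketch of where an empty-UA block would sit, below those allow rules (the ^-?$ pattern and the 403 via [F] are just one way to do it, not gospel):

# below the allows: refuse anything that sends an empty or "-" user agent
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

Because the allow rules above end with [L], requests for robots.txt and requests from LookSmart never reach this block.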