

Blocking UA "-"

         

skirril

4:04 pm on Apr 12, 2002 (gmt 0)




Hello,

I am currently working on my .htaccess file in order to block all those spiders that don't request robots.txt...

Now I wondered: a normal spider or visitor will send their user agent with every request.

So is it safe to assume that those ppl/spiders that send "-" in the user agent field have something to hide, and can therefore be blocked?

What about caches/proxies? What UA will the proxy send?
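For reference, the sort of rule I have in mind would look something like this (an untested sketch; note that the "-" in the access log is just how Apache logs a *missing* User-Agent header, so the pattern matches an empty header as well as a literal dash):

```apache
RewriteEngine On

# Block requests whose User-Agent is empty or a literal "-"
# (Apache logs an absent User-Agent header as "-").
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```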

Skirril

HandwovenRug

4:30 pm on Apr 12, 2002 (gmt 0)




Blank UAs are in most cases not robots or the like. There are many ways to blank out one's own referer field or to send an explicit "-" string. A proxy only seldom sends a UA string of its own; that kind of information is found in the "Via" HTTP header instead. So it doesn't seem such a good idea to ban the "-" UAs.

wilderness

5:54 pm on Apr 12, 2002 (gmt 0)




<snip>So is it safe to assume that those ppl/spiders that send "-" in the user agent field have something to hide, and can therefore be blocked?</snip>

I don't see it as logical that a "reputable" organization would feel it necessary to hide its UA or referrer, or its intended use of what it is gathering.

Too bad that on numerous attempts at emailing Lycos (who happen to leave both UA and referrer blank when reading robots.txt, yet not when spidering pages), they can't understand the obvious.

bird

12:00 am on Apr 13, 2002 (gmt 0)




I also see no reason why legitimate visitors would completely hide their UA.

But I have the following at the top of my .htaccess, before the long list of stuff blocked for various reasons and by various methods:

# make this accessible to everyone (except for hard blocks)
RewriteRule ^robots\.txt$ - [L]

# allow LookSmart (64.241.24[23].#) even without a UA
RewriteCond %{REMOTE_ADDR} ^64\.241\.24[23]\.
RewriteRule .* - [L]

This makes sure that (almost) everybody can read robots.txt, and explicitly allows one notoriously broken robot (I actually had that one blocked for a while with no negative consequences, but I'm just a nice person... ;)).

Key_Master

12:24 am on Apr 13, 2002 (gmt 0)




There are other legitimate spiders that don't send a user agent. For example, there is a Lycos spider that doesn't use a user agent when it requests robots.txt. Gigablast also continues to spider without one.
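For anyone assembling this, the suggestions in the thread combine into something like the following sketch (rule order matters: the robots.txt and IP exceptions must come before the blank-UA block; the LookSmart address range is the one quoted above, and any other agentless spiders you want to keep would need their own exceptions):

```apache
RewriteEngine On

# always serve robots.txt, whatever the UA
RewriteRule ^robots\.txt$ - [L]

# pass known agentless spiders through by IP (example: LookSmart)
RewriteCond %{REMOTE_ADDR} ^64\.241\.24[23]\.
RewriteRule .* - [L]

# everything else with an empty User-Agent (logged as "-") gets a 403
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]
```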