Forum Moderators: phranque


.htaccess, ban all spiders/bots but not actual users

only want users not bots..


ezyid

1:10 am on Oct 26, 2005 (gmt 0)

10+ Year Member



Any ideas? I'm using a robots.txt, but it doesn't stop them.

I am running a proxy service.
If someone links to a file on my proxy, a bot may follow every link and spider a second version of the internet through my proxy. I don't think I have that kind of bandwidth!
Nor would I want that much duplicate-content penalisation!

jdMorgan

1:30 am on Oct 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ezyid,

For those spiders that identify themselves properly using the HTTP User-agent request header, this doesn't sound like a very difficult project. You could use the RewriteCond directive of mod_rewrite, testing the server variable %{HTTP_USER_AGENT}, and forbid access to all files requested by the robots you want to exclude.
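For example, a minimal .htaccess sketch (the robot names below are placeholders; substitute the user-agent tokens you actually see in your logs):

```apache
RewriteEngine On
# Forbid (403) any request whose User-agent contains one of these tokens.
# [NC] makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|msnbot) [NC]
RewriteRule .* - [F]
```

The `[F]` flag returns 403 Forbidden immediately, so the robot never sees your proxied content.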

For stealth 'bots that claim to be browsers, you'll need to collect a list of their IP address ranges, and exclude them by IP address, also using RewriteCond, but with server variable %{REMOTE_ADDR}.
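Along the same lines, a sketch using %{REMOTE_ADDR} (the ranges shown are documentation examples only; replace them with the ranges you've collected from your logs):

```apache
RewriteEngine On
# Forbid requests from known stealth-'bot address ranges.
# Multiple conditions are chained with [OR]; the dots must be escaped
# because RewriteCond patterns are regular expressions.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.
RewriteRule .* - [F]
```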

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

You could also use a combination of mod_setenvif and mod_access if mod_rewrite is not available to you, but it's a bit less straightforward.
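For the record, the equivalent sketch with mod_setenvif and mod_access (again, the robot names and IP range are placeholders):

```apache
# Set the "bad_bot" variable for matching user-agents or addresses...
BrowserMatchNoCase (Googlebot|Slurp|msnbot) bad_bot
SetEnvIf Remote_Addr ^192\.0\.2\. bad_bot
# ...then deny any request that carries it.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```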

Jim