Whether you can block this type of agent really depends on whether it respects the robots.txt file. Some agents are not that respectful.
Try the following:
User-agent: AskBar
Disallow: /
NOTE: I am assuming that the name of the agent above is AskBar. If it is not, you need to substitute the actual name of the agent.
If the crawler is set up to read the directives in the robots.txt file first and only then proceed, the above will tell the bot not to crawl the site.
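To check how a well-behaved crawler would interpret the rule above, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the agent really does identify itself as AskBar; the path `/index.html` is only an illustrative URL. It says nothing about agents that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the post, as they would appear in robots.txt.
rules = """\
User-agent: AskBar
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /index.html is a hypothetical path used only for this check.
askbar_allowed = parser.can_fetch("AskBar", "/index.html")
other_allowed = parser.can_fetch("Googlebot", "/index.html")

print("AskBar allowed:", askbar_allowed)
print("Googlebot allowed:", other_allowed)
```

The rule only names AskBar, so other agents stay unaffected unless they get their own `User-agent` block.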
Regarding the other two you mention, see if you can find the name of the agent in the log string.
For example, if Googlebot is visiting, you would see a line like this in your log file:
64.68.82.172 - - [01/Mar/2004:02:00:29 -0500] "GET /robots.txt HTTP/1.0" 200 473 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
The obvious bot name here is "Googlebot". You can work out where the names of the other bots appear by analyzing your own properly formatted log data in the same way.
You may have to test a few names to find the right one to block the critters.
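Pulling the user-agent field out of a log line like the Googlebot one can be sketched as follows. This is a rough example that assumes your server writes the combined log format shown above; the helper names `extract_user_agent` and `agent_token` are made up for illustration. Note that the leading token is only useful for agents that put their name first. Some agents bury their name inside the parentheses, in which case you need to read the whole string.

```python
import re

# Combined log format: the user agent is the last quoted field on the line.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def extract_user_agent(line):
    """Return the full user-agent string, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.group("agent") if match else None

def agent_token(user_agent):
    """Return the leading name, i.e. everything before the first '/' or space."""
    return re.split(r"[/\s]", user_agent, maxsplit=1)[0]

sample = (
    '64.68.82.172 - - [01/Mar/2004:02:00:29 -0500] '
    '"GET /robots.txt HTTP/1.0" 200 473 "-" '
    '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)

agent = extract_user_agent(sample)
print(agent)
print(agent_token(agent))
```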
Good luck..
Use mod_rewrite in .htaccess to ban user agents with a particular word in them, e.g. from your examples:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} AskBar [OR]
RewriteCond %{HTTP_USER_AGENT} Hotbar [OR]
RewriteCond %{HTTP_USER_AGENT} FunWebProducts
RewriteRule ^.* - [F]
The above issues a 403 Forbidden error to all visitors with "AskBar", "Hotbar" or "FunWebProducts" anywhere in the user-agent string. Obviously, you need to make sure they are bots rather than human visitors. Use at your own risk!
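Before deploying rules like these, it can help to dry-run them against user-agent strings taken from your own logs. The sketch below is only a rough Python approximation of the three RewriteCond lines, not Apache's own matching code. It relies on the fact that RewriteCond patterns are unanchored regular expressions and are case-sensitive unless you add the [NC] flag. The function name `would_be_forbidden` and the sample user-agent strings are invented for illustration.

```python
import re

# The same three words used in the RewriteCond lines, joined as alternatives.
BLOCKED_PATTERNS = ["AskBar", "Hotbar", "FunWebProducts"]
BLOCKED_RE = re.compile("|".join(BLOCKED_PATTERNS))

def would_be_forbidden(user_agent):
    """Approximate the rule set: True if any blocked word occurs in the string."""
    return BLOCKED_RE.search(user_agent) is not None

# Illustrative strings only; substitute real ones from your access log.
samples = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "askbar",  # lowercase: not matched, because the match is case-sensitive
]

for ua in samples:
    print(would_be_forbidden(ua), ua)
```

If most of the matching strings look like ordinary browsers, the rule is probably hitting human visitors.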