AuthUserFile /dev/null
AuthGroupFile /dev/null
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://example.com [NC]
RewriteCond %{HTTP_REFERER} !^http://www.example.com [NC]
RewriteRule /* http://example.com [R,L]
My understanding (PLEASE correct me if I'm wrong) is that this blocks search engines trying to index my images. Can someone please tweak the above code so that it blocks everyone EXCEPT search engines?
Below is a list of the main ones. I know Google has a special image bot (indicated below); if any others do too, please edit the identifiers below. Feel free to add other major bots too.
If there's a better solution to this whole issue (a useful link will do), I'm interested in that as well.
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^yahoo.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^inktomi.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^zyborg.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^webcrawler.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabot.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^scrubby.* [NC,OR]
# these last ones are image indexers I got from a bot DB list
RewriteCond %{HTTP_USER_AGENT} .*ImageScape.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla\s3\.01.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CydralSpider.* [NC,OR]
[edited by: jdMorgan at 7:27 pm (utc) on Dec. 20, 2006]
[edit reason] example.com [/edit]
Also, any comments on the bot list are welcome.
AuthUserFile /dev/null
AuthGroupFile /dev/null
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://mysite.com [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.com [NC]
RewriteCond %{HTTP_USER_AGENT} !^Googlebot-Image.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^yahoo.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^msnbot.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^inktomi.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^zyborg.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^webcrawler.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^gigabot.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^scrubby.* [NC]
RewriteCond %{HTTP_USER_AGENT} !.*ImageScape.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla\s3\.01.* [NC]
RewriteCond %{HTTP_USER_AGENT} !^CydralSpider.* [NC]
RewriteRule /* http://example.com [R,L]
I only want to block one bot that is ignoring the robots.txt file... its name is -
web2.gold.funnelback.com and IP is 64.72.112.53
How do I edit/Where do I put this code? Does this go right into Apache 2 httpd file?
Here is my attempt at writing the necessary code based on your examples -
RewriteCond %{HTTP_USER_AGENT} ^web2.gold.funnelback.com*
But where does that go? I don't see an htaccess file anywhere on our server.
Thanks for any help you can give!
Megan
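One possible way to handle the single-bot case above, offered as an untested sketch: web2.gold.funnelback.com looks like a hostname rather than a User-Agent string, so matching the IP address through %{REMOTE_ADDR} may be more reliable than testing %{HTTP_USER_AGENT}. This assumes mod_rewrite is loaded and that the directives go either into the server configuration (for example inside the relevant <VirtualHost> or <Directory> section of httpd.conf) or into a .htaccess file in the document root, which only takes effect if AllowOverride permits it.

```apache
# Sketch only: refuse every request coming from the one address named above.
RewriteEngine On
# Dots are escaped and the pattern is anchored, so only this exact IP matches.
RewriteCond %{REMOTE_ADDR} ^64\.72\.112\.53$
# "-" means "no substitution"; [F] answers with 403 Forbidden.
RewriteRule .* - [F]
```

If no .htaccess file exists yet, it can simply be created as a plain text file; whether Apache reads it depends on the AllowOverride setting for that directory.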
Other helpful links:
tutorial: www.workingwith.me.uk/articles/scripting/mod_rewrite
cheat sheet for symbol meanings: www.ilovejackdaniels.com/mod_rewrite_cheat_sheet.pdf
various applications: thejackol.com/htaccess-cheatsheet
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteCond %{HTTP_USER_AGENT} !Googlebot-Image [NC]
RewriteCond %{HTTP_USER_AGENT} !yahoo [NC]
RewriteCond %{HTTP_USER_AGENT} !^msnbot [NC]
RewriteCond %{HTTP_USER_AGENT} !zyborg [NC]
RewriteCond %{HTTP_USER_AGENT} !webcrawler [NC]
RewriteCond %{HTTP_USER_AGENT} !^Gigabot [NC]
RewriteCond %{HTTP_USER_AGENT} !scrubby [NC]
RewriteCond %{HTTP_USER_AGENT} !ImageScape [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla\ 3\.01 [NC]
RewriteCond %{HTTP_USER_AGENT} !CydralSpider [NC]
RewriteRule .* - [F]
There is little use in trying to redirect a request for an included image to an HTML page. It won't work, because browsers can't load a page into the spot where an <img src="..."> tag has been used. So just return a 403-Forbidden response, as shown.
Jim
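A possible refinement of the ruleset above, as an untested sketch: most of the user-agent tests can be folded into one negated alternation, and the rule can be limited to image files so that ordinary pages stay reachable. The extension list and the decision to let blank referrers through are assumptions of this sketch, not part of the original post; example.com is the placeholder domain used throughout the thread.

```apache
RewriteEngine On
# Assumption: let requests with an empty Referer through (some proxies and browsers strip it).
RewriteCond %{HTTP_REFERER} !^$
# Skip requests referred by the site itself.
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
# Skip the listed crawlers. The names are unanchored here, which is slightly
# looser than the separate ^msnbot and ^Gigabot tests.
RewriteCond %{HTTP_USER_AGENT} !(Googlebot-Image|yahoo|msnbot|zyborg|webcrawler|Gigabot|scrubby|ImageScape|CydralSpider) [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla\ 3\.01 [NC]
# Assumption: only these extensions count as images. Matching requests get 403 Forbidden.
RewriteRule \.(gif|jpe?g|png)$ - [NC,F]
```

Because consecutive RewriteCond lines are ANDed by default, the request is refused only when every condition holds: a non-empty foreign referrer, an unlisted user agent, and an image extension.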