Googlebot-Image triggering 403 on non-image files


LunaC

6:02 pm on Jan 24, 2007 (gmt 0)

10+ Year Member



I'm using this to block fake Googlebots, and it works well except when Googlebot-Image/1.0 tries to access a non-image file (robots.txt, page.html). The image bot can access images fine. The IP address in the logfile is Google's, so it shouldn't be hitting this rule at all.

#
# Validate Googlebot user-agent and IP, respond with 403-Forbidden if invalid
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Googelbot [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot/2\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]

Since I can't test from Google's IP, I tried giving a blank referrer and the image bot's UA and wandered my site. I could also get to images fine, but any non-image file returned a 403. When I removed that block from .htaccess I had full access again.

So I'm a bit lost: why is the image bot getting this error on non-image files, with a valid Google IP? And why could I access the images without a valid Google IP, but not HTML pages? It's the same pattern, but it makes no sense to me.

Am I looking at the wrong chunk in htaccess?

Mediapartners-Google/2.1 and the regular (real) Googlebots are not hitting this problem so far as I can find, just the image bot.

jdMorgan

8:35 pm on Jan 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are apparently two problems.

First, is there code above this snippet that bypasses it for image files? Something like:


# Bypass remaining rules for image requests
RewriteRule \.(gif|jpe?g|png|bmp)$ - [L]

Second, if there is such a rule and you remove it, then Googlebot/2.x will be allowed to access anything on your site, but the other Googlebots won't be. You'd need to expand your pattern to accept them:

RewriteCond %{HTTP_USER_AGENT} Googlebot|Googelbot [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot(-Image)?/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]

I have recommended that "bypass code" before, but it must be placed ahead of only your non-image internal rewrite and redirect rules, not ahead of access-control rules like this one. If your rules are well-organized, it's a good way to save some CPU time: images usually make up the majority of requests to a server, and there's no need to run the rules that rewrite or redirect non-image URLs when the request is for an image.

Usually you can safely delete the bypass rule, though doing so may cost some server performance. Look at the organization of your rules to see if you can simply move it to a position after the Googlebot checks.
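
To illustrate the placement (a sketch only -- the section labels are mine, and the redirect is just an example):

# Access-control rules run first, for every request
RewriteCond %{HTTP_USER_AGENT} Googlebot|Googelbot [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot(-Image)?/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]
#
# Then bypass the remaining rules for image requests
RewriteRule \.(gif|jpe?g|png|bmp)$ - [L]
#
# Non-image rewrites and redirects follow; image requests never reach them
RewriteRule ^old\.htm$ http://www.example.com/new/ [R=301,L]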

Jim

LunaC

12:58 am on Jan 25, 2007 (gmt 0)

10+ Year Member



In a deep subfolder holding the archived images (the ones it was able to spider) I have this .htaccess -- the other rules are all in /.htaccess. Could this be acting as the bypass rule you mentioned?


# Switch images on these sites to something I prefer to send out
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?annoyance\.com/ [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /img/bandwidth.jpe [L]
#
# Forbid images to this
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?bigthief\. [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]

Just so I'm understanding, the code you wrote means: if "Googlebot" or "Googelbot" appears anywhere inside the user agent, but the user agent isn't exactly
Mozilla/(version#) (compatible; Googlebot (or Googlebot-Image)/2.(version 0-9); +http://www.google.com/bot.html)
or isn't coming from 66.249.etc., then block it. Right?

Since "Googlebot-Image" contains the word "Googlebot" but not the rest of the expected user-agent string, it got snagged for a 403... and somehow the hotlink code a few folders deeper (this line: RewriteRule .*\.(jpe?g|gif|bmp|png)$ /img/bandwidth.jpe [L]) allowed it access to those files? [L] means "if you hit this rule, skip the rest", right?
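
As far as I can tell from the docs (so treat this as my assumption), a minimal sketch of what [L] actually does in a per-directory (.htaccess) context:

# [L] ends the current pass through the rules, but in .htaccess context
# mod_rewrite then re-submits the rewritten URL internally and runs the
# whole rule set again against it.
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /img/bandwidth.jpe [L]
# Pass 1: picture.jpg matches and is rewritten to /img/bandwidth.jpe
# Pass 2: /img/bandwidth.jpe does NOT match the pattern ("jpe?g" still
#         requires a trailing "g"), nothing changes, processing stops,
#         and the substitute image is served -- so the ".jpe" extension
#         conveniently avoids an endless rewrite loop.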

I removed the anti-hotlink rules from the separate .htaccess and put everything inside the main one (since that should give better control over the order they're read in?), and added your modified code. Everything else I'm leaving as-is for now. I'm still not sure this is a good order, since I still don't understand how that line allowed access to the image folder.
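
The ordering I'm aiming for, written out as comments (the labels are just my own):

# 1. Access control: bot blocks and validation ([F] rules), so banned
#    requests never reach anything below
# 2. Hotlink protection: forbid or swap images by referrer
# 3. Redirects ([R=301,L]): canonical-URL housekeeping, old pages to new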

Here's what I have so far (sanitized):


AddHandler application/x-httpd-php .shtml
ErrorDocument 404 /404.shtml
ErrorDocument 403 /403.shtml
Options All -Indexes
#
# 410 Gone - permanently removed URLs
Redirect gone /really/gone.cgi
#
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
#
# Block libwww-perl except from AltaVista, Inktomi, and IA Archiver
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC]
RewriteCond %{REMOTE_ADDR} !^209\.73\.(1[6-8][0-9]|19[01])\.
RewriteCond %{REMOTE_ADDR} !^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteCond %{REMOTE_ADDR} !^209\.237\.23[2-5]\.
RewriteCond %{REMOTE_ADDR} !^208\.70\.
# 207.241.224.0/20 expressed as a regex (CIDR notation won't match here)
RewriteCond %{REMOTE_ADDR} !^207\.241\.2(2[4-9]|3[0-9])\.
RewriteRule .* - [F]
#
# Block Java and Python URLlib except from Google and Yahoo Python
RewriteCond %{HTTP_USER_AGENT} ^(Python[-.]?urllib|Java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^207\.126\.2(2[4-9]|3[0-9])\.
RewriteCond %{REMOTE_ADDR} !^64\.233\.172\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteRule .* - [F]
#
# Block most random-letter non-Mozilla user-agents
RewriteCond %{HTTP_USER_AGENT} !^Mozilla
# 15 or more characters, all letters, digits, or spaces
RewriteCond %{HTTP_USER_AGENT} ^[a-z0-9\ ]{15,}$ [NC]
# five or more consecutive consonants
RewriteCond %{HTTP_USER_AGENT} [b-df-hj-np-tvwxz]{5,} [NC]
RewriteRule .* - [F]
#
# Block Fake Googlebots - Validate Googlebot user-agent and IP
RewriteCond %{HTTP_USER_AGENT} Googlebot|Googelbot [NC]
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot(-Image)?/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]
#
# Block blank referer -AND- user-agent (except for head, favicon and feed requests)
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{HTTP_REFERER}<>%{HTTP_USER_AGENT} ^<>$
RewriteRule !\.(ico|rss)$ - [F]
#
# Block a few more bad guys
SetEnvIfNoCase User-Agent "(Some|Bad|Guys)" banned
Order Allow,Deny
Deny from ###.###.###.##
Allow from all
Deny from env=banned
#
#
# Block images from these sites
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?leach\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?hugeleach\. [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]
#
# Switch images on these sites
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?annoyance\.com/ [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /clipart/bandwidth.jpe [L]
#
# DONE BLOCKS - ONTO REDIRECTS
#
# redirect old pages to new urls
RewriteRule ^old\.htm$ http://www.example.com/new/ [R=301,L]
#
# Remove useless query strings, keep the one that's needed
RewriteCond %{THE_REQUEST} [?]
RewriteCond %{REQUEST_URI} !^/need/string/here\.php$
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
#
# remove multiple slashes anywhere in url
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
#
# Remove extra URL-path info if filetype present in URL
RewriteRule ^([^.]+\.[^/]+)/ http://www.example.com/$1 [R=301,L]
#
# index.shtml and index.php to /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(shtml|php)\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(shtml|php)$ http://www.example.com/$1 [R=301,L]
#
# non www to www
# RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

I know you can't say it will work for sure since all servers are slightly different, but does that seem fairly logical?

jdMorgan

1:17 am on Jan 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's nothing in that file itself that would explain the image behaviour; perhaps we both got fooled by your browser cache. In general, the code in a lower-level .htaccess file can't bypass the code in a higher-level file, though it can "un-do" or override the actions of the higher-level file later.
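
One caveat, though, worth checking in your setup: by default, mod_rewrite rules are not inherited per-directory, so a subfolder .htaccess that defines its own RewriteRule directives replaces the parent's rewrite rules for requests into that subfolder, unless inheritance is enabled explicitly. A minimal sketch, assuming a stock Apache configuration:

# /archived/images/.htaccess
RewriteEngine On
# Without the following line, the RewriteRule directives in /.htaccess
# (including the Googlebot checks) do not apply to requests for files
# under this directory -- only the rules below do.
RewriteOptions Inherit
RewriteCond %{HTTP_REFERER} ^http://(.+\.)?annoyance\.com/ [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /img/bandwidth.jpe [L]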

Otherwise, the only thing even mildly eye-catching was the ".*\.(jpe?g|gif|bmp|png)$" patterns, which could be replaced with the shorter and entirely-equivalent "\.(jpe?g|gif|bmp|png)$" -- the leading ".*" on an un-anchored pattern is meaningless.

Your analysis of my modified pattern was essentially correct, except that the pattern will now match Googlebot versions 1.0 through 9.9.

So, I don't know. Flush your browser cache and re-test with the modified Gbot pattern, and let us know what the results are.

Jim

LunaC

3:55 am on Jan 25, 2007 (gmt 0)

10+ Year Member



I must have missed flushing my cache for that test (I swear it sticks sometimes until I restart the browser); I can't see how else I could have been allowed to view the images with that UA.

I uploaded the changes and tested as best I can that everything still works as I intend (flushed the cache, used Live HTTP Headers and a user-agent switcher to test every variation I could think of).

I'll have to wait for image bot to hit again to see how it reacts so I may not know for a day or few.

Looking over the logs again, the image bot got a 403 on an image not in the archive folder, always gets 403s on non-image files, and got 200s for all the archived images (the ones that had been protected by the hotlink code). Really odd; this is why .htaccess will never cease to confuse me.

Thanks again for your help.

Mental note:
When using a user agent that is blocked on all my websites... always, always, always change it back to normal *before* wandering off. I nearly gave myself a heart attack when I went back and got a 403. Took me longer than I'd like to admit to realize what had happened ;)

LunaC

6:25 pm on Jan 25, 2007 (gmt 0)

10+ Year Member



Hmm, now it's getting a 403 on every file, and it looks like it went through hundreds on that site :( Hopefully it will realize this is just a glitch and recrawl. Here's a bit from the logs (definitely after the change):

66.249.xx.xx - - [25/Jan/2007:09:59:27 -0800] "GET /archived/images/picture.jpg HTTP/1.1" 403 - "-" "Googlebot-Image/1.0"

I've removed that test for fake Googlebots for now; for this site, image-search traffic is very good. I'll look at the logs again in a few hours to see if it can crawl with the fake-Googlebot test removed... maybe there's some other rule blocking it?

The IP number really is Google's; I checked that first.

Regular Googlebot is crawling fine.
66.249.xx.xx - - [25/Jan/2007:05:47:38 -0800] "GET /folder/file.shtml HTTP/1.1" 200 11712 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Oh wait... could the rule still be disallowing it because the rest doesn't match? That is, there's no "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" like the other bots leave in the UA; it just calls itself "Googlebot-Image/1.0".
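
Tracing it through for that request, I think it goes like this (my own walkthrough, not anything from the logs):

# Request: UA "Googlebot-Image/1.0" from 66.249.xx.xx
RewriteCond %{HTTP_USER_AGENT} Googlebot|Googelbot [NC]
# -> matches: "Googlebot-Image" contains "Googlebot"
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot(-Image)?/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
# -> the UA doesn't start with "Mozilla", so this negated test succeeds,
#    and because of [OR] the IP check below is never even consulted
RewriteCond %{REMOTE_ADDR} !^66\.249\.
# -> skipped
RewriteRule .* - [F]
# -> 403, even though the IP really is Google's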

jdMorgan

9:08 pm on Jan 25, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes. Your best bet, should you decide to pursue this, would be to add a specific RewriteCond for Googlebot-Image and be done with it. Because of the multiple layers of AND-OR-AND this involves, the rule must be rewritten if the code is to be maintainable:

# Block Fake Googlebots
RewriteCond %{HTTP_USER_AGENT} Googelbot [NC]
RewriteRule .* - [F]
#
# Validate Googlebot IP
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]
#
# Validate Googlebot user-agent
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
# Search bot
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$
# Adwords bot
RewriteCond %{HTTP_USER_AGENT} !^Mozilla/5\.[0-9]+\ \(compatible;\ GoogleBot/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$
# Image bot
RewriteCond %{HTTP_USER_AGENT} !^Googlebot-Image/[1-9]\.[0-9]$
# Adsense bot
RewriteCond %{HTTP_USER_AGENT} !^Mediapartners-Google/[1-9]\.[0-9]\ \(\+http://www\.googlebot\.com/bot\.html\)$
# Mobile bot - Note that pattern is not start-anchored
RewriteCond %{HTTP_USER_AGENT} !Googlebot-Mobile/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$
RewriteRule .* - [F]

Or you could write it based on what you'll accept, rather than what you will reject:

# Valid Googlebot IP address
RewriteCond %{REMOTE_ADDR} ^66\.249\.
# Search bot
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.[0-9]+\ \(compatible;\ Googlebot/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
# Adwords bot
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.[0-9]+\ \(compatible;\ GoogleBot/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$ [OR]
# Image bot
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image/[1-9]\.[0-9]$ [OR]
# Adsense bot
RewriteCond %{HTTP_USER_AGENT} ^Mediapartners-Google/[1-9]\.[0-9]\ \(\+http://www\.googlebot\.com/bot\.html\)$ [OR]
# Mobile bot - Note that pattern is not start-anchored
RewriteCond %{HTTP_USER_AGENT} Googlebot-Mobile/[1-9]\.[0-9];\ \+http://www\.google\.com/bot\.html\)$
# Skip next rule if valid Googlebot request
RewriteRule .* - [S=1]
#
# This rule is skipped by the previous rule if it detects a valid Googlebot request
RewriteCond %{HTTP_USER_AGENT} Googlebot|Googelbot [NC]
RewriteRule .* - [F]

Search companies could make our lives much easier if they would define and enforce a company-wide user-agent string taxonomy, stick to it, and quit releasing specialty 'bot after specialty 'bot on the Web -- I can see no reason why any search company needs more than one or two robots to crawl it. One or two link-crawlers/page-fetchers feeding a database, from which multiple special-purpose back-ends could draw data, would greatly reduce whitelist maintenance and both server and 'net bandwidth, and just seems to make a lot more sense -- to me, anyway. The back-ends themselves could fetch robots.txt to get Disallow/Allow status for individual pages, directories, and page types, and dispatch a media fetcher to GET media files; those seem to be the only "per back-end" fetches needed.

None of the code above has been tested -- typos are likely.

Jim

LunaC

10:03 pm on Jan 25, 2007 (gmt 0)

10+ Year Member



I completely agree; MSN is even worse for seemingly random UAs sometimes. I've seen MS IPs in the logs with a blank UA and referrer; those got a 403 as well... but I feel it was deserved (I didn't see any request for robots.txt), so I'm leaving that as-is.

I just took a peek at my logs, and both Googlebot (real IP) and Googlebot-Image are crawling like madmen, so I'm leaving it alone for a day or so to give them time to realize it was a glitch on my end and settle down. Once things get quieter I'll try one of the versions you wrote above.

Thanks for taking the time to explain what the code means. I find trying to understand (or even locate) any info in the Apache docs nearly impossible for a beginner like me, especially when I don't know the technical terms to search for. So, thank you again.