
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Disallowed Googlebot-Image bot still spidering

 10:57 am on Jul 6, 2011 (gmt 0)

I don't want any of my site's images listed on search engines. I therefore have a blanket block on my /images/ folder in my robots.txt, as below, plus an additional Googlebot-Image/1.0 block. However, the Googlebot-Image bot still insists on spidering a few of my site's images, despite receiving a 403 error.

User-agent: *
Disallow: /i

User-agent: Googlebot-Image/1.0
Disallow: /

403 GET Googlebot-Image/1.0 /images/abc.gif
403 GET Googlebot-Image/1.0 /robots.txt

I'm guessing the Googlebot-Image bot is receiving the 403 on robots.txt, so it is unable to ascertain what it can and cannot spider, and therefore continues to do so.

I cannot figure out why it is receiving a 403 though.

Can anyone shed any light on what may be happening please?
Many thanks in advance
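
A side note on the `Disallow: /i` line in the post above: robots.txt paths are matched as prefixes, so `/i` covers more than the /images/ folder. The sketch below uses Python's standard `urllib.robotparser` as a stand-in; it is not Google's parser, and `ExampleBot` and the sample paths other than /images/abc.gif are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The general group from the post. "/i" is a path prefix, not a folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /i
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "ExampleBot" and the last two paths are hypothetical.
    for path in ("/images/abc.gif", "/index.html", "/about.html"):
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", path))
```

If only the image folder is meant to be blocked, `Disallow: /images/` would be the narrower rule.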



 11:09 am on Jul 6, 2011 (gmt 0)

Could this line in my .htaccess be blocking access to my robots.txt (and other files) for the U.A. 'Googlebot-Image/1.0'?

SetEnvIfNoCase User-Agent "^(.*)Image" bad_bot
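
A quick way to check whether that pattern catches the image bot is to run the same regular expression by hand. This is only a sketch: it uses Python's `re` module in place of Apache's regex engine, and it assumes the `bad_bot` variable is denied somewhere else in the .htaccess (for example with a `Deny from env=bad_bot` line, which is not shown in the post). The user-agent strings other than `Googlebot-Image/1.0` are illustrative.

```python
import re

# Pattern copied from the SetEnvIfNoCase line; "NoCase" maps to re.IGNORECASE.
# SetEnvIf searches anywhere in the header, so re.search is the closest match.
BAD_BOT_PATTERN = re.compile(r"^(.*)Image", re.IGNORECASE)


def is_flagged(user_agent: str) -> bool:
    """Return True if the pattern would mark this User-Agent as bad_bot."""
    return BAD_BOT_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # The last two strings are hypothetical examples.
    for ua in ("Googlebot-Image/1.0", "Googlebot/2.1", "SomeImageGrabber"):
        print(ua, is_flagged(ua))
```

Because `^(.*)` matches any leading text, the pattern flags every user agent that contains "image" anywhere, in any letter case, and that includes Googlebot-Image/1.0.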



 5:44 pm on Jul 6, 2011 (gmt 0)

OK, Google has now revisited my robots file, and it 'seems' as though a previous edit of my robots.txt may have accidentally permitted Googlebot-Image to crawl the unwanted directories. The onslaught has now ceased at last.


 5:58 pm on Jul 6, 2011 (gmt 0)

I should think so, unless SetEnvIf has rules all its own. (I use the special form BrowserMatch but it's the same thing.) You can put in an override saying

<Files robots.txt>
Order Allow,Deny
Allow from all
</Files>

so they have no excuse.

Interesting to know that the Imagebot pays its own separate visits to robots.txt. I thought the robots.txtbot did the work for everyone.

! Important

There is a special rule for the regular googlebot which may also apply to the imagebot. The moment the googlebot is mentioned by name in your robots.txt, it looks only at those lines that use its name. That means that if you have any google-specific rules but you also want it to follow the general rules, you need to say everything twice.


 6:23 pm on Jul 6, 2011 (gmt 0)

Thank you Lucy.
Regarding your last paragraph, would you be good enough to give me an example please (I'm a few cards short of a deck!).


 12:38 am on Jul 9, 2011 (gmt 0)

example please

User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image/1.0
Disallow: /images
Disallow: /gtfo
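
The "say everything twice" rule can be checked locally. The sketch below again relies on Python's `urllib.robotparser`, which happens to pick a named group over the `*` group in the same way; it is not Google's own parser. It uses the bare token `Googlebot-Image` without the `/1.0` suffix, because Python's parser does not strip a version from the robots.txt line, and `ExampleBot` and the sample paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Named group WITHOUT the general rule repeated.
WITHOUT_REPEAT = """\
User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image
Disallow: /images
"""

# Named group WITH the general rule repeated, as in the example above.
WITH_REPEAT = WITHOUT_REPEAT + "Disallow: /gtfo\n"


def may_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The named bot ignores the "*" group, so /gtfo is open to it here.
    print(may_fetch(WITHOUT_REPEAT, "Googlebot-Image", "/gtfo/page.html"))
    # Once the rule is repeated in its own group, /gtfo is closed again.
    print(may_fetch(WITH_REPEAT, "Googlebot-Image", "/gtfo/page.html"))
    # A bot with no group of its own (hypothetical name) follows "*".
    print(may_fetch(WITHOUT_REPEAT, "ExampleBot", "/gtfo/page.html"))
```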


 8:47 am on Jul 9, 2011 (gmt 0)

OK I see what you mean now, many thanks (both).


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved