
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Disallowed Googlebot-Image bot still spidering
cyberdyne




msg:4335684
 10:57 am on Jul 6, 2011 (gmt 0)

I don't want any of my site's images listed on search engines. I therefore have a blanket block on my /images/ folder in my robots.txt, as below, plus an additional Googlebot-Image/1.0 block. However, the Googlebot-Image bot still insists on spidering a few of my site's images, despite receiving a 403 error.

Robots:
User-agent: *
Disallow: /i

User-agent: Googlebot-Image/1.0
Disallow: /


Result:
403 GET 66.249.66.33 Googlebot-Image/1.0 /images/abc.gif
403 GET 66.249.66.33 Googlebot-Image/1.0 /robots.txt


I'm guessing the Googlebot-Image bot is receiving the 403 on robots.txt, so it is unable to ascertain what it can and cannot spider, and therefore continues to do so.

I cannot figure out why it is receiving a 403 though.


Can anyone shed any light on what may be happening please?
Many thanks in advance

 

cyberdyne




msg:4335685
 11:09 am on Jul 6, 2011 (gmt 0)

Could this line in my .htaccess be blocking access to my robots.txt (and other files) for the U.A. 'Googlebot-Image/1.0'?

SetEnvIfNoCase User-Agent "^(.*)Image" bad_bot
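
A quick way to check whether that pattern catches the image bot is to try it against a few User-Agent strings. The sketch below uses Python's re module as a rough stand-in for Apache's regex matching (an assumption; the two engines are not identical), and the helper name is_flagged is made up for illustration. Whether a flagged request actually gets a 403 depends on a deny rule keyed on bad_bot elsewhere in the .htaccess, which is not shown in this thread.

```python
import re

# The pattern from the SetEnvIfNoCase line above.
# "NoCase" means the match is case-insensitive.
BAD_BOT = re.compile(r"^(.*)Image", re.IGNORECASE)


def is_flagged(user_agent: str) -> bool:
    """Return True if the User-Agent would be tagged as bad_bot (hypothetical helper)."""
    return BAD_BOT.search(user_agent) is not None


for ua in (
    "Googlebot-Image/1.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
):
    print(ua, "->", is_flagged(ua))
```

Because "^(.*)" can match any prefix, the pattern amounts to "contains Image anywhere, in any letter case", so Googlebot-Image/1.0 is caught while the regular Googlebot string is not.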


Thanks

cyberdyne




msg:4335897
 5:44 pm on Jul 6, 2011 (gmt 0)

OK, Google has now revisited my robots file, and it 'seems' as though a previous edit of my robots.txt may have accidentally permitted Googlebot-Image to crawl the unwanted directories. The onslaught has now ceased at last.

lucy24




msg:4335909
 5:58 pm on Jul 6, 2011 (gmt 0)

I should think so, unless SetEnvIf has rules all its own. (I use the special form BrowserMatch but it's the same thing.) You can put in an override saying

<Files robots.txt>
Order Allow,Deny
Allow from all
</Files>

so they have no excuse.

Interesting to know that the Imagebot pays its own separate visits to robots.txt. I thought the robots.txtbot did the work for everyone.

! Important

There is a special rule for the regular googlebot which may also apply to the imagebot. The moment the googlebot is mentioned by name in your robots.txt, it looks only at those lines that use its name. That means that if you have any google-specific rules but you also want it to follow the general rules, you need to say everything twice.
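
The "named group wins" behaviour can be seen with Python's standard-library robots.txt parser. This is only a sketch: urllib.robotparser stands in for a real crawler (it is not Google's parser), and the /private path and the SomeOtherBot name are invented for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one general group, one Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private

User-agent: Googlebot
Disallow: /images
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot matches its own group, so the "*" rules are not consulted.
print(parser.can_fetch("Googlebot", "/private/page.html"))     # True
print(parser.can_fetch("Googlebot", "/images/abc.gif"))        # False

# A bot with no group of its own falls back to the "*" rules.
print(parser.can_fetch("SomeOtherBot", "/private/page.html"))  # False
```

In this parser, at least, /private stays open to Googlebot until the rule is repeated inside the Googlebot group.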

cyberdyne




msg:4335926
 6:23 pm on Jul 6, 2011 (gmt 0)

Thank you Lucy.
Regarding your last paragraph, would you be good enough to give me an example please (I'm a few cards short of a deck!).

phranque




msg:4337258
 12:38 am on Jul 9, 2011 (gmt 0)

example please


User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image/1.0
Disallow: /images
Disallow: /gtfo
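
The duplicated rule in the example above can be checked the same way. This sketch again uses Python's urllib.robotparser as a stand-in for a real crawler. One assumption to flag: the group is written with the bare token Googlebot-Image rather than Googlebot-Image/1.0, because this particular parser compares against the name before the slash and may not match a token that carries a version suffix. OtherBot and the file paths are invented for the test.

```python
from urllib.robotparser import RobotFileParser

# The example above, with the bare product token (see the note in the lead-in).
EXAMPLE = """\
User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image
Disallow: /images
Disallow: /gtfo
"""

rules = RobotFileParser()
rules.parse(EXAMPLE.splitlines())

# The image bot is kept out of both directories because /gtfo is repeated
# inside its own group.
print(rules.can_fetch("Googlebot-Image/1.0", "/images/abc.gif"))  # False
print(rules.can_fetch("Googlebot-Image/1.0", "/gtfo/page.html"))  # False

# Any other bot only sees the general group.
print(rules.can_fetch("OtherBot", "/images/abc.gif"))             # True
print(rules.can_fetch("OtherBot", "/gtfo/page.html"))             # False
```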

cyberdyne




msg:4337328
 8:47 am on Jul 9, 2011 (gmt 0)

OK I see what you mean now, many thanks (both).

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved