
Forum Moderators: goodroi


Disallowed Googlebot-Image bot still spidering

     

cyberdyne

10:57 am on Jul 6, 2011 (gmt 0)

10+ Year Member



I don't want any of my site's images listed in search engines, so I have a blanket block on my /images/ folder in robots.txt, as below, plus an additional Googlebot-Image/1.0 block. However, Googlebot-Image still insists on spidering a few of my images, despite receiving a 403 error.

Robots:
User-agent: *
Disallow: /i

User-agent: Googlebot-Image/1.0
Disallow: /


Result:
403 GET 66.249.66.33 Googlebot-Image/1.0 /images/abc.gif
403 GET 66.249.66.33 Googlebot-Image/1.0 /robots.txt


I'm guessing Googlebot-Image is receiving the 403 on robots.txt and is therefore unable to work out what it can and cannot spider, so it carries on spidering.

I cannot figure out why it is receiving a 403 though.


Can anyone shed any light on what may be happening please?
Many thanks in advance

cyberdyne

11:09 am on Jul 6, 2011 (gmt 0)

10+ Year Member



Could this line in my .htaccess be blocking access to my robots.txt (and other files) for the U.A. 'Googlebot-Image/1.0'?

SetEnvIfNoCase User-Agent "^(.*)Image" bad_bot
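
A quick way to test the suspicion is to run the same pattern against the user-agent strings from the log. This is only a rough sketch: Python's re module with IGNORECASE stands in for Apache's case-insensitive matching, and the helper name is_flagged is made up for illustration. It also assumes that some other directive (not shown in this post) denies requests carrying the bad_bot variable.

```python
import re

# Same pattern as the SetEnvIfNoCase line above.
# re.IGNORECASE approximates the "NoCase" part of the directive.
BAD_BOT = re.compile(r"^(.*)Image", re.IGNORECASE)


def is_flagged(user_agent: str) -> bool:
    """Return True if the pattern matches this User-Agent string."""
    return BAD_BOT.search(user_agent) is not None


if __name__ == "__main__":
    # Any user agent containing "image" anywhere gets flagged,
    # which includes Googlebot-Image/1.0.
    for ua in ("Googlebot-Image/1.0", "Googlebot/2.1", "imagefetcher"):
        print(ua, is_flagged(ua))
```

Since the variable is set on the user agent alone, it applies to every URL the bot requests, robots.txt included.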


Thanks

cyberdyne

5:44 pm on Jul 6, 2011 (gmt 0)

10+ Year Member



OK, Google has now revisited my robots.txt file, and it 'seems' as though a previous edit of the file may have accidentally permitted Googlebot-Image to crawl the unwanted directories. The onslaught has now ceased at last.

lucy24

5:58 pm on Jul 6, 2011 (gmt 0)

WebmasterWorld Senior Member



I should think so, unless SetEnvIf has rules all its own. (I use the special form BrowserMatch but it's the same thing.) You can put in an override saying

<Files robots.txt>
Order Allow,Deny
Allow from all
</Files>

so they have no excuse.

Interesting to know that the Imagebot pays its own separate visits to robots.txt. I thought the robots.txtbot did the work for everyone.

Important:

There is a special rule for the regular googlebot which may also apply to the imagebot. The moment the googlebot is mentioned by name in your robots.txt, it looks only at those lines that use its name. That means that if you have any google-specific rules but you also want it to follow the general rules, you need to say everything twice.

cyberdyne

6:23 pm on Jul 6, 2011 (gmt 0)

10+ Year Member



Thank you Lucy.
Regarding your last paragraph, would you be good enough to give me an example, please? (I'm a few cards short of a deck!)

phranque

12:38 am on Jul 9, 2011 (gmt 0)

WebmasterWorld Administrator, 10+ Year Member



example please


User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image/1.0
Disallow: /images
Disallow: /gtfo
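
To see why the repeat matters, here is a rough sketch using Python's urllib.robotparser, which also applies only the group that names the bot when one exists. It is not Google's parser, so treat it as an approximation. The helper name allowed and the test path /gtfo/page.html are made up for illustration, and the user-agent line uses the bare token Googlebot-Image (no /1.0) because that is how Python's parser matches names.

```python
from urllib.robotparser import RobotFileParser

# General rule only; the named group does not repeat it.
WITHOUT_REPEAT = """
User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image
Disallow: /images
"""

# Named group repeats the general rule, as in the example above.
WITH_REPEAT = """
User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image
Disallow: /images
Disallow: /gtfo
"""


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("without repeat", WITHOUT_REPEAT),
                        ("with repeat", WITH_REPEAT)):
        print(label, allowed(text, "Googlebot-Image", "/gtfo/page.html"))
```

Without the repeated line, the named bot is let into /gtfo even though every other bot is kept out; with it, both are blocked.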

cyberdyne

8:47 am on Jul 9, 2011 (gmt 0)

10+ Year Member



OK I see what you mean now, many thanks (both).
 
