cyberdyne

msg:4335685 | 11:09 am on Jul 6, 2011 (gmt 0) |
Could this line in my .htaccess be blocking access to my robots.txt (and other files) for the U.A. 'Googlebot-Image/1.0'?
SetEnvIfNoCase User-Agent "^(.*)Image" bad_bot
Thanks
|
cyberdyne

msg:4335897 | 5:44 pm on Jul 6, 2011 (gmt 0) |
OK, Google has now revisited my robots.txt file, and it 'seems' as though a previous edit of the file may have accidentally permitted Googlebot-Image to crawl the unwanted directories. The onslaught has now ceased at last.
|
lucy24

msg:4335909 | 5:58 pm on Jul 6, 2011 (gmt 0) |
I should think so, unless SetEnvIf has rules all its own. (I use the special form BrowserMatch but it's the same thing.) You can put in an override saying

<Files robots.txt>
Order Allow,Deny
Allow from all
</Files>

so they have no excuse.

Interesting to know that the Imagebot pays its own separate visits to robots.txt. I thought the robots.txtbot did the work for everyone.

Important: There is a special rule for the regular googlebot which may also apply to the imagebot. The moment the googlebot is mentioned by name in your robots.txt, it looks only at those lines that use its name. That means that if you have any google-specific rules but you also want it to follow the general rules, you need to say everything twice.
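
On the original question of whether the pattern catches the image bot: here is a rough check in Python, assuming Python's `re` behaves like Apache's regex engine for a pattern this simple (it should, but that is an assumption). The helper name `is_flagged` is made up for illustration. Note that the SetEnvIfNoCase line only sets the `bad_bot` variable; it blocks something only if another directive, such as `Deny from env=bad_bot`, acts on it.

```python
import re

# The pattern from the SetEnvIfNoCase line in the question.
# re.IGNORECASE stands in for the "NoCase" part of the directive.
PATTERN = re.compile(r"^(.*)Image", re.IGNORECASE)


def is_flagged(user_agent: str) -> bool:
    """Return True if the pattern would match this User-Agent string."""
    return PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Sample User-Agent strings; only the first comes from the thread.
    for ua in ("Googlebot-Image/1.0", "Googlebot/2.1", "someIMAGEgrabber"):
        print(ua, "->", is_flagged(ua))
```

Because `^(.*)` matches any leading text, the pattern amounts to "contains Image, in any letter case", so it would flag Googlebot-Image along with any other agent that has "image" in its name.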
|
cyberdyne

msg:4335926 | 6:23 pm on Jul 6, 2011 (gmt 0) |
Thank you, Lucy. Regarding your last paragraph, would you be good enough to give me an example, please? (I'm a few cards short of a deck!)
|
phranque

msg:4337258 | 12:38 am on Jul 9, 2011 (gmt 0) |
User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image/1.0
Disallow: /images
Disallow: /gtfo
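
The "say everything twice" rule can be tried out locally with Python's standard `urllib.robotparser`. This is only a sketch: Python's parser is not Google's, although it also stops at the first group that names the bot. The sketch uses the bare token `Googlebot-Image` rather than `Googlebot-Image/1.0`, because this parser compares against the part of the name before the slash. The helper `build_parser` and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# /gtfo is disallowed only in the general group.
ONCE = """\
User-agent: *
Disallow: /gtfo

User-agent: Googlebot-Image
Disallow: /images
"""

# Same file, with the general rule repeated in the bot's own group.
TWICE = ONCE + "Disallow: /gtfo\n"

if __name__ == "__main__":
    once, twice = build_parser(ONCE), build_parser(TWICE)
    # A bot named in its own group ignores the "*" group entirely.
    print(once.can_fetch("Googlebot-Image", "/gtfo/page.html"))
    # Repeating the rule inside the bot's group closes the gap.
    print(twice.can_fetch("Googlebot-Image", "/gtfo/page.html"))
    # A bot that is not named anywhere falls back to the "*" group.
    print(once.can_fetch("SomeOtherBot", "/gtfo/page.html"))
```

With the rule stated once, the named bot is allowed into /gtfo; with the rule repeated in its own group, it is not.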
|
cyberdyne

msg:4337328 | 8:47 am on Jul 9, 2011 (gmt 0) |
OK I see what you mean now, many thanks (both).
|