Forum Moderators: phranque

allow google-image?

robots.txt allow/disallow

         

keyplyr

4:45 am on Sep 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




I want google-image to have access to directory4 and directory5, but all other bots to follow the rules below. How do I do this? Thanks.

User-agent: *
Disallow: directory1/
Disallow: directory2/
Disallow: directory3/
Disallow: directory4/
Disallow: directory5/
Disallow: directory6/

DaveAtIFG

5:11 am on Sep 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try this [searchengineworld.com] and use the validator [searchengineworld.com] to confirm your work.

keyplyr

5:15 am on Sep 8, 2003 (gmt 0)


That page does not address the question I asked.

DaveAtIFG

5:23 am on Sep 8, 2003 (gmt 0)


I agree that it doesn't provide a specific example. Follow the links, if necessary, to write your code.

closed

12:37 pm on Sep 8, 2003 (gmt 0)

10+ Year Member



I don't know if this would work, but since google-image scans HTML pages to find images, maybe you could try this: Put an HTML document in directory4 and directory5 that lists the images in that directory. Put a META tag that has noindex and nofollow for all the other bots, then have a separate line for google-image with the permissions you want. In your robots.txt file, you'll have to delete the Disallow lines for directory4 and directory5.

closed

1:39 pm on Sep 14, 2003 (gmt 0)


What was I thinking? This should work:

User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
Disallow: /directory4/
Disallow: /directory5/
Disallow: /directory6/


User-agent: google-image
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
Disallow: /directory6/

I added a forward slash to the beginning of each directory name.

I don't really know if the user-agent for google-image is right, but you get the idea. Basically, you just copy all the Disallow lines for the other robots, paste them to the more specific case for google-image, then remove the Disallows for /directory4/ and /directory5/ to allow access.
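
One way to sanity-check a robots.txt like the one above is to feed it to a parser. The sketch below is only an illustration and not part of the original post: it uses Python's standard urllib.robotparser, swaps in the Googlebot-Image user-agent that is confirmed later in this thread, and the names SomeOtherBot and photo.jpg are made up. It shows how this one parser reads the records; real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# The two records from the post, with the user-agent name
# replaced by Googlebot-Image (an assumption taken from later in the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
Disallow: /directory4/
Disallow: /directory5/
Disallow: /directory6/

User-agent: Googlebot-Image
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
Disallow: /directory6/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "SomeOtherBot" and "photo.jpg" are hypothetical names used only for the check.
for agent in ("Googlebot-Image", "SomeOtherBot"):
    for directory in ("/directory1/", "/directory4/", "/directory5/"):
        allowed = parser.can_fetch(agent, directory + "photo.jpg")
        print(agent, directory, "allowed" if allowed else "blocked")
```

With this parser, Googlebot-Image is let into /directory4/ and /directory5/ while the made-up SomeOtherBot is kept out of both.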

jdMorgan

7:11 pm on Sep 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'll need to reverse the records in that code. Otherwise, ALL spiders will accept the User-agent: * record, and stop looking any further. Put the Googlebot-Image record first; Googlebot-Image will accept that one, and the other spiders will ignore it, and then accept the User-agent: * record.

Jim

keyplyr

7:57 pm on Sep 14, 2003 (gmt 0)


Thanks - what's up with the preceding slash? In 5 years, I've never used it.

closed

8:34 pm on Sep 14, 2003 (gmt 0)


You'll need to reverse the records in that code. Otherwise, ALL spiders will accept the User-agent: * record, and stop looking any further.

I want to agree with you on that one, Jim, but I can't. If you check here [robotstxt.org], you'll see that User-agent: * comes first. It works both ways. I also know that that works because I exclude robots that way.
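
The record-order question can be tested against at least one parser. The sketch below is an illustration under stated assumptions, not a statement about how any particular spider works: it uses Python's standard urllib.robotparser, shortened records, and the made-up names SomeOtherBot and a.jpg. It builds the same file in both orders and compares the verdicts.

```python
from urllib.robotparser import RobotFileParser

# Shortened records for illustration; each ends with a newline.
STAR_RECORD = "User-agent: *\nDisallow: /directory4/\nDisallow: /directory5/\n"
IMAGE_RECORD = "User-agent: Googlebot-Image\nDisallow: /directory1/\n"

AGENTS = ("Googlebot-Image", "SomeOtherBot")  # SomeOtherBot is hypothetical
PATHS = ("/directory1/a.jpg", "/directory4/a.jpg", "/directory5/a.jpg")


def verdicts(robots_txt):
    """Return {(agent, path): allowed} for every agent/path pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, path)
        for agent in AGENTS
        for path in PATHS
    }


# A blank line separates the two records in each variant.
star_first = verdicts(STAR_RECORD + "\n" + IMAGE_RECORD)
image_first = verdicts(IMAGE_RECORD + "\n" + STAR_RECORD)

print("same result in both orders:", star_first == image_first)
```

This parser picks the record that names the robot before falling back to the * record, whichever comes first in the file. An older or simpler spider could still read top to bottom, so putting the specific record first is the cautious choice.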

Thanks - what's up with the preceding slash? In 5 years, I've never used it.

My guess is that what you had before would disallow access to any directory whose name appeared in the Disallow. The slash is there to make sure that you disallow access only to files in /directoryN/ and not to files in, say, /differentDirectory/directoryN/. It's okay to use it without the slash, I guess, but if you aren't careful, you may end up disallowing access to more files than you really want.
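
The guess above can be checked against a parser that does plain prefix matching on the URL path. The sketch below is an illustration only: it uses Python's standard urllib.robotparser, and the helper blocked, the bot name SomeOtherBot, and the example paths are all made up. Other parsers and crawlers may treat a slashless rule differently.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule, path, agent="SomeOtherBot"):
    """Return True if a single 'Disallow: <rule>' line blocks the path.

    Hypothetical helper for illustration; the agent name is made up.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + rule])
    return not parser.can_fetch(agent, path)


cases = [
    ("/directory1/", "/directory1/page.html"),
    ("/directory1/", "/other/directory1/page.html"),
    ("directory1/", "/directory1/page.html"),
    ("directory1/", "/other/directory1/page.html"),
]

for rule, path in cases:
    print("Disallow: %-14s %-30s blocked=%s" % (rule, path, blocked(rule, path)))
```

Because URL paths always begin with a slash, this parser finds that a rule written without the leading slash matches nothing at all, so the risk there is blocking too little rather than too much. The rule with the leading slash blocks only the top-level /directory1/, not a nested directory of the same name.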

keyplyr

8:59 pm on Sep 14, 2003 (gmt 0)


...okay to use it without the slash... but... you may end up disallowing access to more files than you really want.

OK, I understand your point. I never considered this since I don't nest directories. Thanks.

closed

9:09 pm on Sep 14, 2003 (gmt 0)


You're welcome, keyplyr.

keyplyr

9:28 pm on Sep 14, 2003 (gmt 0)


Just for the record, I checked, and the UA is Googlebot-Image.