How can I allow robots to access my pages

Forum Moderators: goodroi

inktomi is following links it should not

expat123

1:55 am on Jun 20, 2005 (gmt 0)

...but prevent them from following a specific link (a "complaint" link) on most of my pages?

The complaint link should only be used by real users.

In the "complaint" script, I search HTTP_USER_AGENT for the words 'bot', 'slurp', and 'inktomi', and exit the program before the complaint is registered if any of them is found.
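A minimal sketch of the user-agent check described above, written in Python for illustration only (the post does not say what language the script is in). The three marker words come from the post; the names `looks_like_bot` and `register_complaint` are hypothetical.

```python
from typing import Optional

# Substrings named in the post; matching is done case-insensitively.
BOT_MARKERS = ("bot", "slurp", "inktomi")


def looks_like_bot(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a known bot marker."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def register_complaint(user_agent: Optional[str]) -> bool:
    """Hypothetical handler: skip registration for bots, accept everyone else."""
    if looks_like_bot(user_agent):
        return False  # bail out before the complaint is recorded
    # ... the real script would store the complaint here ...
    return True
```

A check like this only catches crawlers that identify themselves in the header, and the bot still requests the URL; it just has no effect.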

However, I would like a way to prevent all bots from following these specific links.

At the same time, I want bots to crawl and index the content on those pages.

Clint

9:36 am on Jun 20, 2005 (gmt 0)

I had to figure out something similar recently. Each bot has its own specific name. If you want to block ALL bots, put this in robots.txt:

User-agent: *
Disallow: /

If you want to block ALL bots from certain pages ONLY:

User-agent: *
Disallow: /FoldernameOrPagename/WhateverIfAny
Disallow: /FoldernameOrPagename2/WhateverIfAny2

If you want to block only Yahoo or only Google, use the same syntax as above but replace the * with the bot's name, for example Slurp for Yahoo or Googlebot for Google. Be sure to follow it, after a blank line, with a record that allows everyone else (Google example):

User-agent: Googlebot
Disallow: /FoldernameOrPagename/WhateverIfAny

User-agent: *
Disallow:

To allow ALL bots to access ALL pages, put just this by itself:

User-agent: *
Disallow:
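One way to sanity-check records like these before uploading them is Python's standard `urllib.robotparser`. The sketch below feeds it the Googlebot example from above; the helper name `is_allowed` and the `/index.html` path are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only example from the post, followed by the allow-all record.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /FoldernameOrPagename/WhateverIfAny

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


blocked_path = "/FoldernameOrPagename/WhateverIfAny"
print(is_allowed(ROBOTS_TXT, "Googlebot", blocked_path))   # False
print(is_allowed(ROBOTS_TXT, "Slurp", blocked_path))       # True
print(is_allowed(ROBOTS_TXT, "Googlebot", "/index.html"))  # True
```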

Then you can also put this in the <head> section of each page you want to block, to BLOCK ALL bots:

<META NAME="Robots" CONTENT="NOINDEX,NOFOLLOW">

Replace "Robots" with "googlebot" if you want to block only Google; I assume the same works with "Slurp" for Yahoo. Remove both "no" prefixes in the tag above to ALLOW indexing and link following, remove only the first to index the page but not follow its links, or remove only the second to keep the page out of the index but still follow its links. (I'd make it all lower case.) I believe the robots.txt file method is the preferred method.

There's more info for Google at the link below, and I guess you can do the same for Yahoo if you replace "googlebot" with "slurp".
[google.com...]

Reid

5:02 pm on Jun 20, 2005 (gmt 0)

You cannot use robots.txt to disallow a link on your pages, but you can disallow the URL that the link points to.
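Applied to the original question, that could look roughly like the sketch below: the content pages stay crawlable, and only the complaint script's URL is disallowed. The path `/cgi-bin/complaint.cgi` and the page path are hypothetical, since the thread never gives the real URLs; the check uses Python's standard `urllib.robotparser` and assumes the bots honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical location of the complaint script; substitute the real URL path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/complaint.cgi
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path: str, user_agent: str = "Slurp") -> bool:
    """Report whether user_agent may fetch path under ROBOTS_TXT."""
    return parser.can_fetch(user_agent, path)


print(may_crawl("/cgi-bin/complaint.cgi"))  # False: link target is off limits
print(may_crawl("/articles/page1.html"))    # True: page content stays crawlable
```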

Clint

10:08 am on Jun 21, 2005 (gmt 0)

Ho boy, I went through all that and didn't even notice he said "LINK" and not webpage! (My brain is FRIED from Google.....and now, MSN!)