The complaint link should only be used by real users.
In the "complaint" script used, I am searching for the words 'bot', 'slurp', inktomi' in HTTP_USER_AGENT, and then exiting the program before the complaint is registered..
However, I would like a way to prevent all bots from following these specific links.
At the same time, I want bots to crawl and index the content on those pages.
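For reference, a minimal sketch of that kind of user-agent check, assuming a Python CGI script (the language of the original script isn't stated, so the names and message here are just placeholders):

import os
import sys

# read the visitor's user agent from the CGI environment
ua = os.environ.get("HTTP_USER_AGENT", "").lower()

# bail out before the complaint is saved if the visitor looks like a crawler
if any(word in ua for word in ("bot", "slurp", "inktomi")):
    print("Content-Type: text/plain")
    print()
    print("Complaints can only be filed by real users.")
    sys.exit(0)

# ...otherwise carry on and register the complaint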
If you want to block ALL bots from ALL pages:
User-agent: *
Disallow: /
If you want to block ALL bots from certain pages ONLY:
User-agent: *
Disallow: /FoldernameOrPagename/WhateverIfAny
Disallow: /FoldernameOrPagename2/WhateverIfAny2
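Applied to the complaint link from the question, if the script lived at /cgi-bin/complaint.cgi (a hypothetical path), it would look like this, keeping compliant bots off the script while still letting them crawl and index the pages that link to it:
User-agent: *
# /cgi-bin/complaint.cgi is a hypothetical path - use the real URL of the complaint script
Disallow: /cgi-bin/complaint.cgi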
If you want to block only Yahoo or Google, use the same syntax as above but replace the * with, for example, Slurp for Yahoo or Googlebot for Google. Be sure to put a catch-all record below it, separated by a blank line (Google example):
User-agent: Googlebot
Disallow: /FoldernameOrPagename/WhateverIfAny
User-agent: *
Disallow:
To allow ALL bots access to ALL pages, you just put this by itself:
User-agent: *
Disallow:
You can also put this in the <head> section of each page you want to block, to BLOCK ALL bots:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
Replace "Robots" with "googlebot" if you want to only block Google, and I assume that would also work with "Slurp" for Yahoo. Remove both of the "no" in the tag above to ALLOW, or remove the first one to index but not follow links, or remove the second one to not index the page but to follow links. (I'd make it all lower case). I believe the robots.txt file method is the preferred method.
There's more info here for Google, and I guess you can do the same with Yahoo if you replace "googlebot" with "slurp".
[google.com...]