Below is how we have our robots.txt set up to block Yahoo from spidering. However, Yahoo is still spidering and placing pages in its index. We have another site with pages that could be seen as duplicates, and we want to avoid problems. What can we do differently to stop Yahoo?
User-agent: Slurp
Disallow: /
See Also: Yahoo! Help / Search Help / Yahoo! Slurp - Yahoo!'s Web Crawler [help.yahoo.com]
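One quick way to rule out a problem in the file itself is to run the rules through a parser and ask whether a given user agent may fetch a page. The sketch below uses Python's standard urllib.robotparser; the helper name is_blocked and the example.com URL are made up for illustration. It only shows how one standard parser reads the rules, not how Yahoo's crawler actually interprets them.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from url.

    is_blocked is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so split the raw text first.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The two rules from the post above.
RULES = """User-agent: Slurp
Disallow: /
"""

if __name__ == "__main__":
    # example.com is a placeholder; substitute a real page from your site.
    print(is_blocked(RULES, "Slurp", "http://example.com/page.html"))
    print(is_blocked(RULES, "Googlebot", "http://example.com/page.html"))
```

If the rules parse as intended, the first call should report the page as blocked and the second should not. To test the live file rather than pasted text, RobotFileParser also offers set_url() and read(). It is also worth confirming that the file is served from the site root as /robots.txt and comes back as plain text.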
I've never seen Yahoo disobey robots.txt, so either there is a problem with your robots.txt file, or what you are seeing is a bot that is spoofing the Yahoo user agent.
You can check this by:
1) seeing whether Yahoo is actually listing those pages
2) checking which IP address these requests are coming from and doing a whois lookup to find out who owns that IP, e.g. at www.dnsstuff.com
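The IP check in step 2 can also be scripted. The sketch below does not use whois. It swaps in a different technique, a reverse DNS lookup followed by a forward lookup to confirm that the hostname maps back to the same IP. The function name verify_crawler_ip is made up, and the .crawl.yahoo.net suffix is an assumption about where genuine Slurp hosts resolve; check it against Yahoo's own help page before relying on it.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes=(".crawl.yahoo.net",),
                      reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to an allowed host that maps back to ip.

    verify_crawler_ip is a hypothetical helper. The default suffix is an
    assumption, not a confirmed list of Yahoo crawler domains.
    The lookup functions can be replaced, e.g. for offline testing.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No reverse DNS record: cannot be confirmed as the crawler.
        return False

    host = host.lower().rstrip(".")
    if not host.endswith(tuple(allowed_suffixes)):
        return False

    try:
        # Forward-confirm: a spoofer can fake reverse DNS for its own IP,
        # but cannot make the real domain resolve back to that IP.
        return ip in forward_lookup(host)
    except OSError:
        return False


if __name__ == "__main__":
    # 203.0.113.5 is a documentation-only address; use an IP from your logs.
    print(verify_crawler_ip("203.0.113.5"))
```

A request that claims to be Slurp in its user agent but fails this check is more likely a spoofing bot than Yahoo ignoring robots.txt.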
You can also search Google using your search term(s) plus the following:
site:webmasterworld.com
You'll find robots.txt info galore here on WW in all of this forum's posts, and the official basics here [robotstxt.org], a.k.a. robots.txt.org.