Forum Moderators: goodroi

Message Too Old, No Replies

yahoo/slurp spidering

We don't want these pages indexed by yahoo


latimer

8:51 pm on Jan 9, 2006 (gmt 0)

10+ Year Member



User-agent: slurp
Disallow: /

The above is how we have our robots.txt set up to block Yahoo from spidering; however, Yahoo is still spidering and placing pages in its index. We have another site with pages that could be seen as duplicates, and we want to avoid problems. What can we do differently to stop Yahoo?

Pfui

9:06 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's case-sensitive:

User-agent: Slurp
Disallow: /

See Also: Yahoo! Help / Search Help / Yahoo! Slurp - Yahoo!'s Web Crawler [help.yahoo.com]

latimer

9:37 pm on Jan 9, 2006 (gmt 0)

10+ Year Member



thanks Pfui!

agradation

10:43 pm on Jan 22, 2006 (gmt 0)

10+ Year Member



Did it help?

I have specifically restricted ALL robots from visiting certain pages on my site like:

User-agent: *
Disallow: /rtv/ktv-l.htm

but Yahoo Slurp still comes and requests those pages. I think Yahoo does not obey the robot rules.

Dijkgraaf

3:22 am on Jan 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you run your robots.txt through a robots.txt validator?
See the link at the top of this forum.
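You can also do a quick local check of the rule itself. This is a sketch using Python's standard-library robots.txt parser, with the exact rule from the post above; it asks the parser directly whether a given user agent may fetch a given path:

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above: block every robot from one specific page.
rules = """\
User-agent: *
Disallow: /rtv/ktv-l.htm
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A compliant crawler should skip the disallowed page but may fetch others.
print(rp.can_fetch("Slurp", "/rtv/ktv-l.htm"))       # False
print(rp.can_fetch("Slurp", "/some-other-page.htm"))  # True
```

If the parser says the page is disallowed but requests keep arriving, the problem is with the crawler (or a spoofer), not with the file.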

I've never seen Yahoo disobey robots.txt, so either there is a problem with your robots.txt file, or what you are seeing is a bot that is spoofing the Yahoo user agent.
You can check this by:
1) seeing if Yahoo is actually listing those pages
2) checking which IP address these requests are coming from and doing a whois lookup to find out who owns that IP, e.g. at www.dnsstuff.com
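The reverse-lookup step can be sketched in Python. Note that "crawl.yahoo.net" is an assumed crawler domain used for illustration; check Yahoo's own crawler documentation for the authoritative one:

```python
import socket

def is_crawler_host(hostname: str, crawler_domain: str) -> bool:
    # Accept only the crawler domain itself or a true subdomain of it;
    # a plain substring test could be fooled by "crawl.yahoo.net.evil.example".
    return hostname == crawler_domain or hostname.endswith("." + crawler_domain)

def looks_like_slurp(ip: str) -> bool:
    # Reverse-DNS the requesting IP and test the resulting hostname.
    # "crawl.yahoo.net" is an assumption here, not a verified value.
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except socket.herror:
        return False  # no PTR record: the claim cannot be verified
    return is_crawler_host(hostname, "crawl.yahoo.net")
```

Before trusting a match, a forward lookup on the returned hostname should resolve back to the same IP, since PTR records can also be faked.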

bostons4u

3:25 am on Jan 23, 2006 (gmt 0)

10+ Year Member



What is the complete code to keep all search engines out of certain pages?

And what is the code to let them into certain pages?

I'm sorry, I know this has been asked before, and I did try searching this site for this info.
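A minimal robots.txt along those lines would look like the following (the paths here are placeholders, so substitute your own):

User-agent: *
Disallow: /private-page.htm
Disallow: /private-dir/

Anything you don't Disallow stays open to crawlers by default, so no separate rule is needed for the pages you want indexed.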

Pfui

3:57 am on Jan 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



agradation, the URL I provided above includes all of Yahoo's Slurp-related info. Go there and click the link that says...

"How do I prevent my site from being crawled or prevent certain subdirectories from being crawled?"

...then try including the Slurp-specific instructions in your robots.txt.

Pfui

4:04 am on Jan 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



bostons4u, you can search this site using Google, MSN, etc. To find code to include on every page, click the "site search" link atop every page.

You can also search Google using your search term(s) plus the following:

site:webmasterworld.com

You'll find robots.txt info galore here on WW in all of this forum's posts, and the official basics here [robotstxt.org], a.k.a. robots.txt.org.

agradation

8:28 am on Jan 23, 2006 (gmt 0)

10+ Year Member



Thanks Pfui, I'll try making a special 'User-agent: Slurp' section for Yahoo in the robots.txt file and see what happens.
I am a bit sceptical, though.