How long for Slurp to honor robots.txt

Wondering why Yahoo Slurp hasn't begun obeying robots exclusions yet

         

jeffsmith

3:11 pm on May 15, 2007 (gmt 0)

10+ Year Member



We updated our robots.txt file a couple weeks ago. Google began honoring the new exclusions within a couple days. Yahoo Slurp, on the other hand, is still crawling URLs we don't want it to. Any thoughts on when we can expect it to start obeying our new rules?

BillyS

1:14 am on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've heard this before, and it always seems strange to me because Slurp has always obeyed robots.txt on my site.

BillyS

1:15 am on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And welcome to WebmasterWorld!

jeffsmith

2:58 pm on May 16, 2007 (gmt 0)

10+ Year Member



Thanks for the welcome.

After speaking to several Yahoo folks at SESNY and attending the robots.txt summit there, I was pretty enthused about Yahoo's general appearance of responsiveness and interest in webmaster concerns.

That was the people, of course, and not the robot, so perhaps I was hasty in transferring such enthusiasm to the bot. Patience is in order, I suppose.

walkman

11:18 pm on May 16, 2007 (gmt 0)



Within hours, in my experience.

BillyS

1:28 am on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you sure Slurp is indexing the pages and not just crawling them? There could be links that Slurp is following but not indexing because of robots.txt.

You might also want to make sure your robots.txt syntax is correct.

[edited by: BillyS at 1:29 am (utc) on May 17, 2007]

jeffsmith

3:04 am on May 17, 2007 (gmt 0)

10+ Year Member



Well, it's a little bit funky. ... I rechecked the logs and saw that Slurp appears to be honoring some of the exclusions, but not all of them yet. I don't understand why that would be the case.

Some of the URLs it's still crawling aren't in the index for various reasons: either they redirect to a legit URL or they're tagged with a meta noindex. But other URLs shouldn't be crawled at all because they're flat-out disallowed. Perhaps it's a matter of time. I noticed it took Googlebot about a day to begin honoring our update.

As for syntax, I based mine on examples from the Yahoo Search Blog and tested it with the Google webmaster robots.txt analysis tool. So the syntax should be OK.
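
For anyone wanting to run the same kind of log check, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and the crawled paths below are made up for illustration (the thread doesn't show the real robots.txt or logs), and `disallowed_hits` is just a hypothetical helper name. One caveat: as far as I know the stdlib parser only does plain prefix matching, so it won't evaluate wildcard-style patterns; rules like that would need a different checker.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the poster's actual robots.txt.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
"""


def disallowed_hits(robots_txt, user_agent, paths):
    """Return the paths that the given rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


# Hypothetical paths, as if pulled from access-log lines whose
# user-agent string contains "Slurp".
crawled = [
    "/index.html",
    "/private/report.html",
    "/tmp/cache.html",
    "/cgi-bin/form",
]

violations = disallowed_hits(ROBOTS_TXT, "Slurp", crawled)
print(violations)
```

Note that `/cgi-bin/form` is not flagged here: once a bot has its own `User-agent` group, the parser applies only that group and not the `*` group. That is worth ruling out when a bot seems to honor some exclusions but not others.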