
Yahoo Slurp Ignoring Robots File?

Anyone else noticing this?

         

WiseWebDude

9:38 pm on Nov 8, 2007 (gmt 0)

10+ Year Member



I have been browsing Yahoo's Site Explorer and noticed many, many pages that have been blocked by the robots.txt file for years, yet here they are showing as cached by Yahoo. Is anyone else seeing a lot of this as well? It looks as if they aren't even reading the robots.txt file anymore.

martinibuster

4:38 pm on Nov 12, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Do snippets from those pages show up in Yahoo's SERPs?

WiseWebDude

4:58 pm on Nov 12, 2007 (gmt 0)

10+ Year Member



Yes, that is what is weird. It's like they are just ignoring it, period. Slurp has never done this before, and we've had the rules in place for YEARS, and everything in the robots.txt file is correct for sure. I wonder if anyone else is seeing this?
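For anyone who wants to double-check their own rules before blaming the crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt lines and the paths are hypothetical placeholders, not taken from any site in this thread, and the parser's matching may not be identical to how Slurp itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the lines from your own file.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]


def is_blocked(path, user_agent="Slurp", lines=ROBOTS_LINES):
    """Return True if the given rules disallow `path` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(lines)  # parse in-memory lines, no network request
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/private/page.html", "/public/index.html"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

If a URL that shows up in Site Explorer comes back as "blocked" here, the rules themselves are probably fine and the question is on the crawler's side.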

jeffsmith

7:23 pm on Nov 13, 2007 (gmt 0)

10+ Year Member



I see something similar happening. Basically I see Slurp crawling URLs that are blocked in the robots.txt and I find the pages listed in my Site Explorer account, but I don't necessarily see them in SERPs.

I am having trouble determining whether Yahoo factors all of those blocked pages shown in Site Explorer into its assessment of the site, or whether they are just noise that leaves me looking at hugely inflated page counts for my site (8 million+ for one).

This may be a separate issue, but I've seen some really old URLs from one of my sites that 301 redirect to new URLs. Yahoo is keeping the old URLs listed in Site Explorer (the redirects have been in place for a couple of years), but it shows a crawl date of "Last crawled: 1970/01/01 00:00:00 GMT". How would you interpret that?
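One observation on that date: 1970/01/01 00:00:00 GMT is exactly Unix timestamp 0 (the epoch). A plausible reading, and this is only an assumption about Yahoo's internals, is that the crawl date for those URLs is stored as zero or was never set, and the display simply formats that value. The sketch below shows how a zero timestamp produces that exact string; the function name is made up for illustration.

```python
from datetime import datetime, timezone


def format_crawl_date(unix_seconds):
    """Format a Unix timestamp in the style Site Explorer displays."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y/%m/%d %H:%M:%S GMT")


if __name__ == "__main__":
    # A timestamp of 0 (unset or defaulted) renders as the epoch.
    print("Last crawled:", format_crawl_date(0))
```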