Yahoo! Slurp


Pfui - 7:03 pm on Sep 11, 2011 (gmt 0)


1.) Mokita: Good thought about checking robots.txt. Mine checks out A-OK. It's also CGI-generated, so I know exactly what rules Yahoo/Slurp gets.

2.) Ironically, even "Yahoo! Slurp China" --

lj910568.crawl.yahoo.net
lj910607.crawl.yahoo.net
(etc.)

Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)

-- requests robots.txt, although it always ignores it. (So it gets 403'd for every file other than robots.txt.)

And then there's the Yahoo UA that never requests robots.txt, only favicon.ico --

ycar10.mobile.bf1.yahoo.com
YahooCacheSystem

3.) Overall...

I get maybe 10 Yahoo-referred hits a month, and most come from links in two 'answers.yahoo.com' replies, not from SERPs per se. Thus I see precious little benefit in allowing anything other than 'plain' "Yahoo! Slurp" from ".crawl.yahoo.net" to access anything other than .html files. YMMV
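
For anyone wanting to try the same policy, here's a rough sketch in Python of how the decision could look. It is not my actual setup, just an illustration. Assumptions: the function name yahoo_status is made up; the reverse-DNS hostname is already resolved (and ideally forward-confirmed) before it gets here; and 'plain' Slurp is recognized by an exact "Yahoo! Slurp" token in the UA string, which is a guess at a workable test rather than anything Yahoo documents.

```python
from urllib.parse import urlsplit

# Assumed hostname suffix for genuine Slurp crawlers, per the hosts seen in logs.
CRAWL_SUFFIX = ".crawl.yahoo.net"


def yahoo_status(user_agent: str, rdns_host: str, path: str) -> int:
    """Return 200 or 403 for a request that claims to come from Yahoo.

    Hypothetical helper: rdns_host is assumed to be an already-verified
    reverse-DNS name for the client IP.
    """
    # Ignore any query string and compare case-insensitively.
    path_only = urlsplit(path).path.lower()

    # Everyone may fetch robots.txt, whether or not they obey it.
    if path_only == "/robots.txt":
        return 200

    # Split the UA on ';' so "Yahoo! Slurp China" does not match "Yahoo! Slurp".
    tokens = [t.strip(" ()") for t in user_agent.split(";")]
    is_plain_slurp = "Yahoo! Slurp" in tokens

    # The host must end with the crawl suffix (a trailing dot is tolerated).
    from_crawl_net = rdns_host.lower().rstrip(".").endswith(CRAWL_SUFFIX)

    # Plain Slurp from a crawl host gets .html files only.
    if is_plain_slurp and from_crawl_net and path_only.endswith(".html"):
        return 200

    # Slurp China, YahooCacheSystem, spoofed hosts, non-.html files: refused.
    return 403
```

With this, the Slurp China UA above still gets robots.txt but a 403 on everything else, and YahooCacheSystem gets a 403 on its favicon.ico request.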


Thread source: http://www.webmasterworld.com/search_engine_spiders/4360952.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com