For the past few weeks (since I added AdSense, actually), the Yahoo robots have been going mad! Yet I'm still not ranking very well in Yahoo's search engine!
Sessions with tag Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...] - 4452
Is this normal? How can I capitalise on this?
We have all spiders banned on our .ca version, yet Yahoo and MSN still index the site.
They claim to adhere to the robots exclusion protocol, but in reality they don't.
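For comparison, the standard way to ban every spider is a single wildcard group. The minimal Python sketch below uses the standard library's `urllib.robotparser` as a stand-in for a compliant crawler (it is not Yahoo's or MSN's own parser) to show that such a file refuses every URL to every user agent; the agent names and the `is_allowed` helper are only illustrative.

```python
from urllib.robotparser import RobotFileParser

# A blanket ban: one wildcard group that disallows the whole site.
BAN_ALL = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Ask a standards-following parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Every agent is refused, whatever name it sends.
for agent in ("Slurp", "msnbot", "Googlebot"):
    print(agent, is_allowed(BAN_ALL, agent, "/index.html"))
```

If a crawler still fetches pages while a file like this is in place, it is the crawler, not the file, that is ignoring the protocol.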
Oh, as for Slurp going crazy, that's just the way Slurp is. It's always the most active spider on our site; that's just the way they do it. It's a waste of everyone's bandwidth, but what can you do?
AdSense is a Google product with its own crawlers (Mediapartners-Google*; Mediapartners-Google/2.1), so its presence or absence shouldn't affect Yahoo's crawlers.
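To check whether extra traffic comes from Slurp or from AdSense's own crawler, you can tally requests by user agent. Below is a minimal Python sketch. It assumes you have already extracted the user-agent strings from your access log (log parsing is not shown). The family labels, the helper names, and the shortened sample strings are hypothetical; only the substrings come from the user agents quoted in this thread.

```python
from collections import Counter

# (substring, label) pairs, checked in order. "Slurp China" must come
# before "Slurp" so the more specific match wins.
CRAWLER_SIGNATURES = (
    ("Mediapartners-Google", "AdSense (Google)"),
    ("Slurp China", "Yahoo! Slurp China"),
    ("Slurp", "Yahoo! Slurp"),
)

def classify(user_agent: str) -> str:
    """Map a user-agent string to a crawler family, or 'other'."""
    for signature, label in CRAWLER_SIGNATURES:
        if signature in user_agent:
            return label
    return "other"

def tally(user_agents):
    """Count requests per crawler family."""
    return Counter(classify(ua) for ua in user_agents)

# Shortened, made-up sample of user agents pulled from a log.
sample = [
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
    "Mediapartners-Google/2.1",
    "Mozilla/5.0 (compatible; Yahoo! Slurp China)",
]
print(tally(sample))
```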
belfastboy and marketingmagic:
1.) On my sites, all of Yahoo's crawlers respect robots.txt except one (which I then block via mod_rewrite). Here's the bad one:
Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]
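The block above is done with Apache mod_rewrite. As a rough analogue only (not that rule), here is a Python WSGI middleware sketch. It answers 403 Forbidden when the User-Agent header contains "Yahoo! Slurp China". The function names, the demo app, and the case-insensitive substring match are assumptions made for illustration.

```python
from typing import Callable, Iterable

# Substrings of user agents to refuse, matched case-insensitively.
# Only "Yahoo! Slurp China" comes from the thread; extend as needed.
BLOCKED_UA_SUBSTRINGS = ("Yahoo! Slurp China",)

def is_blocked(user_agent: str) -> bool:
    """True if the user agent contains any blocked substring."""
    ua = user_agent.lower()
    return any(s.lower() in ua for s in BLOCKED_UA_SUBSTRINGS)

def block_bad_bots(app: Callable) -> Callable:
    """Wrap a WSGI app so blocked crawlers get 403 Forbidden."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware

def hello_app(environ, start_response):
    """Tiny hypothetical app standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

site = block_bad_bots(hello_app)
```

To try it locally, you could pass `site` to `wsgiref.simple_server.make_server`. On an Apache site, though, doing the check in the server itself (as with mod_rewrite) stops the request before any application code runs.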
2.) The following robots.txt instructions work for the rest of Yahoo's crawlers (of which there are a LOT). It's unclear whether several of them use upper- or lower-case names, so I include multiple versions just in case.
These instructions are excerpted from my robots.txt file and have three parts: reference notes to myself at the top (lines starting with # are comments, which crawlers ignore), all the disallowed Yahoo-related crawlers in the middle, and then "Slurp", the only one I allow, and only with very specific instructions.
Again, I find the following effectively shuts out all of Y!'s crawlers (except for "China" mentioned above) and also successfully controls "Slurp":
# Slurp: [help.yahoo.com...]
# HOST: .inktomisearch.com; .mail.mud.yahoo.com
# Slurp CHINA: [misc.yahoo.com.cn...]
# Slurp DE: [help.yahoo.com...]
# Blogs: [help.yahoo.com...]
# MM: mms dash mmcrawler dash support at yahoo dash inc dot com
# Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)
# Yahoo! Mindset (http://mindset.research.yahoo.com/)
User-agent: Yahoo! Mindset
User-agent: Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)
User-agent: y!j-bsc/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)
User-agent: Y!J/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)
User-agent: y!j/1.0 (http://help.yahoo.co.jp/help/jp/search/indexing/indexing-15.html)
User-agent: Mozilla/4.0 (compatible; Y!J; for robot study; keyoshid)
User-agent: Mozilla/4.0 (compatible; y!j; for robot study; keyoshid)
User-agent: Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]
User-agent: Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]
Disallow: /
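Whether the case variants are needed depends on each crawler's parser. The current robots.txt standard (RFC 9309) requires crawlers to match the User-agent line case-insensitively, so one spelling should be enough for a compliant crawler; older crawlers may have behaved differently. The minimal Python sketch below uses the standard library's `urllib.robotparser` as a stand-in for a compliant crawler (not Yahoo's own parser). That parser lower-cases both names and compares the group name against the part of the crawler's name before the first "/". The file is a simplified, hypothetical version of the excerpt that uses short names; the `/private/` path and the 10-second delay are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, simplified robots.txt: short user-agent names instead of
# full user-agent strings, one spelling each, and an explicit Disallow.
ROBOTS_TXT = """\
User-agent: Yahoo! Mindset
User-agent: Y!J-BSC
User-agent: Y!J
Disallow: /

User-agent: Slurp
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A lower-case name still matches its group: both sides are lower-cased.
for agent, path in [
    ("y!j-bsc", "/"),
    ("Yahoo! Mindset", "/"),
    ("Slurp", "/index.html"),
    ("Slurp", "/private/page.html"),
]:
    print(agent, path, parser.can_fetch(agent, path))

# crawl_delay() reports the Crawl-delay value of the matching group.
print("Slurp crawl delay:", parser.crawl_delay("Slurp"))
```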
I've also found that no one obeys crawl delays, even when their info pages say they do. I include them anyway -- hope springs eternal:)
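To back up (or refute) this observation, you can measure the gaps between one crawler's consecutive requests and compare them with the Crawl-delay you set. Below is a minimal Python sketch. It assumes you have already pulled that crawler's request times out of your access log (not shown). The function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta

def crawl_delay_violations(request_times, crawl_delay_seconds):
    """Count consecutive requests spaced more closely than the crawl delay."""
    ordered = sorted(request_times)
    return sum(
        1
        for earlier, later in zip(ordered, ordered[1:])
        if (later - earlier).total_seconds() < crawl_delay_seconds
    )

# Made-up request times: gaps of 5 s, 15 s and 2 s.
start = datetime(2000, 1, 1, 12, 0, 0)
times = [start + timedelta(seconds=s) for s in (0, 5, 20, 22)]

# With a 10-second Crawl-delay, the 5 s and 2 s gaps are violations.
print(crawl_delay_violations(times, 10))
```

A crawler that spreads requests across several of its own hosts can still look compliant per host while hitting the site faster overall. So it is worth grouping by user agent rather than by IP address.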