Forum Moderators: open
Yahoo sends a lot of separate bots from different IPs, occasionally violates robots.txt instructions, and insists on requesting the index of any directory even when there are no links to it and the index option has been turned off on the server.
You might try increasing the number of seconds in the crawl delay to 2400 or whatever.
You can also ban Slurp China specifically if you don't cater to the Asian market.
I would discourage banning Yahoo altogether because Google needs competition, however inept, and because Yahoo will send at least some traffic - and all human visitors should be valued.
Webmasters have that choice, though.
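For reference, the two suggestions above might look like this in robots.txt. Note that Crawl-delay is a non-standard extension that Slurp happens to honor, and "Slurp China" as a user-agent token is an assumption based on how Yahoo China's bot is commonly reported to identify itself - check it against your own logs before relying on it:

```
# Slow the main Yahoo crawler down (value is in seconds;
# Crawl-delay is an extension, not part of the original standard).
User-agent: Slurp
Crawl-delay: 2400

# Ban the Yahoo China crawler outright if you don't cater to that market.
User-agent: Slurp China
Disallow: /
```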
...
Firstly, do not expect an instant response to any change in your robots.txt - the bots will be working from a cached version and may take a few days to update themselves.
My second point is theoretical and cannot be treated as proven.
Yahoo's SearchScan feature was recently introduced in partnership with the anti-virus vendor McAfee. It rates "site safety" in a way that has some similarities with the notorious AVG LinkScanner.

SearchScan is related to McAfee SiteAdvisor, but your logs will never identify a hit from McAfee.
Shortly before Yahoo SearchScan launched, a new (actually revived) Slurp spider also appeared. There are so many Yahoo bots that conclusions are difficult, but McAfee has to get its data from somewhere, and that means fetching pages from your site.
Banning Slurp altogether may get your site flagged as "questionable" in the SERPs.
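If you want to see for yourself how many distinct Slurp variants are hitting your site, a quick tally of user-agent strings from a combined-format access log can help. This is only a sketch: the two log lines below are fabricated samples so the pipeline can be demonstrated end to end - in practice you would point it at your real log (e.g. /var/log/apache2/access.log):

```shell
# Create a throwaway sample log with two fabricated Slurp hits.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1.2.3.4 - - [01/Jan/2009:00:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
5.6.7.8 - - [01/Jan/2009:00:00:01 +0000] "GET /a.html HTTP/1.0" 200 512 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"
EOF

# In a '"'-delimited combined log line, field 6 is the user-agent string.
# Count how often each distinct Slurp user-agent appears.
grep -i 'slurp' "$LOG" | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn
```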
...
Banning Slurp altogether may get your site flagged as "questionable" in the SERPs.
Not laughing at you - it's just that I've been meaning to ban Slurp for a couple of years now and finally have. The only thing that stopped me banning it before was wondering whether, in some strange way, it affected Google. Shouldn't think so, but stranger things etc.
Got the little green tick from site advisor, so we shall see what happens.
slurp yer barred, get oot.
Got the little green tick from site advisor, so we shall see what happens.
None of these programs work by magic, and all have to inspect your files.
Some simply fetch the homepage occasionally or subscribe to a pooled database (or maintain one). Others are more aggressive and demand - with varying degrees of success - that all your files be available for regular inspection.
Would one or more of the Slurp robots working for McAfee SiteAdvisor/SearchScan be any surprise?
Webmasters reported excessive crawling when the last one launched.
You might start by banning specific Slurp instances to see what happens.
It would be less risky and might be a source of knowledge.
The idea is to allow all humans - even Yahoomans.
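A cautious first step along those lines might look like the sketch below. The "Slurp China" token is an assumption for illustration - substitute whichever Slurp variant your own logs actually show:

```
# Allow the main Slurp crawler everywhere (empty Disallow = no restriction)...
User-agent: Slurp
Disallow:

# ...but ban one specific variant as an experiment and watch what happens.
User-agent: Slurp China
Disallow: /
```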
...
The rest of the world can have the 139 users (30% looking for an image that has not been on the site for over three years).
If the site's not good enough for Yahoo, then Yahoo is not good enough for me (hugs Google).
Shall dig this thread out in future months and blame everyone else but me ;)
I needed this to stop IMMEDIATELY...
I believe it is because Slurp for some reason is downloading my FLV files - I have no clue why it would need to do this, but whatever...
So I am testing some new code in my robots.txt file that hopefully should eliminate this problem:
User-agent: *
Disallow: /flvplayer.swf

User-agent: Googlebot
Disallow: /*.flv$

User-agent: Yahoo! Slurp
Disallow: /*.flv$
So if you have a website with various media files, you may be able to tell the bot not to download them.
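A generalized version of those directives might block several media types at once. Bear in mind that the "*" and "$" wildcards are extensions supported by Googlebot and Slurp, not part of the original robots.txt standard, so other bots may ignore them (the .mp4 and .swf lines here are just illustrative additions):

```
User-agent: Googlebot
Disallow: /*.flv$
Disallow: /*.mp4$
Disallow: /*.swf$

User-agent: Yahoo! Slurp
Disallow: /*.flv$
Disallow: /*.mp4$
Disallow: /*.swf$
```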
Here are the resources I used to build these directives.
[google.com...]
[ysearchblog.com...]