Putting this in robots.txt should at least slow it down:
Thanks for that.
But what does 240 stand for?
240 is the number of seconds that the Slurp crawler is expected to wait between requests.
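The robots.txt snippet from the first post is not shown in the thread. As a minimal sketch, assuming it was a `Crawl-delay` rule addressed to `Slurp`, the following uses Python's standard `urllib.robotparser` to show how such a rule is read. The robots.txt text here is a reconstruction from the "240 seconds" explanation, not the poster's original.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: reconstructed from the "240 seconds"
# explanation above, since the original snippet is not shown.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 240
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) requested of Slurp between fetches.
slurp_delay = parser.crawl_delay("Slurp")

# Bots with no matching rule get no delay at all.
other_delay = parser.crawl_delay("Googlebot")

print(slurp_delay, other_delay)
```

Crawl-delay is an extension rather than part of the original robots.txt standard, so whether a given bot honours it is up to that bot.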
Thanks majestic and Samizdata.
But is there a way to tell Yahoo, "hey, come back after a month"?
You might want to look into sitemaps - I think they allow something like this.
Can't say how much I hate that bot. I keep meaning to block it, but have always held off on a "well, you never know" basis.
Just done a search of the log files for a small site.
Number of requests: 34,000+. Number of visitors: 139.
I give in. Slurp, you're banned.
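For anyone wanting to run the same kind of check on their own logs, here is a rough sketch. It assumes the common Apache/Nginx "combined" log format, where the user agent is the last quoted field on each line. The sample lines and the `count_slurp` helper are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_slurp(lines):
    """Count requests whose user agent mentions Slurp versus everything else."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not look like combined-format entries
        agent = match.group(1).lower()
        counts["slurp" if "slurp" in agent else "other"] += 1
    return counts

# Made-up sample entries, not taken from any real log.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2009:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '203.0.113.6 - - [01/Jan/2009:00:00:02 +0000] "GET /a.html HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '198.51.100.7 - - [01/Jan/2009:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"',
]

counts = count_slurp(SAMPLE_LOG)
print(counts["slurp"], counts["other"])
```

To use it on a real file, pass an open file object instead of `SAMPLE_LOG`, for example `count_slurp(open("access.log"))`, where `access.log` stands in for whatever your server writes.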
Unfortunately Yahoo are pretty useless (though not as useless as Microsoft).
Yahoo sends a lot of separate bots from different IPs, occasionally violates robots.txt instructions, and insists on requesting the index of any directory even when there are no links to it and the index option has been turned off on the server.
You might try increasing the number of seconds in the crawl delay to 2400 or whatever.
You can also ban Slurp China specifically if you don't cater to the Asian market.
I would discourage banning Yahoo altogether because Google needs competition, however inept, and because Yahoo will send at least some traffic - and all human visitors should be valued.
Webmasters have that choice, though.
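The two suggestions above (a longer crawl delay for Slurp, and a ban for Slurp China only) could be combined roughly as follows. This is a sketch under assumptions: the `Slurp China` token is assumed to be what the regional crawler matches on, and the check uses Python's `urllib.robotparser`, which may not match agents exactly the way Yahoo does.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "Slurp China" is the user-agent token for the regional bot.
# The more specific group is listed first because Python's parser uses the
# first group whose name appears in the requesting agent's name.
ROBOTS_TXT = """\
User-agent: Slurp China
Disallow: /

User-agent: Slurp
Crawl-delay: 2400
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The regional bot is shut out entirely.
china_allowed = parser.can_fetch("Slurp China", "/page.html")

# The main bot may still crawl, but is asked to wait 2400 seconds.
slurp_allowed = parser.can_fetch("Slurp", "/page.html")
slurp_delay = parser.crawl_delay("Slurp")

print(china_allowed, slurp_allowed, slurp_delay)
```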
Two more points I would add to this discussion:
Firstly, do not expect an instant response to any change in your robots.txt - the bots will be working from a cached version and may take a few days to update themselves.
My second point is theoretical and cannot be treated as proven.
Yahoo's SearchScan feature was recently introduced in partnership with anti-virus vendor McAfee. It rates "site safety" in a way that has some similarities with the notorious AVG LinkScanner.
SearchScan is related to McAfee SiteAdvisor, but your logs will never identify a hit from McAfee.
Shortly before Yahoo SearchScan was launched, a new (actually revived) Slurp spider was also launched. There are so many Yahoo bots that conclusions are difficult, but McAfee has to get its data from somewhere, and that means fetching pages from your site.
Banning Slurp altogether may get your site flagged as "questionable" in the SERPs.
|Banning Slurp altogether may get your site flagged as "questionable" in the SERPs. |
Not laughing at you, just at the fact that I've been meaning to ban Slurp for a couple of years now and finally have. The only thing that stopped me banning it before was wondering whether in some strange way it affected Google. Shouldn't think so, but stranger things etc.
Got the little green tick from site advisor, so we shall see what happens.
slurp yer barred, get oot.
|Got the little green tick from site advisor, so we shall see what happens. |
None of these programs work by magic, and all have to inspect your files.
Some simply fetch the homepage occasionally or subscribe to a pooled database (or maintain one). Others are more aggressive and demand - with varying degrees of success - that all your files be available for regular inspection.
Would it be any surprise if one or more of the Slurp robots were working for McAfee SearchScan?
Webmasters reported excessive crawling when the last one launched.
You might start by banning specific Slurp instances to see what happens.
It would be less risky and might be a source of knowledge.
The idea is to allow all humans - even Yahoomans.
I'm sticking with the decision: full ban, red card, sin bin for all Slurp.
The rest of the world can have the 139 users (30% looking for an image that has not been on the site for over three years).
If the site's not good enough for Yahoo then Yahoo is not good enough for me (hugs Google).
Shall dig this thread out in future months and blame everyone else but me ;)
What about using the meta tags:
Revisit-after: 14 days?
Apparently I have had hundreds of GB of traffic going to Yahoo Slurp on my site.
I needed this to stop IMMEDIATELY...
I believe it is because Slurp, for some reason, is downloading my FLV files. I have no clue why it would need to do this, but whatever...
So I am testing some new code in my robots.txt file that should hopefully eliminate this problem:
User-Agent: Yahoo! Slurp
So if you have a website with various media files, you may be able to tell the bot not to download them.
Here are the resources I used to build these directives.
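The exact directives from the post above are not shown. As an illustration only, one common way to keep a crawler away from media files is a wildcard rule such as `Disallow: /*.flv$`, assuming the bot honours the `*` and `$` extensions (they are not part of the original robots.txt standard). Python's built-in `urllib.robotparser` does not interpret these wildcards, so the sketch below uses a small hypothetical matcher to show how such a pattern is usually read.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, then let each '*' stand for any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path, disallow_patterns):
    """Return True if any Disallow pattern matches the request path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)

# Hypothetical rule, equivalent to this robots.txt group:
#   User-agent: Slurp
#   Disallow: /*.flv$
DISALLOW = ["/*.flv$"]

blocked_video = is_blocked("/media/clip.flv", DISALLOW)
blocked_page = is_blocked("/media/clip.html", DISALLOW)
# The trailing '$' means the path must end in .flv, so a query string escapes it.
blocked_query = is_blocked("/media/clip.flv?x=1", DISALLOW)

print(blocked_video, blocked_page, blocked_query)
```

Dropping the `$` would also catch URLs with query strings after the extension. As noted earlier in the thread, bots work from a cached robots.txt, so any such change may take a few days to have an effect.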