Slowing Down Yahoo Slurp

My site gets new content once a month, but the Yahoo bot crawls it every day

   
7:16 am on Jun 29, 2008 (gmt 0)

5+ Year Member



Hi,
My site gets new content once a month, but the Yahoo bot crawls my site every day for no apparent reason, and crawls most of it.

How do I stop this and save valuable bandwidth?
If it is possible via robots.txt, what is the syntax?

9:45 am on Jun 29, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Putting this in robots.txt should at least slow it down:

User-agent: Slurp
Crawl-delay: 240

...

11:56 am on Jun 29, 2008 (gmt 0)

5+ Year Member



Thanks for that.

But what does 240 stand for?

11:58 am on Jun 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



240 is the number of seconds the Slurp crawler is expected to wait between requests.
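
For example: there are 86,400 seconds in a day, so a delay of 240 seconds caps Slurp at 86,400 / 240 = 360 requests to your site per day.
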
12:40 pm on Jun 29, 2008 (gmt 0)

5+ Year Member



Thanks majestic and Samizdata.

But is there a way to tell Yahoo, "hey, come back after a month"?

1:38 pm on Jun 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to look into sitemaps - I think they allow something like this.
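
If I remember right, the sitemaps.org XML format has an optional changefreq hint you could set to monthly - something like this (example.com is just a placeholder, and crawlers treat changefreq and lastmod as hints rather than commands):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2008-06-01</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
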
2:30 pm on Jun 29, 2008 (gmt 0)

5+ Year Member



Can't say how much I hate that bot. I keep meaning to block it, but I've always held off on a "well, you never know" basis.

Just done a search of the log files for a small site.
Number of requests: 34,000+; number of visitors: 139.

I give in. Slurp, you're banned.

3:04 pm on Jun 29, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Unfortunately Yahoo are pretty useless (though not as useless as Microsoft).

Yahoo sends a lot of separate bots from different IPs, occasionally violates robots.txt instructions, and insists on requesting the index of any directory even when there are no links to it and the index option has been turned off on the server.

You might try increasing the number of seconds in the crawl delay to 2400 or whatever.

You can also ban Slurp China specifically if you don't cater to the Asian market.
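
Something like this, for example (assuming the Chinese crawler honours its own "Slurp China" record, which I haven't tested myself):

User-agent: Slurp
Crawl-delay: 2400

User-agent: Slurp China
Disallow: /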

I would discourage banning Yahoo altogether because Google needs competition, however inept, and because Yahoo will send at least some traffic - and all human visitors should be valued.

Webmasters have that choice, though.

...

3:38 pm on Jun 29, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Two more points I would add to this discussion:

Firstly, do not expect an instant response to any change in your robots.txt - the bots will be working from a cached version and may take a few days to update themselves.

My second point is theoretical and cannot be treated as proven.

Yahoo's SearchScan feature was recently introduced in partnership with anti-virus vendor McAfee. It rates "site safety" in a way that has some similarities with the notorious AVG LinkScanner.

SearchScan is related to McAfee SiteAdvisor, but your logs will never identify a hit from McAfee.

Shortly before Yahoo SearchScan was launched, a new (actually revived) Slurp spider was also launched. There are so many Yahoo bots that conclusions are difficult, but McAfee has to get its data from somewhere, and that means fetching pages from your site.

Banning Slurp altogether may get your site flagged as "questionable" in the SERPs.

...

4:09 pm on Jun 29, 2008 (gmt 0)

5+ Year Member



lol
Banning Slurp altogether may get your site flagged as "questionable" in the SERPs.

Not laughing at you, just at the fact that I've been meaning to ban Slurp for a couple of years now and finally have. The only thing that stopped me banning it before was wondering whether, in some strange way, it affected Google. I shouldn't think so, but stranger things have happened.

Got the little green tick from SiteAdvisor, so we shall see what happens.

slurp yer barred, get oot.

5:19 pm on Jun 29, 2008 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Got the little green tick from SiteAdvisor, so we shall see what happens.

None of these programs work by magic, and all have to inspect your files.

Some simply fetch the homepage occasionally or subscribe to a pooled database (or maintain one). Others are more aggressive and demand - with varying degrees of success - that all your files be available for regular inspection.

Would one or more of the Slurp robots working for McAfee SiteAdvisor be any surprise?

Webmasters reported excessive crawling when the last one launched.

You might start by banning specific Slurp instances to see what happens.

It would be less risky, and you might learn something from it.

The idea is to allow all humans - even Yahoomans.

...

6:26 pm on Jun 29, 2008 (gmt 0)

5+ Year Member



Fair warning.
I'm sticking with the decision: full ban, red card, sin bin for all Slurp.

The rest of the world can have the 139 users (30% looking for an image that has not been on the site for over three years).

If the site's not good enough for Yahoo, then Yahoo is not good enough for me (hugs Google).

Shall dig this thread out in future months and blame everyone else but me ;)

11:57 am on Jul 4, 2008 (gmt 0)

5+ Year Member



What about using the meta tags:

Revisit-After: 14 days?
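
In HTML that would be something like this - though I'm not sure any of the big crawlers, Slurp included, actually honour it:

<meta name="revisit-after" content="14 days">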

12:36 am on Jul 19, 2008 (gmt 0)

5+ Year Member



Apparently I have had hundreds of GB of traffic going to Yahoo Slurp on my site.

I needed this to stop IMMEDIATELY....

I believe it is because Slurp, for some reason, is downloading my FLV files - I have no clue why it would need to do this, but whatever.....

So I am testing some new code in my robots.txt file that hopefully should eliminate this problem....

# Keep everything else away from the player file
User-agent: *
Disallow: /flvplayer.swf

# A bot obeys only the most specific record that matches it, so Googlebot
# and Slurp would skip the * record above - repeat the player file for them
# and add a wildcard pattern blocking every .flv URL
User-agent: Googlebot
Disallow: /flvplayer.swf
Disallow: /*.flv$

User-agent: Yahoo! Slurp
Disallow: /flvplayer.swf
Disallow: /*.flv$

So if you have a website with various media files, you may be able to tell the bots not to download them.
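
Note the $ in those patterns: it anchors the match to the end of the URL, so /video.flv would be blocked while a page like /flv-help.html would not. The * and $ wildcards aren't part of the original robots.txt standard, but both Google and Yahoo have documented support for them.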

Here are the resources I used to build these directives.

[google.com...]

[ysearchblog.com...]

 
