Yahoo Search Engine and Directory Forum

    
Slowing Down Yahoo Slurp
My site gets new content once a month, but the Yahoo bot crawls every day
Atharva




msg:3686223
 7:16 am on Jun 29, 2008 (gmt 0)

Hi,
My site gets new content once a month, but the Yahoo bot crawls it every day for no apparent reason, and it crawls most of the site each time.

How do I stop this and save valuable bandwidth?
If it is possible via robots.txt, what is the syntax?

 

Samizdata




msg:3686256
 9:45 am on Jun 29, 2008 (gmt 0)

Putting this in robots.txt should at least slow it down:

User-agent: Slurp
Crawl-delay: 240

...

Atharva




msg:3686278
 11:56 am on Jun 29, 2008 (gmt 0)

Thanks for that.

But what does 240 stand for?

Lord Majestic




msg:3686279
 11:58 am on Jun 29, 2008 (gmt 0)

240 is the number of seconds that the Slurp crawler is expected to wait between requests.
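To put that number in perspective, here is a minimal Python sketch. It assumes a single crawler instance that honors Crawl-delay exactly, which may not hold if several bots crawl from different IPs. It parses the rule quoted above with the standard-library `urllib.robotparser` and works out the ceiling on requests per day. The function name `max_requests_per_day` is made up for this illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# The robots.txt rule suggested earlier in this thread.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 240
"""

SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(robots_txt: str, user_agent: str) -> Optional[int]:
    """Upper bound on daily requests for one bot that obeys Crawl-delay.

    Returns None when no Crawl-delay applies to the given user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    if not delay:
        return None
    return SECONDS_PER_DAY // int(delay)


if __name__ == "__main__":
    print(max_requests_per_day(ROBOTS_TXT, "Slurp"))
```

With a delay of 240 seconds the ceiling is 86,400 / 240 = 360 requests per day for one well-behaved crawler; raising the delay to 2400 would bring it down to 36.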

Atharva




msg:3686295
 12:40 pm on Jun 29, 2008 (gmt 0)

Thanks, Lord Majestic and Samizdata.

But is there a way to tell Yahoo, "hey, come back after a month"?

Lord Majestic




msg:3686323
 1:38 pm on Jun 29, 2008 (gmt 0)

You might want to look into sitemaps - I think they allow something like this.
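For what it's worth, the sitemaps.org protocol defines an optional `<changefreq>` element whose allowed values include `monthly`. It is a hint to crawlers rather than a command, so there is no guarantee any bot will slow down because of it. A rough Python sketch that writes such a sitemap follows; the URLs are placeholders and `build_sitemap` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, changefreq="monthly"):
    """Return sitemap XML (as a string) with a changefreq hint per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    print(build_sitemap([
        "http://www.example.com/",
        "http://www.example.com/archive.html",
    ]))
```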

appi2




msg:3686341
 2:30 pm on Jun 29, 2008 (gmt 0)

Can't say how much I hate that bot. I keep meaning to block it, but have always held off on a "well, you never know" basis.

Just done a search of the log files for a small site.
Number of requests: 34,000+. Number of visitors: 139.
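For anyone wanting to repeat that kind of check, here is a rough Python sketch. It assumes the server writes the common "combined" access log format, where the user agent is the last double-quoted field on each line; the sample lines and the name `count_slurp_requests` are invented for illustration.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines; a real run would iterate over the access log file.
SAMPLE_LOG = [
    '192.0.2.1 - - [29/Jun/2008:07:16:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.2 - - [29/Jun/2008:07:17:00 +0000] "GET /a.html HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.3 - - [29/Jun/2008:07:18:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
]


def count_slurp_requests(log_lines):
    """Return a Counter with 'slurp' and 'other' request totals."""
    totals = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        agent = match.group(1) if match else ""
        totals["slurp" if "slurp" in agent.lower() else "other"] += 1
    return totals


if __name__ == "__main__":
    print(count_slurp_requests(SAMPLE_LOG))
```

Passing an open file object instead of `SAMPLE_LOG` works the same way, since the function only iterates over lines. Note that this counts requests, not visitors; working out visitor numbers takes more than a user-agent match.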

I give in. Slurp, you're banned.

Samizdata




msg:3686359
 3:04 pm on Jun 29, 2008 (gmt 0)

Unfortunately Yahoo are pretty useless (though not as useless as Microsoft).

Yahoo sends a lot of separate bots from different IPs, occasionally violates robots.txt instructions, and insists on requesting the index of any directory even when there are no links to it and the index option has been turned off on the server.

You might try increasing the number of seconds in the crawl delay to 2400 or whatever.

You can also ban Slurp China specifically if you don't cater to the Asian market.

I would discourage banning Yahoo altogether because Google needs competition, however inept, and because Yahoo will send at least some traffic - and all human visitors should be valued.

Webmasters have that choice, though.

...

Samizdata




msg:3686373
 3:38 pm on Jun 29, 2008 (gmt 0)

Two more points I would add to this discussion:

Firstly, do not expect an instant response to any change in your robots.txt - the bots will be working from a cached version and may take a few days to update themselves.

My second point is theoretical and cannot be treated as proven.

Yahoo's SearchScan feature was recently introduced in partnership with anti-virus vendor McAfee. It rates "site safety" in a way that has some similarities with the notorious AVG LinkScanner.

SearchScan is related to McAfee SiteAdvisor, but your logs will never identify a hit from McAfee.

Shortly before Yahoo SearchScan was launched, a new (actually revived) Slurp spider was also launched. There are so many Yahoo bots that conclusions are difficult, but McAfee has to get its data from somewhere, and that means fetching pages from your site.

Banning Slurp altogether may get your site flagged as "questionable" in the SERPs.

...

appi2




msg:3686385
 4:09 pm on Jun 29, 2008 (gmt 0)

lol
"Banning Slurp altogether may get your site flagged as 'questionable' in the SERPs."

Not laughing at you, just at the fact that I've been meaning to ban Slurp for a couple of years now and finally have. The only thing that stopped me banning it before was wondering whether, in some strange way, it would affect Google. Shouldn't think so, but stranger things etc.

Got the little green tick from site advisor, so we shall see what happens.

slurp yer barred, get oot.

Samizdata




msg:3686407
 5:19 pm on Jun 29, 2008 (gmt 0)

"Got the little green tick from site advisor, so we shall see what happens."

None of these programs work by magic, and all have to inspect your files.

Some simply fetch the homepage occasionally or subscribe to a pooled database (or maintain one). Others are more aggressive and demand - with varying degrees of success - that all your files be available for regular inspection.

Would one or more of the Slurp robots working for McAfee SiteAdvisor be any surprise?

Webmasters reported excessive crawling when the last one launched.

You might start by banning specific Slurp instances to see what happens.

It would be less risky and might be a source of knowledge.

The idea is to allow all humans - even Yahoomans.

...

appi2




msg:3686444
 6:26 pm on Jun 29, 2008 (gmt 0)

Fair warning.
I'm sticking with the decision, full ban, red card, sin bin for all slurp.

The rest of the world can have the 139 users (30% looking for an image that has not been on the site for over three years).

If the site's not good enough for Yahoo then Yahoo is not good enough for me (hugs Google).

Shall dig this thread out in future months and blame everyone else but me ;)

malcolmcroucher




msg:3690395
 11:57 am on Jul 4, 2008 (gmt 0)

What about using the meta tags:

Revisit-After: 14 days?

stinkysloppy




msg:3702177
 12:36 am on Jul 19, 2008 (gmt 0)

Apparently I have had hundreds of GB of traffic going to Yahoo Slurp on my site.

I needed this to stop IMMEDIATELY.

I believe it is because Slurp is, for some reason, downloading my FLV files. I have no clue why it would need to do this, but whatever.

So I am testing some new code in my robots.txt file that should hopefully eliminate this problem:

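# Note: a bot that matches its own named group ignores the "*" group,
# and the * and $ wildcards are extensions, not part of the original standard.
User-agent: *
Disallow: /flvplayer.swf

User-agent: Googlebot
Disallow: /*.flv$

User-agent: Slurp
Disallow: /*.flv$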

So if you have a website with various media files, you may be able to tell the bot
to not download them.
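To see which paths a rule like `/*.flv$` would cover, here is a small Python sketch. It assumes the wildcard semantics documented for these extensions: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and anything else is a plain prefix match. The helper names are invented for illustration, and real crawlers may differ in edge cases. As far as I know, Python's own `urllib.robotparser` does not interpret these wildcards, hence the hand-rolled version.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, pattern):
    """True if the URL path falls under the given Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("/videos/clip.flv", "/videos/clip.flv?x=1", "/index.html"):
        print(path, is_blocked(path, "/*.flv$"))
```

Under these assumptions the rule covers any path ending in `.flv`, but not the same file requested with a query string, because the `$` requires the URL to end right after the extension.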

Here are the resources I used to build these directives.

[google.com...]

[ysearchblog.com...]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved