
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Yahoo! Slurp/3.0 released on new IPs
Headaches for those Validating by IP instead of Reverse DNS
Receptional Andy




msg:3626869
 11:23 pm on Apr 14, 2008 (gmt 0)

The Yahoo search blog has announced the release of an updated version of their crawler: Slurp 3.0 [ysearchblog.com], which is now live with the new user-agent and different IP ranges (after some teething troubles).

There isn't a great deal of (any) detail on the technical differences between the spider versions, unfortunately. Anyone know any more about it?

 

incrediBILL




msg:3626889
 11:48 pm on Apr 14, 2008 (gmt 0)

The big news is that they claim they're invalidating all the old Slurp IP addresses so anyone that validates by IP instead of reverse DNS-based identification [ysearchblog.com] of Slurp is about to be in a world of hurt until the new IPs are known.

They claim that Slurp 3.0 will recognize the old Slurp information, which means the robots.txt file should be OK, but those of you that use very narrow rewrite rules might need to update them. Additionally, reverse DNS validation against the crawl.yahoo.net domain will continue to function properly for the new, smaller set of IPs.

Many sites that didn't heed the call to use rDNS validation for the major SEs will start bouncing Slurp, so this will be ugly.
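For anyone still validating by IP, the reverse DNS approach mentioned above can be sketched roughly as follows. This is a minimal illustration, not Yahoo's documented procedure: it assumes a forward-confirmed lookup (PTR lookup, suffix check against crawl.yahoo.net, then a forward lookup that must return the original IP). The function name `is_yahoo_crawler` and the injectable `reverse`/`forward` parameters are hypothetical, added so the logic can be exercised without touching the network.

```python
import socket


def is_yahoo_crawler(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex,
                     suffix=".crawl.yahoo.net"):
    """Forward-confirmed reverse DNS check for a visitor claiming to be Slurp.

    `reverse` and `forward` default to the standard socket lookups; they are
    parameters only so that fakes can be substituted (an assumption of this
    sketch, not part of any Yahoo specification).
    """
    try:
        # Reverse (PTR) lookup: IP -> hostname
        host = reverse(ip)[0]
    except OSError:
        return False

    # The hostname must END with the crawler domain; a substring test would
    # accept names such as "crawl.yahoo.net.evil.example".
    if not host.lower().rstrip(".").endswith(suffix):
        return False

    try:
        # Forward lookup: hostname -> list of IPs
        addresses = forward(host)[2]
    except OSError:
        return False

    # The forward lookup has to lead back to the IP we started with.
    return ip in addresses
```

Calling `is_yahoo_crawler(remote_ip)` with the defaults performs two live DNS lookups, so in practice the result would normally be cached per IP rather than checked on every request.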

wilderness




msg:3626913
 12:28 am on Apr 15, 2008 (gmt 0)

enough is enough!

Per their own NEW press release.

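# Set keep_out for requests whose User-Agent matches the older Slurp strings.
# The Slurp/3.0 rule is commented out, so that user agent is not flagged.
# SetEnvIf patterns are regular expressions, so literal dots are escaped.
#SetEnvIf User-Agent "Slurp/3\.0" keep_out
SetEnvIf User-Agent "Slurp/1\.0" keep_out
SetEnvIf User-Agent "Slurp/2\.0" keep_out
SetEnvIf User-Agent "slurp@inktomi\.com" keep_out
SetEnvIf User-Agent "Yahoo! Slurp;" keep_out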
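To see which user agents rules like these would flag before deploying them, the matching can be approximated in Python. This is only a sketch: it assumes SetEnvIf's case-sensitive regular-expression search, escapes the dots so they match literally, and uses a hypothetical function name (`sets_keep_out`) and sample strings in which a literal "..." stands in for the truncated help URL.

```python
import re

# The four active patterns from the rules above, with dots escaped so that
# "." matches only a literal dot (an adjustment made for this sketch).
OLD_SLURP_PATTERNS = [
    r"Slurp/1\.0",
    r"Slurp/2\.0",
    r"slurp@inktomi\.com",
    r"Yahoo! Slurp;",
]


def sets_keep_out(user_agent):
    """Return True if any pattern is found anywhere in the User-Agent.

    re.search is case-sensitive by default, like SetEnvIf
    (SetEnvIfNoCase would be the case-insensitive variant).
    """
    return any(re.search(p, user_agent) for p in OLD_SLURP_PATTERNS)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Yahoo! Slurp; ...)",
        "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; ...)",
        "Mozilla/5.0 (compatible; Yahoo! DE Slurp; ...)",
    ]
    for ua in samples:
        print(sets_keep_out(ua), ua)
```

Under these assumptions only the plain "Yahoo! Slurp;" string is flagged; the "Slurp/3.0" and "DE Slurp" variants match none of the four patterns.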

In addition I have some very old references to the following (have no idea when they were last used):

Slurp/cat
Slurp/si

Nor have I kept up to date on the following, which are contained in my robots.txt:

Yahoo-MMCrawler
YahooSeeker
Yahoo! Mindset
Yahoo-Blogs
Yahoo-MMAudVid
YahooFeedSeeker
YahooSeeker-Testing
YahooSeeker/CafeKelsa-dev
YahooVideoSearch
YahooYSMcm
Yahoo! DE Slurp
Yahoo! Slurp China

blend27




msg:3627246
 11:19 am on Apr 15, 2008 (gmt 0)

So is this the Range 67.195.0.0/16?

If so, this is what's coming to my sites from that range.

Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]

Slurp/3.0 was first seen around 2007-11-19 and got caught in the trap, trying...

incrediBILL




msg:3627583
 6:18 pm on Apr 15, 2008 (gmt 0)

So is this the Range 67.195.0.0/16?

According to their blog post:
The crawlers will start crawling from a different and much smaller set of IP addresses, but it'll still be from the crawl.yahoo.net domain.

So I'm not sure if that means they're switching to a completely new set of IPs or just dropping a large segment of their existing IPs, but it does say "different" so the jury is still out on what that means until we can verify it.

Receptional Andy




msg:3627752
 10:57 pm on Apr 15, 2008 (gmt 0)

From what I've seen, Slurp is still crawling with the new and old UAs. They do seem to be using different approaches, but I haven't figured out what the intention is yet. It's funny watching them both grab pages at the same time, though ;)

Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh?

wilderness




msg:3627759
 11:06 pm on Apr 15, 2008 (gmt 0)

Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh?

They couldn't care less what webmasters desire. At least those few that are aware of their activity.

The bots and their Dr. Frankensteins have simply grown accustomed to crawling as they please, with as many different bots simultaneously as they like.

Unfortunately, even if every participant here banded together in a joint denial, it wouldn't slow down the crawling of the bots in any manner, nor even make them blink and wonder. . .

jmccormac




msg:3627774
 11:37 pm on Apr 15, 2008 (gmt 0)

These guys are the new Microsoft. On one of my sites, they are the second biggest spider by volume each month. In March they downloaded approximately 37K pages and were responsible for about 647 referrals. I am strongly considering banning them.

Regards...jmcc

incrediBILL




msg:3627835
 2:51 am on Apr 16, 2008 (gmt 0)

So far today Slurp/3.0 has crawled only 46 of the 21K total pages Slurped.

Googlebot came in 3rd with only 8K pages and msnbot claimed 2nd with 11K pages, making Slurp the biggest crawler, and it's been this way for many weeks now.

The bot that crawls the least sends the most traffic, the irony.

volatilegx




msg:3627840
 3:01 am on Apr 16, 2008 (gmt 0)

Thanks for the info. I've seen Slurp coming in from the following new Class Cs today:

67.195.37
67.195.51
67.195.52
67.195.54
67.195.98

As well as a number of Slurp visits from older ranges.
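As a quick cross-check against the 67.195.0.0/16 range asked about earlier in the thread, the prefixes above can be tested with Python's standard `ipaddress` module. This is just a sketch: it assumes each "Class C" prefix means a /24 network, and the helper name `prefix_to_network` is made up for illustration.

```python
import ipaddress

# Range raised earlier in the thread as a possible new Slurp range.
candidate_range = ipaddress.ip_network("67.195.0.0/16")

# Three-octet prefixes reported in the post above.
observed_prefixes = ["67.195.37", "67.195.51", "67.195.52",
                     "67.195.54", "67.195.98"]


def prefix_to_network(prefix):
    """Expand a three-octet prefix such as '67.195.37' into a /24 network."""
    return ipaddress.ip_network(prefix + ".0/24")


# Map each prefix to whether its /24 lies inside the candidate /16.
inside = {p: prefix_to_network(p).subnet_of(candidate_range)
          for p in observed_prefixes}

if __name__ == "__main__":
    for prefix, is_inside in inside.items():
        print(prefix, "inside 67.195.0.0/16:", is_inside)
```

All five prefixes fall inside that /16, which is consistent with the range question but does not by itself confirm that the whole /16 belongs to Slurp.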

Ocean10000




msg:3627846
 3:15 am on Apr 16, 2008 (gmt 0)

Since everyone is looking for a list of IPs, the following are the IPs I have seen Slurp/3.0 coming from since last November:
66.228.165.147
67.195.37.105
67.195.37.111
67.195.37.112
67.195.37.172
67.195.37.97
67.195.50.87
74.6.13.110
74.6.13.125
74.6.17.152
74.6.18.105
74.6.18.118
74.6.18.209
74.6.22.102
74.6.22.105
74.6.22.108
74.6.22.135
74.6.22.137
74.6.22.140
74.6.22.143
74.6.22.144
74.6.22.145
74.6.22.146
74.6.22.156
74.6.22.160
74.6.22.165
74.6.22.168
74.6.22.172
74.6.22.179
74.6.22.190
74.6.27.218
74.6.27.224
74.6.8.102
74.6.8.103
74.6.8.104
74.6.8.106
74.6.8.107
74.6.8.113
74.6.8.114
74.6.8.117
74.6.8.119
74.6.8.120
74.6.8.121
74.6.8.124
74.6.8.74
74.6.8.75
74.6.8.76
74.6.8.77
74.6.8.78
74.6.8.79
74.6.8.81
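A list like this is easier to compare against other people's logs once it is grouped by /24. The sketch below shows one way to do that with the standard library; the function name `count_by_slash24` is hypothetical, and `sample_ips` is only a small subset of the addresses above, chosen for brevity.

```python
import ipaddress
from collections import Counter


def count_by_slash24(ips):
    """Count addresses per /24 network, returned in numeric network order."""
    counts = Counter()
    for ip in ips:
        # strict=False lets a host address such as 67.195.37.105/24
        # be normalised to its network, 67.195.37.0/24.
        network = ipaddress.ip_network(ip + "/24", strict=False)
        counts[network] += 1
    # Network objects sort numerically, so 74.6.8.0/24 comes before
    # 74.6.13.0/24 (a plain string sort would put it after).
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Subset of the IPs listed above, for illustration only.
    sample_ips = [
        "66.228.165.147", "67.195.37.105", "67.195.37.111",
        "67.195.50.87", "74.6.13.110", "74.6.8.102", "74.6.8.103",
    ]
    for network, count in count_by_slash24(sample_ips).items():
        print(network, count)
```

Feeding the full list through the same function would give a per-network tally that could be compared directly with the Class C prefixes reported elsewhere in the thread.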

Ocean10000




msg:3627858
 3:29 am on Apr 16, 2008 (gmt 0)

I have noticed a thing or two about the newest Slurp/3.0 vs. the other versions: (1) the new version downloads all the style sheets over again, but it does supply the referrer for each style sheet; (2) it comes through a proxy server, which I see in the "Via" header supplied.

I am still digging for more details and doing comparisons to see if there is anything else worth noting here.
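For anyone wanting to run the same comparison on their own traffic, the two observations above come down to inspecting the Via and Referer request headers. A rough sketch follows; the function name `proxy_and_referrer_info` and the sample header values are hypothetical, and how the headers dictionary is obtained depends on the server or framework in use.

```python
def proxy_and_referrer_info(headers):
    """Summarise the Via and Referer request headers.

    `headers` is assumed to be a plain dict of header name -> value.
    HTTP header names are case-insensitive, so names are lower-cased
    before lookup.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "via": lowered.get("via"),          # None when no proxy announced itself
        "referer": lowered.get("referer"),  # None when no referrer was sent
    }


if __name__ == "__main__":
    # Hypothetical request headers, for illustration only.
    example = {
        "User-Agent": "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; ...)",
        "Via": "1.1 proxy.example",
        "Referer": "http://www.example.com/page.html",
    }
    print(proxy_and_referrer_info(example))
```

Logging this summary alongside the user agent would make it easy to see whether the older Slurp versions differ on either header.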

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved