Yahoo! Slurp/3.0 released on new IPs

Headaches for those Validating by IP instead of Reverse DNS

   
11:23 pm on Apr 14, 2008 (gmt 0)



The Yahoo search blog has announced the release of an updated version of their crawler: Slurp 3.0 [ysearchblog.com], which is now live with the new user-agent and different IP ranges (after some teething troubles).

There isn't a great deal of (any) detail on the technical differences between the spider versions, unfortunately. Anyone know any more about it?

11:48 pm on Apr 14, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The big news is that they claim they're invalidating all the old Slurp IP addresses so anyone that validates by IP instead of reverse DNS-based identification [ysearchblog.com] of Slurp is about to be in a world of hurt until the new IPs are known.

They claim that Slurp 3.0 will recognize the old Slurp information, which means the robots.txt file should be OK, but those of you who use very narrow rewrite rules might need to update them. Additionally, reverse DNS validation against the crawl.yahoo.net domain will continue to function properly for the new, smaller set of IPs.

Many sites that didn't heed the call to use rDNS validation for the major SEs will start bouncing Slurp, so this will be ugly.
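
For anyone who hasn't set up rDNS validation yet, here is a minimal sketch of the usual two-step check: reverse-resolve the visiting IP, confirm the hostname sits under crawl.yahoo.net, then resolve that hostname forward and confirm it maps back to the same IP. The function name `is_yahoo_crawler` and the injectable lookup parameters are my own for illustration, not anything Yahoo publishes; by default it uses the system resolver through Python's `socket` module.

```python
import socket


def is_yahoo_crawler(ip, reverse_lookup=None, forward_lookup=None,
                     suffix=".crawl.yahoo.net"):
    """Return True if ip reverse-resolves under suffix and resolves back to ip."""
    # Default to the system resolver; callers may inject their own lookups
    # (hypothetical hooks, handy for caching or for testing without DNS).
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or resolver failure: treat as not verified.
        return False

    # The leading dot in suffix stops look-alikes such as "evilcrawl.yahoo.net".
    if not host.endswith(suffix):
        return False

    try:
        # Forward-confirm: the hostname must map back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_yahoo_crawler(remote_ip)` for every hit would be slow, so in practice you would probably cache the verdict per IP and only run the check when the user-agent claims to be Slurp.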

12:28 am on Apr 15, 2008 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



enough is enough!

Per their own NEW press release.

#SetEnvIf User-Agent "Slurp/3.0" keep_out
SetEnvIf User-Agent "Slurp/1.0" keep_out
SetEnvIf User-Agent "Slurp/2.0" keep_out
SetEnvIf User-Agent "slurp@inktomi.com" keep_out
SetEnvIf User-Agent "Yahoo! Slurp;" keep_out

In addition I have some very old references to the following (have no idea when they were last used):

Slurp/cat
Slurp/si

Nor have I kept up to date on the following, which are contained in my robots.txt:

Yahoo-MMCrawler
YahooSeeker
Yahoo! Mindset
Yahoo-Blogs
Yahoo-MMAudVid
YahooFeedSeeker
YahooSeeker-Testing
YahooSeeker/CafeKelsa-dev
YahooVideoSearch
YahooYSMcm
Yahoo! DE Slurp
Yahoo! Slurp China

11:19 am on Apr 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So is this the Range 67.195.0.0/16?

If so, this is what's coming to my sites from that range:

Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]

Slurp/3.0 was first seen around 2007-11-19 and got caught in the trap, trying...
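
A quick way to see how the SetEnvIf patterns posted earlier line up with these three user-agents is to run the same patterns as unanchored, case-sensitive regex searches. This is only a sketch: Python's `re` stands in for Apache's regex engine (they should agree on patterns this simple), the names `ACTIVE`, `COMMENTED_OUT` and `matching_patterns` are mine, and the user-agent strings are just the prefixes shown above because the rest is cut off in the thread.

```python
import re

# Patterns from the SetEnvIf lines posted earlier in the thread.
ACTIVE = ["Slurp/1.0", "Slurp/2.0", "slurp@inktomi.com", "Yahoo! Slurp;"]
COMMENTED_OUT = ["Slurp/3.0"]  # the line that was prefixed with '#'

# User-agent prefixes as reported above; the remainder is elided in the thread.
AGENTS = [
    "Mozilla/5.0 (compatible; Yahoo! Slurp;",
    "Mozilla/5.0 (compatible; Yahoo! DE Slurp;",
    "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0;",
]


def matching_patterns(user_agent, patterns):
    """Return the patterns found anywhere in user_agent (case-sensitive)."""
    return [p for p in patterns if re.search(p, user_agent)]


if __name__ == "__main__":
    for agent in AGENTS:
        print(agent, "->", matching_patterns(agent, ACTIVE))
```

With only the active lines, the plain "Yahoo! Slurp;" agent is caught by the last pattern, while the "DE Slurp" and "Slurp/3.0" agents match nothing; the 3.0 agent is only caught once the commented-out pattern is included.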

6:18 pm on Apr 15, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



So is this the Range 67.195.0.0/16?

According to their blog post:

The crawlers will start crawling from a different and much smaller set of IP addresses, but it'll still be from the crawl.yahoo.net domain.

So I'm not sure if that means they're switching to a completely new set of IPs or just dropping a large segment of their existing IPs, but it does say "different" so the jury is still out on what that means until we can verify it.

10:57 pm on Apr 15, 2008 (gmt 0)



From what I've seen, Slurp is still crawling with the new and old UAs. They do seem to be using different approaches, but I haven't figured out what the intention is yet. It's funny watching them both grab pages at the same time, though ;)

Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh?

11:06 pm on Apr 15, 2008 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh?

They couldn't care less about what webmasters want, at least those few webmasters who are aware of their activity.

The bots and their Dr. Frankensteins have simply grown accustomed to crawling as they please, with as many different bots running simultaneously as they like.

Unfortunately, even if every participant here banded together in a joint denial, it wouldn't slow down the crawling of the bots in any manner, nor even make them blink and wonder. . .

11:37 pm on Apr 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



These guys are the new Microsoft. On one of my sites, they are the second biggest spider by volume each month. In March they downloaded approximately 37K pages and were responsible for about 647 referrals. I am strongly considering banning them.
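
To put a number on that trade-off, a rough sketch of the crawl-to-referral ratio, treating "approximately 37K" as 37,000 (an assumption) and using a helper name, `pages_per_referral`, that is purely illustrative:

```python
def pages_per_referral(pages_crawled, referrals):
    """Pages a crawler fetched for each visitor it sent; None if it sent nobody."""
    if referrals == 0:
        return None
    return pages_crawled / referrals


if __name__ == "__main__":
    # March figures quoted above: ~37K pages downloaded, about 647 referrals.
    ratio = pages_per_referral(37_000, 647)
    print(f"{ratio:.1f} pages crawled per referral")
```

On those figures the spider fetches somewhere between 57 and 58 pages for every referral it sends back.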

Regards...jmcc

2:51 am on Apr 16, 2008 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



So far today Slurp/3.0 has crawled only 46 pages out of 21K total Slurped pages.

Googlebot came in 3rd with only 8K pages and msnbot claimed 2nd with 11K pages, making Slurp the biggest crawler, and it's been this way for many weeks now.

The bot that crawls the least sends the most traffic, the irony.

3:01 am on Apr 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the info. I've seen Slurp coming in from the following new Class Cs today:

67.195.37
67.195.51
67.195.52
67.195.54
67.195.98

As well as a number of Slurp visits from older ranges.

3:15 am on Apr 16, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Since everyone is looking for a list of IPs, the following are the IPs I have seen Slurp/3.0 coming from since last November:
66.228.165.147
67.195.37.105
67.195.37.111
67.195.37.112
67.195.37.172
67.195.37.97
67.195.50.87
74.6.13.110
74.6.13.125
74.6.17.152
74.6.18.105
74.6.18.118
74.6.18.209
74.6.22.102
74.6.22.105
74.6.22.108
74.6.22.135
74.6.22.137
74.6.22.140
74.6.22.143
74.6.22.144
74.6.22.145
74.6.22.146
74.6.22.156
74.6.22.160
74.6.22.165
74.6.22.168
74.6.22.172
74.6.22.179
74.6.22.190
74.6.27.218
74.6.27.224
74.6.8.102
74.6.8.103
74.6.8.104
74.6.8.106
74.6.8.107
74.6.8.113
74.6.8.114
74.6.8.117
74.6.8.119
74.6.8.120
74.6.8.121
74.6.8.124
74.6.8.74
74.6.8.75
74.6.8.76
74.6.8.77
74.6.8.78
74.6.8.79
74.6.8.81
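
To check a list like the one above against the 67.195.0.0/16 range asked about earlier, and to see which "Class C" (/24) blocks the hits cluster in, something along these lines should work with Python's standard `ipaddress` module. `SAMPLE` is just a handful of addresses taken from the list, and `summarize` is a made-up helper name.

```python
import ipaddress
from collections import Counter

# A small sample of the addresses listed above, not the full list.
SAMPLE = [
    "66.228.165.147",
    "67.195.37.105",
    "67.195.37.111",
    "67.195.50.87",
    "74.6.13.110",
    "74.6.22.102",
    "74.6.22.105",
    "74.6.8.102",
]


def summarize(ips, cidr="67.195.0.0/16"):
    """Return (hits per /24 block, list of IPs that fall inside cidr)."""
    network = ipaddress.ip_network(cidr)
    # strict=False lets us derive the enclosing /24 from a host address.
    per_24 = Counter(
        str(ipaddress.ip_network(f"{ip}/24", strict=False)) for ip in ips
    )
    inside = [ip for ip in ips if ipaddress.ip_address(ip) in network]
    return per_24, inside


if __name__ == "__main__":
    blocks, in_range = summarize(SAMPLE)
    for block, hits in sorted(blocks.items()):
        print(block, hits)
    print("inside 67.195.0.0/16:", in_range)
```

Bear in mind this only describes what has already been seen in the logs; it says nothing about which ranges Yahoo will keep using, which is the whole argument for rDNS over IP lists.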

3:29 am on Apr 16, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I have noticed a thing or two about the newest Slurp/3.0 vs. the other versions: (1) the new version downloads all the style sheets over again, but does supply the referrer for each style sheet; (2) it comes through a proxy server, which I see in the "Via" header it supplies.

I am still digging for more details and doing comparisons to see if there is anything else worth noting here.
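
If anyone wants to log the same thing, here is a small sketch for pulling the proxy hops out of a request's "Via" header. It assumes the request headers are available as a plain dict of name to value, which depends on your server setup, and `proxy_hops` is a hypothetical helper name. The comma split is a simplification that would misparse a comma inside a parenthesized comment.

```python
def proxy_hops(headers):
    """Return the comma-separated entries of the Via header, or [] if absent."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "via":
            # Naive split: ignores the possibility of commas inside comments.
            return [hop.strip() for hop in value.split(",") if hop.strip()]
    return []


if __name__ == "__main__":
    # Made-up header values for illustration, not captured from Slurp.
    example = {"User-Agent": "example-bot", "Via": "1.0 first.example, 1.1 second.example"}
    print(proxy_hops(example))
```

A non-empty result only tells you the request declared a proxy; comparing it between the old and new Slurp agents is what would show whether the behavior really changed.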