incrediBILL

msg:3626889 | 11:48 pm on Apr 14, 2008 (gmt 0) |
The big news is that they claim they're invalidating all the old Slurp IP addresses, so anyone that validates Slurp by IP instead of by reverse DNS-based identification [ysearchblog.com] is about to be in a world of hurt until the new IPs are known. They claim that Slurp 3.0 will recognize the old Slurp information, which means the robots.txt file should be OK, but those of you that use very narrow rewrite rules might need to update them. Additionally, reverse DNS validation against the crawl.yahoo.net domain will continue to function properly for the new, smaller set of IPs. Many sites that didn't heed the call to use rDNS validation for the major SEs will start bouncing Slurp, so this will be ugly.
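For anyone who hasn't set up rDNS validation yet, here is a minimal sketch of the usual reverse-then-forward check, assuming the crawl.yahoo.net suffix from the blog post. The function names, the injectable lookup parameters, and the host names in the usage note are my own illustrations, not anything Yahoo publishes.

```python
import socket


def default_reverse(ip):
    """Return the PTR host name for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def default_forward(hostname):
    """Return the IPv4 addresses that hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_slurp_by_rdns(ip, reverse=default_reverse, forward=default_forward,
                     suffix=".crawl.yahoo.net"):
    """Forward-confirmed reverse DNS check for a claimed Slurp visitor.

    1. Reverse-resolve the IP to a host name.
    2. Require the host name to end with the expected crawler suffix.
    3. Forward-resolve that host name and require the original IP in the result.
    """
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(suffix):
        return False
    return ip in forward(hostname)
```

The lookups are passed in as parameters so the logic can be exercised without touching DNS, for example with a dict of made-up PTR records. In production you would also want to cache results, since two DNS lookups per request gets expensive fast.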
|
wilderness

msg:3626913 | 12:28 am on Apr 15, 2008 (gmt 0) |
Enough is enough! Per their own NEW press release:
#SetEnvIf User-Agent "Slurp/3.0" keep_out
SetEnvIf User-Agent "Slurp/1.0" keep_out
SetEnvIf User-Agent "Slurp/2.0" keep_out
SetEnvIf User-Agent "slurp@inktomi.com" keep_out
SetEnvIf User-Agent "Yahoo! Slurp;" keep_out
In addition, I have some very old references to the following (have no idea when they were last used):
Slurp/cat
Slurp/si
Nor have I kept up with changes to the following, which are contained in my robots.txt:
Yahoo-MMCrawler
YahooSeeker
Yahoo! Mindset
Yahoo-Blogs
Yahoo-MMAudVid
YahooFeedSeeker
YahooSeeker-Testing
YahooSeeker/CafeKelsa-dev
YahooVideoSearch
YahooYSMcm
Yahoo! DE Slurp
Yahoo! Slurp China
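To see which user agents those SetEnvIf lines would actually flag, here is a rough Python sketch that mimics the matching. It assumes SetEnvIf's case-sensitive, unanchored regex behaviour. The helper name and the sample strings are illustrative only, and the sample user agents are cut off at the same point as the truncated ones quoted in this thread.

```python
import re

# Patterns copied from the SetEnvIf lines in the post above.
# The Slurp/3.0 line is commented out there, so it is kept in a separate list.
ACTIVE_PATTERNS = ["Slurp/1.0", "Slurp/2.0", "slurp@inktomi.com", "Yahoo! Slurp;"]
COMMENTED_OUT = ["Slurp/3.0"]


def keep_out(user_agent, patterns=ACTIVE_PATTERNS):
    """Return True if any pattern matches anywhere in the user agent.

    re.search is unanchored and case-sensitive, which approximates how
    SetEnvIf (as opposed to SetEnvIfNoCase) treats its regex argument.
    """
    return any(re.search(p, user_agent) for p in patterns)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; Yahoo! Slurp;",
        "Mozilla/5.0 (compatible; Yahoo! DE Slurp;",
        "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0;",
    ]
    for ua in samples:
        print(ua, "->", keep_out(ua))
```

Note that with the Slurp/3.0 line commented out, the 3.0 user agent slips past the "Yahoo! Slurp;" pattern because the version number sits between "Slurp" and the semicolon. The "Yahoo! DE Slurp" variant is not caught by any of these patterns either.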
|
blend27

msg:3627246 | 11:19 am on Apr 15, 2008 (gmt 0) |
So is this the Range 67.195.0.0/16? If so, this is what's coming to my sites from that range:
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! DE Slurp; [help.yahoo.com...]
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]
Slurp/3.0 was first seen around 2007-11-19 and got caught in the trap, trying...
|
incrediBILL

msg:3627583 | 6:18 pm on Apr 15, 2008 (gmt 0) |
| So is this the Range 67.195.0.0/16? |
| According to their blog post: | The crawlers will start crawling from a different and much smaller set of IP addresses, but it'll still be from the crawl.yahoo.net domain. |
| So I'm not sure if that means they're switching to a completely new set of IPs or just dropping a large segment of their existing IPs, but it does say "different" so the jury is still out on what that means until we can verify it.
|
Receptional Andy

msg:3627752 | 10:57 pm on Apr 15, 2008 (gmt 0) |
From what I've seen, Slurp is still crawling with the new and old UAs. They do seem to be using different approaches, but I haven't figured out what the intention is yet. It's funny watching them both grab pages at the same time, though ;) Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh?
|
wilderness

msg:3627759 | 11:06 pm on Apr 15, 2008 (gmt 0) |
| Still, these are our sites they're unleashing themselves on. Would be nice to be told what to expect, eh? |
| They couldn't care less what webmasters desire, or at least what those few that are aware of their activity desire. The bots and their Dr. Frankensteins have simply grown accustomed to crawling as they please, with as many different bots at once as they like. Unfortunately, even if every participant here banded together in a joint denial, it wouldn't slow down the crawling of the bots in any manner, nor even make them blink and wonder...
|
jmccormac

msg:3627774 | 11:37 pm on Apr 15, 2008 (gmt 0) |
These guys are the new Microsoft. On one of my sites, they are the second biggest spider by volume each month. In March they downloaded approximately 37K pages and were responsible for about 647 referrals. I am strongly considering banning them. Regards...jmcc
|
incrediBILL

msg:3627835 | 2:51 am on Apr 16, 2008 (gmt 0) |
So far today Slurp/3.0 has crawled only 46 pages out of 21K total Slurped pages. Googlebot came in 3rd with only 8K pages and msnbot claimed 2nd with 11K pages, making Slurp the biggest crawler, and it's been this way for many weeks now. The bot that crawls the least sends the most traffic, the irony.
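For a sense of proportion, here is a quick back-of-the-envelope calculation on those numbers. It treats the rounded "K" figures as exact (21K = 21000 and so on), which is an assumption, so the percentages are only approximate.

```python
# Page counts from the post above, with "K" figures treated as exact thousands.
pages = {"Slurp": 21000, "msnbot": 11000, "Googlebot": 8000}
slurp3_pages = 46  # portion of the Slurp total fetched by Slurp/3.0


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


total = sum(pages.values())
shares = {bot: share_percent(n, total) for bot, n in pages.items()}
slurp3_share = share_percent(slurp3_pages, pages["Slurp"])

if __name__ == "__main__":
    for bot, pct in shares.items():
        print(f"{bot}: {pct:.1f}% of pages crawled by the big three")
    print(f"Slurp/3.0: {slurp3_share:.2f}% of all Slurp fetches")
```

On these rounded figures Slurp accounts for roughly half of the crawling by the three big bots, while Slurp/3.0 is still only a fraction of a percent of Slurp's own volume.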
|
volatilegx

msg:3627840 | 3:01 am on Apr 16, 2008 (gmt 0) |
Thanks for the info. I've seen Slurp coming in from the following new Class Cs today: 67.195.37 67.195.51 67.195.52 67.195.54 67.195.98 As well as a number of Slurp visits from older ranges.
|
Ocean10000

msg:3627846 | 3:15 am on Apr 16, 2008 (gmt 0) |
Since everyone is looking for a list of IPs, the following is a list of IPs from which I have seen Slurp/3.0 coming since last November:
66.228.165.147
67.195.37.105
67.195.37.111
67.195.37.112
67.195.37.172
67.195.37.97
67.195.50.87
74.6.13.110
74.6.13.125
74.6.17.152
74.6.18.105
74.6.18.118
74.6.18.209
74.6.22.102
74.6.22.105
74.6.22.108
74.6.22.135
74.6.22.137
74.6.22.140
74.6.22.143
74.6.22.144
74.6.22.145
74.6.22.146
74.6.22.156
74.6.22.160
74.6.22.165
74.6.22.168
74.6.22.172
74.6.22.179
74.6.22.190
74.6.27.218
74.6.27.224
74.6.8.102
74.6.8.103
74.6.8.104
74.6.8.106
74.6.8.107
74.6.8.113
74.6.8.114
74.6.8.117
74.6.8.119
74.6.8.120
74.6.8.121
74.6.8.124
74.6.8.74
74.6.8.75
74.6.8.76
74.6.8.77
74.6.8.78
74.6.8.79
74.6.8.81
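If you want to boil a list like this down to Class C blocks, or test addresses against the 67.195.0.0/16 range asked about earlier in the thread, something along these lines works with Python's standard ipaddress module. The function names are mine, and the sample is just a handful of the addresses above, not the full list.

```python
import ipaddress
from collections import Counter


def count_by_slash24(ips):
    """Count how many of the given IPv4 addresses fall in each /24 block."""
    counts = Counter()
    for ip in ips:
        # strict=False lets us pass a host address and get its enclosing /24.
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        counts[str(net)] += 1
    return counts


def in_range(ip, cidr="67.195.0.0/16"):
    """Return True if ip lies inside the given CIDR range."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    sample = ["67.195.37.105", "67.195.37.111", "67.195.50.87", "74.6.8.102"]
    for block, n in sorted(count_by_slash24(sample).items()):
        print(block, n)
    for ip in sample:
        print(ip, "in 67.195.0.0/16:", in_range(ip))
```

Bear in mind that any IP list is only a snapshot of what one site has seen, so it is better used for log analysis than as an allow list.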
|
Ocean10000

msg:3627858 | 3:29 am on Apr 16, 2008 (gmt 0) |
I have noticed a thing or two about the newest Slurp/3.0 vs. the other versions. (1) The new version downloads all the style sheets over again, but does supply the referrer for each style sheet. (2) It comes through a proxy server, which I can see in the "Via" header it supplies. I am still digging for more details and doing comparisons to see if there is anything else worth noting here.
|
|