
Forum Moderators: Ocean10000 & incrediBILL


Bad activity from Yahoo ranges

bots and/or cloud?

     
3:49 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



My server received about 30 hits earlier today across several sites from two yahoo ranges that are new to me, although they were registered in 1999 and 2007. All hits seemed to be to the home page.

The general ranges are:
63.250.192.0 - 63.250.223.255
98.136.0.0 - 98.139.255.255

Activity was in the ranges below but may be wider:
63.250.192.0 - 63.250.193.255
98.137.120.0 - 98.137.129.255
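For anyone wanting to turn start-end ranges like these into CIDR blocks for a deny list, Python's standard `ipaddress` module can do the arithmetic. A minimal sketch (the function name `range_to_cidrs` is just for illustration); note that the second activity range doesn't sit on a single CIDR boundary, so it comes out as two blocks.

```python
import ipaddress


def range_to_cidrs(first: str, last: str) -> list:
    """Return the smallest list of CIDR blocks covering first..last inclusive."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# Ranges as quoted in the post above.
RANGES = [
    ("63.250.192.0", "63.250.223.255"),
    ("98.136.0.0", "98.139.255.255"),
    ("63.250.192.0", "63.250.193.255"),
    ("98.137.120.0", "98.137.129.255"),
]

if __name__ == "__main__":
    for first, last in RANGES:
        print(f"{first} - {last}: {', '.join(range_to_cidrs(first, last))}")
```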

Typical rDNS (all seemed to be ygrid):
gsbl31825.blue.ygrid.yahoo.com

As far as I can tell the UAs were all:
Mozilla

The activity looks odd for either DSL or yahoo bots: I often get yahoo bots with no UA at all, but this looks more like a cloud bot.

Are these ranges (or any part of them) a new cloud?

I have currently blocked the secondary ranges mentioned above.
5:59 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



dstiles,
63.250.192.0
Yahoo! Broadcast Services, Inc.

This is a specific tool, though what it's used for presently escapes me.

I have the following reference from 2004:

206.190.43.101 - - [20/Jul/2004:19:26:32 +0100] "GET /sounds/filename.mp3 HTTP/1.1" 200 4701571 "-" "Yahoo-MMAudVid/1.0 (mms dash mmaudvidcrawler dash support at yahoo dash inc dot com)"
10:30 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Yes, I saw "broadcast services" and wondered what relevance that had to hitting my server. :(

It's unlike yahoo to completely hide their crawler identities (I think the no-UA hits are probably cloaking detectors, but I have no proof). If they can't be open about it, they're closed.
11:29 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I've been blocking that "just Mozilla, nothing more" user-agent.

I've received several requests with that UA this month, mostly from Beijing province, but two from the higher Yahoo range you mentioned, which resolves to gsblNNNNN.blue.ygrid.yahoo.com:


123.125.68.nn (:48382) Tue Jun 1 16:35:04 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+6 (:49262) Wed Jun 2 19:04:24 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+6 (:51340) Thu Jun 3 01:37:12 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+1 (:63691) Sat Jun 5 09:04:01 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+6 (:64042) Sun Jun 6 04:05:31 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.69.nn+1 (:64960) Mon Jun 7 13:05:56 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.69.nn (:2467) Wed Jun 9 11:35:35 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

98.137.122.xx (:44890) Thu Jun 10 04:25:07 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Accept="*/*" Reason denied="Unknown-invalid-unwelcome-or-spoofed-UA"

123.125.69.nn+1 (:6296) Thu Jun 10 06:36:10 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

98.137.127.xx+130 (:44656) Thu Jun 10 07:46:58 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Accept="*/*" Reason denied="Unknown-invalid-unwelcome-or-spoofed-UA"

123.125.69.nn+1 (:7628) Thu Jun 10 11:34:45 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"


The UAs are the same but the Connection and Accept-Encoding headers are missing in the requests from the Yahoo range, and so far, there have been no fetch attempts for the mobile site from the Yahoo range (note "www" and "m" subdomains above). So I certainly can't infer that it's the same agent or the same people behind that agent.
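Jim's reasoning (same UA, different header sets, so probably not the same agent) can be made mechanical by comparing which header names each request carries. A rough sketch; the dictionaries hold only the headers his log happened to record, and the helper names are hypothetical.

```python
# Headers recorded in Jim's log for each source (values kept for reference only).
BEIJING_REQUEST = {"Connection": "close", "Accept": "*/*", "Accept-Encoding": "gzip"}
YAHOO_REQUEST = {"Accept": "*/*"}


def header_fingerprint(headers):
    """Case-insensitive set of header names; values are ignored."""
    return frozenset(name.lower() for name in headers)


def fingerprint_diff(a, b):
    """Return (names only in a, names only in b), each sorted."""
    fa, fb = header_fingerprint(a), header_fingerprint(b)
    return sorted(fa - fb), sorted(fb - fa)


if __name__ == "__main__":
    only_beijing, only_yahoo = fingerprint_diff(BEIJING_REQUEST, YAHOO_REQUEST)
    print("only in 123.125.x requests:", only_beijing)  # ['accept-encoding', 'connection']
    print("only in 98.137.x requests:", only_yahoo)     # []
```

Matching fingerprints wouldn't prove the same agent either; a differing one is just a cheap hint, as Jim says.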

No robots.txt fetch attempts from that "Mozilla" UA, at least not this month.
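The "robots.txt? NO" checks in this thread boil down to asking whether a given host/UA pair ever requested /robots.txt. A minimal sketch, assuming Apache combined-format log lines; the 67.195.111.41 sample is the line posted in this thread, while the 192.0.2.10 "ExampleBot" lines are made up for illustration.

```python
import re
from collections import defaultdict

# Apache "combined" format: host ident user [time] "METHOD path proto" status size "referer" "UA"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def robots_fetchers(lines):
    """Map each (host, UA) pair to True if it ever requested /robots.txt."""
    seen = defaultdict(bool)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # not a combined-format line; ignore it
        path = m["path"].split("?", 1)[0]  # drop any query string
        seen[(m["host"], m["ua"])] |= path == "/robots.txt"
    return dict(seen)


if __name__ == "__main__":
    sample = [
        # The 67.195.111.41 line posted in this thread.
        '67.195.111.41 - - [10/Jun/2010:00:00:005 -0000] "GET /folder/file HTTP/1.0" 301 290 "-" "Python-urllib/1.15"',
        # Hypothetical well-behaved bot (documentation address range).
        '192.0.2.10 - - [10/Jun/2010:01:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
        '192.0.2.10 - - [10/Jun/2010:01:00:05 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    ]
    for (host, ua), fetched in robots_fetchers(sample).items():
        print(f"{host} {ua!r} robots.txt? {'YES' if fetched else 'NO'}")
```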

Also currently blocking a ton of fake Googlebots... wrong IP ranges and other "incorrectnesses."

Jim
4:52 am on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Hit late last night (Pacific). Like Jim, I already block 'plain' Mozilla:

gsbl30203.blue.ygrid.yahoo.com
Mozilla

robots.txt? NO
6:13 am on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Seen the above too. Single hit on homepage.

This is getting ridiculous though:
67.195.111.41 - - [10/Jun/2010:00:00:005 -0000] "GET /folder/file HTTP/1.0" 301 290 "-" "Python-urllib/1.15"

NetRange: 67.195.0.0 - 67.195.255.255
CIDR: 67.195.0.0/16
NetName: A-YAHOO-US8

Started with single hits to the example.com homepage on June 2nd and 7th, followed by a handful to www.example.com content pages on the 8th and 10th.
10:11 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Jim, I also see a lot of APNIC-based Mozilla-only hits, along with Mozilla/4.0-only and Mozilla/5.0-only UAs. Also a LOT of fake Googles, from both DSL and servers.

Caribguy - haven't had any bad UAs from the bot range 67.195.110.0 - 67.195.115.255 (well, mostly bots). Can't say I've ever seen a python or similar from yahoo bot IPs at all.
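Several posters block the "Mozilla-only" family of UAs (plain "Mozilla", or just "Mozilla/4.0" or "Mozilla/5.0" with nothing after it). A sketch of one way to match exactly those with a single regex; the pattern and function name are illustrative, and a real rule may need to tolerate other oddities.

```python
import re

# "Mozilla", optionally followed by a bare version such as /4.0 or /5.0, and nothing else.
BARE_MOZILLA = re.compile(r"Mozilla(?:/\d+(?:\.\d+)?)?")


def is_bare_mozilla(user_agent: str) -> bool:
    """True for UAs like "Mozilla", "Mozilla/4.0", "Mozilla/5.0" with no product details."""
    return BARE_MOZILLA.fullmatch(user_agent.strip()) is not None
```

Using `fullmatch` keeps genuine browser strings, and longer bot strings such as "Mozilla/4.05 [en]", from being caught.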
10:52 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Also saw:

67.195.50.nnn "Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; hxxp://developer.yahoo.com/searchmonkey/useragent)"

and a bunch from

67.195.112.nnn 67.195.115.nnn
"Mozilla/5.0 (YahooYSMcm/3.0.0; hxxp://help.yahoo.com)"

Not using any Yahoo! Search Marketing products, see previous thread here: [webmasterworld.com...]
9:23 am on Jun 12, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I also block "Mozilla".

Checking just today, I see only one (403) hit from a Yahoo range w/ Mozilla:

gsbl31153.blue.ygrid.yahoo.com
98.137.126.221
9:13 pm on Jun 12, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Keyplyr - one hit on how many web sites? My set seemed to be one hit per site on a virtual server.
10:16 pm on Jun 12, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



@ dstiles

Yes, one hit on one site yesterday.

And one HEAD request on one site today:

204.180.153.**
ns1-auth.sprintlink.net
9:32 pm on Jun 13, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Looks like the 403s are stopping them - or they only want to see the home page anyway.

204.180.153.nn is SonicWALL. They sell security software, but this range could be proxies:
2:14 am on Jul 13, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



FWIW...

From the timing and the fact that all hits were to the exact same file, Yahoo's joined the post-Tweet swarm. All requests but one were HEAD requests; no robots.txt for either 'session.' Sure is inbred out there.

llf531077.crawl.yahoo.net [rIP: 72.30.142.249]
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
07/12 17:45:23
07/12 17:47:23
07/12 17:48:01
07/12 17:48:28
07/12 17:48:58
07/12 17:50:19
07/12 17:55:58

llf320064.crawl.yahoo.net [rIP: 67.195.37.177]
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
07/12 17:40:52
07/12 17:45:22
07/12 17:47:58
07/12 17:54:25
12:15 pm on Jul 13, 2010 (gmt 0)

10+ Year Member



RE: the 98.137. range, I see activity from 98.137.64.62 as Yahoo! Slurp ... it kept hitting a single page (not the home page) for a couple of months and then stopped. Then, from the same IP address, one request for favicon.ico as "YahooCacheSystem" [webmasterworld.com...]
9:12 pm on Jul 13, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Yahoo is by far the most prolific bot and always has been, as far as I recall. In the first 9 days of this month its hits exceeded twice the combined total of google and msn.

If bing is going to feed yahoo search, do we really need yahoo slurp?
5:03 am on Jul 14, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Interesting stats, dstiles. Flip side for me, for my biggest (half-mil docs; majors only okay to ~200 static), oldest (1997) specialty site. Yahoo's never been my top hitter, but when combined w/ MSN, they are now. Here's hoping their mashup will mean more efficient hits, more accurate indices, and more responsive webmaster tools.

----- Authorized Bots: July 1-13
3.33% Feedfetcher-Google; (~900 hits)
2.82% msnbot/2.0b
2.27% Yahoo! Slurp-3.0 spoofing as Mozilla/5.0
1.77% Googlebot-2.1 spoofing as Mozilla/5.0
0.43% Yahoo! Slurp spoofing as Mozilla/5.0
0.36% msnbot-media/1.1
0.02% Googlebot-Image
0.00% msnbot/1.1
0.00% Microsoft Bing Mobile SocialStreams Bot
-----

----- MSN/Yahoo: July 1-13
2.82% msnbot/2.0b
2.27% Yahoo! Slurp-3.0 spoofing as Mozilla/5.0
0.43% Yahoo! Slurp spoofing as Mozilla/5.0 (a.k.a. Slurp China*)
0.36% msnbot-media/1.1
0.00% msnbot/1.1
0.00% Microsoft Bing Mobile SocialStreams Bot
----- Total: 5.88%

----- Google: July 1-13
3.33% Feedfetcher-Google;
1.77% Googlebot-2.1 spoofing as Mozilla/5.0
0.02% Googlebot-Image
----- Total: 5.12%

Notes:
- 0.00% = <10 hits
- Slurp China's been 403'd for years for not reading robots.txt. Still comes a'callin'.
- UAs not exact; e.g., "spoofing as" are stats program descriptions.
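The group totals above are straight sums of the rows. A quick check in Python, using the percentages exactly as posted (the 0.00% rows are rounded, so the totals are only good to the two decimals shown; the dictionary keys are shortened labels, not exact UAs):

```python
# Percent-of-total-hits figures copied from the July 1-13 stats above.
MSN_YAHOO = {
    "msnbot/2.0b": 2.82,
    "Yahoo! Slurp-3.0": 2.27,
    "Yahoo! Slurp (China)": 0.43,
    "msnbot-media/1.1": 0.36,
    "msnbot/1.1": 0.00,
    "Microsoft Bing Mobile SocialStreams Bot": 0.00,
}
GOOGLE = {
    "Feedfetcher-Google": 3.33,
    "Googlebot-2.1": 1.77,
    "Googlebot-Image": 0.02,
}


def group_total(shares: dict) -> float:
    """Sum per-bot percentages, rounded to the two decimals the report uses."""
    return round(sum(shares.values()), 2)


if __name__ == "__main__":
    print(f"MSN/Yahoo: {group_total(MSN_YAHOO):.2f}%")
    print(f"Google:    {group_total(GOOGLE):.2f}%")
```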
9:45 pm on Jul 14, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Google at its height was still less than yahoo, as was msn.

Strange reports from your stats analyzer! :)

I also block Chinese bots, including yahoo. Feedfetcher has little fodder on my sites and doesn't appear much. I block media and image bots for the most part as not being relevant.
6:14 am on Oct 8, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Whatever cloaking Yahoo's doing with .ygrid.yahoo.com, they're still doing it. This just in. Again:

gsbl31698.blue.ygrid.yahoo.com
Mozilla

robots.txt? NO
4:51 pm on Oct 17, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Last discussed here in 2006 -- [webmasterworld.com...] (wonder what happened to Yahoo_Mike?) -- the following bizarre oddity's still nosing around:

morgue2.corp.yahoo.com
Mozilla/4.05 [en]

robots.txt? NO
3:20 am on Oct 18, 2010 (gmt 0)

5+ Year Member



Pfui, I see the morgue2.corp.yahoo.com bot visiting too. But it only seems to be asking for URLs that are long dead (and buried) but which have links to them on other current sites.

So I guess that is why it has morgue in the name - it is a dead link checker.

It might use a cached copy of robots.txt retrieved by the normal Yahoo bot - anyway I have never caught the bot violating it.

The frustrating part is not being able to get the dead links removed or changed - so many sites are either no longer maintained, or their contact address is dead, or they fail to respond to a request, or there is no way to get in touch.
7:07 pm on Nov 7, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



1.) Up until now, multiple yahoo.net servers using a "YahooCacheSystem" UA have only requested favicons. See also Umbra's post above: [webmasterworld.com...] E.g.:

ec2.ycs.s2e.yahoo.net
YahooCacheSystem
/favicon.ico
robots.txt? NO

2.) A bit ago, YahooCacheSystem went straight for a file:

ycar1.mobile.re3.yahoo.com
YahooCacheSystem
/dir/filename.html
robots.txt? NO

3.) A few weeks ago, similarly named servers with yet another Yahoo UA went for the exact same file, plus others:

ycar11.mobile.re3.yahoo.com
ycar2.mobile.re3.yahoo.com
ycar3.mobile.re3.yahoo.com
(using)
YahooMobile/1.0 (Resource; Server; 1.0.0)
robots.txt? NO

At one point, YahooMobile was linked to acceleration [webmasterworld.com...] Perhaps "YahooCacheSystem" is, too? Anyway...

4.) Per my robots.txt, neither YahooCacheSystem nor YahooMobile is allowed to read anything but robots.txt. And neither of them ever reads it -- or heeds it. Ditto:

Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

That one's ignored robots.txt for years.
7:49 pm on Nov 13, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I'm also seeing YahooCacheSystem request not only favicon but now HTML files.

69.147.114.106 - - [12/Nov/2010:06:34:11 -0700] "GET www.example.com/page.html HTTP/1.1" 200 6934 "-" "YahooCacheSystem"

ycar10.mobile.re3.yahoo.com
1:51 pm on Dec 14, 2010 (gmt 0)



I had them for the first time in my logs today.

Host: 110.75.172.109
/
Http Code: 200 Date: Dec 14 05:41:44 Http Version: HTTP/1.1 Size in Bytes: 56260
Referer: -
Agent: Yahoo! Slurp China

I block all Chinese spiders, so I intended to block this one too.

When looking up the IP I noticed that it is located in a Chinese cloud. I already block any bot from the Amazon cloud, so I certainly won't allow a Chinese cloud access to my website.

[linkedin.com...]

From: [en.wikipedia.org...]

China Yahoo
In October 2005, Alibaba Group formed a strategic partnership with Yahoo! Inc and acquired China Yahoo! (www.yahoo.com.cn), which is a Chinese portal offering search, email, and an enhanced focus on entertainment content.

# Alibaba Cloud Computing
deny from 110.75.160.0/19
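To see exactly what that Alibaba rule covers, and to confirm that the 110.75.172.109 Slurp China hit above falls inside it, a small check with Python's `ipaddress` module (the helper name `in_block` is illustrative):

```python
import ipaddress

ALIBABA_CLOUD = ipaddress.ip_network("110.75.160.0/19")  # from the deny rule above


def in_block(ip: str, network=ALIBABA_CLOUD) -> bool:
    """True if ip falls inside the given network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    print(ALIBABA_CLOUD[0], "-", ALIBABA_CLOUD[-1])  # 110.75.160.0 - 110.75.191.255
    print(in_block("110.75.172.109"))                  # True: the Slurp China hit
```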
3:38 pm on May 4, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Getting a lot of hits again on ygrid including http_load UA, all at 98.137.104.nn (142 hits in under two months, all blocked). I have now extended the block to 98.137.64/19.

Also just found about 30 blocked IPs in the range 98.137.72.215 to 254 (yst.yahoo) with a bot UA but wrong rDNS (nnnnnnn.yst.yahoo.net).
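The same kind of check works for the extended block. Written out in full, "98.137.64/19" is 98.137.64.0/19, which spans 98.137.64.0 to 98.137.95.255. That covers the 98.137.72.215 - 254 yst range, but on this arithmetic the 98.137.104.nn http_load hits would sit just outside it (98.137.96.0/19 is the /19 that contains them). A sketch; 98.137.104.1 below is a hypothetical stand-in for the masked address.

```python
import ipaddress

BLOCK = ipaddress.ip_network("98.137.64.0/19")  # "98.137.64/19" written out in full


def range_inside(first: str, last: str, net=BLOCK) -> bool:
    """True if the contiguous range first..last lies wholly inside net."""
    # A network is contiguous, so checking both ends is enough.
    return ipaddress.ip_address(first) in net and ipaddress.ip_address(last) in net


if __name__ == "__main__":
    print(BLOCK[0], "-", BLOCK[-1])                        # 98.137.64.0 - 98.137.95.255
    print(range_inside("98.137.72.215", "98.137.72.254"))  # True: the yst range
    print(ipaddress.ip_address("98.137.104.1") in BLOCK)   # False: hypothetical 98.137.104.nn host
```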
8:51 pm on Jun 5, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Recently seen a flurry of activity from 69.147.115.nnn IPs over a period of several days.

GET / HTTP/1.1
YahooCacheSystem


Fed it "410 Gone" and after a couple more visits it went away.
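g1smd doesn't say how the 410 was served. On Apache that's typically a mod_rewrite rule with the [G] flag; for a Python-served site, a WSGI middleware along these lines would do the same job. A sketch only: the middleware name and the exact-match UA test are assumptions.

```python
def gone_for_agents(app, agents=("YahooCacheSystem",)):
    """WSGI middleware: reply 410 Gone when User-Agent exactly equals one of `agents`."""
    blocked = frozenset(agents)

    def middleware(environ, start_response):
        if environ.get("HTTP_USER_AGENT", "") in blocked:
            body = b"410 Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)  # everyone else reaches the site as normal

    return middleware
```

Usage would be along the lines of `application = gone_for_agents(application)` in the site's WSGI entry point.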

69.147.114.nnn has also been used previously, according to other reports.
9:22 pm on Jun 5, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I have some of that range blocked as unwanted cache bots and some allowed as mobile proxies. Of the latter, I've blocked two IPs in the past few days for "illegal" activity. They are now showing as yahoocache as well, and I've blocked a few more in that band.

Not sure why yahoo are using a cache unless for mobiles via their proxies but enough is enough.

I'm also not sure why yahoo are still sending bots round - they are still the top "legit" bot access. Presumably future-proofing themselves.
2:06 am on Jun 21, 2011 (gmt 0)



IP: 72.30.161.219
canonical name llf531027.crawl.yahoo.net

Robots.txt: NO

UA: Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

They got blocked for a missing 'Accept' header.

Did anyone notice bad stuff / bots coming from those ranges?
2:09 am on Jun 21, 2011 (gmt 0)



IP: 67.195.111.23

Robots.txt: NO

UA: Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

Same issue: 'Accept' header missing.
10:05 pm on Jun 29, 2011 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Since last November I've been seeing a lot of bot UAs coming from the range 98.137.72.215 - 98.137.72.254 - almost every IP in the range, in fact. A lot of the activity has been in the past month or so.

Typical rDNS is b5101125.yst.yahoo.net (all seem to be yst) so it should not be a bot.

Usual UA for yahoo crawl is...
Yahoo! Slurp; [help.yahoo.com...]

UA for "bad" IP range is...
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]

My guess is it's a new version or research bot. There is no mention of it in the help page (which needs yahoo's apps URL enabled for JS to permit viewing - naff!)

I'm leaving it blocked for now: like MSN it's using an incorrect rDNS so the hell with it.
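The rule of thumb here (trust a Slurp UA only if its rDNS is a crawl host) is usually implemented as forward-confirmed reverse DNS: look up the PTR name, check its suffix, then resolve that name forward and make sure it maps back to the same IP. A sketch, assuming (as the llfNNNNNN.crawl.yahoo.net names in this thread suggest) that genuine Slurp hosts resolve under crawl.yahoo.net; the resolver functions are injectable so the logic can be tested without live DNS.

```python
import socket

# Assumed suffix for genuine Slurp hosts, based on the crawl.yahoo.net names in this thread.
CRAWL_SUFFIX = ".crawl.yahoo.net"


def is_crawl_hostname(hostname: str) -> bool:
    """True if the PTR name ends in the assumed crawler suffix."""
    return hostname.lower().rstrip(".").endswith(CRAWL_SUFFIX)


def verify_crawler(ip: str, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Yahoo crawler IP."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup returns (name, aliases, addresses)
        if not is_crawl_hostname(hostname):
            return False                   # e.g. *.ygrid.yahoo.com or *.yst.yahoo.net
        return ip in forward(hostname)[2]  # the name must resolve back to the same IP
    except OSError:                        # socket.herror / socket.gaierror: lookup failed
        return False
```

In a test, the `reverse`/`forward` arguments can be swapped for dictionary lookups; in production the `socket` defaults do real DNS queries, so results are worth caching.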
12:33 am on Oct 2, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Slurp/3.0 [webmasterworld.com...] is not the only Yahoo bot (still) gunning for what it shouldn't:

ycar11.mobile.bf1.yahoo.com
YahooCacheSystem

10/01 13:n7:30 /
10/01 14:n9:50 /

robots.txt? NO (...and fully Disallowed therein)

IP: 98.139.241.250 [projecthoneypot.org...]
This 33-message thread spans 2 pages.
 
