Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 33 message thread spans 2 pages.
Bad activity from Yahoo ranges
bots and/or cloud?
dstiles




msg:4150574
 3:49 pm on Jun 10, 2010 (gmt 0)

My server received about 30 hits earlier today across several sites from two yahoo ranges that are new to me, although they were registered in 1999 and 2007. All hits seemed to be for the home page.

The general ranges are:
63.250.192.0 - 63.250.223.255
98.136.0.0 - 98.139.255.255

Activity was in the ranges below but may be wider:
63.250.192.0 - 63.250.193.255
98.137.120.0 - 98.137.129.255

Typical rDNS (all seemed to be ygrid):
gsbl31825.blue.ygrid.yahoo.com

As far as I can tell the UAs were all:
Mozilla

The activity looks odd for either dsl or yahoo bots: I often get yahoo bots with no UA at all but this looks more like a cloud bot.

Are these ranges (or any part of them) a new cloud?

I have currently blocked the secondary ranges mentioned above.
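For anyone who wants to turn start/end ranges like the ones above into CIDR blocks for a deny list, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `range_to_cidrs` is made up for illustration; only the range endpoints come from the post.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the shortest list of CIDR blocks that exactly covers first..last."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# The two narrower ranges where activity was seen.
print(range_to_cidrs("63.250.192.0", "63.250.193.255"))
print(range_to_cidrs("98.137.120.0", "98.137.129.255"))
```

The second range is not aligned on a single power-of-two boundary, so it comes out as two blocks rather than one.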

 

wilderness




msg:4150679
 5:59 pm on Jun 10, 2010 (gmt 0)

dstiles,
63.250.192.0
Yahoo! Broadcast Services, Inc.

This is a specific tool, the use of which presently escapes me.

I've the following reference from 2004.

206.190.43.101 - - [20/Jul/2004:19:26:32 +0100] "GET /sounds/filename.mp3 HTTP/1.1" 200 4701571 "-" "Yahoo-MMAudVid/1.0 (mms dash mmaudvidcrawler dash support at yahoo dash inc dot com)"

dstiles




msg:4150836
 10:30 pm on Jun 10, 2010 (gmt 0)

Yes, I saw "broadcast services" and wondered what relevance that had to hitting my server. :(

It's unlike yahoo to completely hide their crawler identities (I think the no UA hits are probably cloaking detectors but have no proof). If they can't be open about it, they're closed.

jdMorgan




msg:4150855
 11:29 pm on Jun 10, 2010 (gmt 0)

I've been blocking that "just Mozilla, nothing more" user-agent.
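As a rough illustration of that kind of rule (not the actual rule set in use here, which is not shown), the check has to be an exact match on the whole header, since nearly every real browser UA also starts with "Mozilla". A minimal Python sketch, with the hypothetical function name `is_bare_mozilla`:

```python
def is_bare_mozilla(user_agent):
    """True only when the entire User-Agent value is the single token 'Mozilla'."""
    # Compare the whole string; a prefix or substring test would also
    # catch ordinary browsers and the regular Slurp user-agent.
    return user_agent is not None and user_agent.strip() == "Mozilla"

print(is_bare_mozilla("Mozilla"))
print(is_bare_mozilla("Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"))
```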

I've received several requests with that UA this month, mostly from Beijing province, but two from the higher Yahoo range you mentioned, which resolves to gsblNNNNN.blue.ygrid.yahoo.com:


123.125.68.nn (:48382) Tue Jun 1 16:35:04 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+6 (:49262) Wed Jun 2 19:04:24 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+6 (:51340) Thu Jun 3 01:37:12 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+1 (:63691) Sat Jun 5 09:04:01 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.68.nn+6 (:64042) Sun Jun 6 04:05:31 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.69.nn+1 (:64960) Mon Jun 7 13:05:56 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

123.125.69.nn (:2467) Wed Jun 9 11:35:35 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

98.137.122.xx (:44890) Thu Jun 10 04:25:07 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Accept="*/*" Reason denied="Unknown-invalid-unwelcome-or-spoofed-UA"

123.125.69.nn+1 (:6296) Thu Jun 10 06:36:10 2010 "GET (m.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"

98.137.127.xx+130 (:44656) Thu Jun 10 07:46:58 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Accept="*/*" Reason denied="Unknown-invalid-unwelcome-or-spoofed-UA"

123.125.69.nn+1 (:7628) Thu Jun 10 11:34:45 2010 "GET (www.example.org:80) /" "-" "Mozilla"
Connection="close" Accept="*/*" Accept-Encoding="gzip" Reason denied="Deny-from"


The UAs are the same but the Connection and Accept-Encoding headers are missing in the requests from the Yahoo range, and so far, there have been no fetch attempts for the mobile site from the Yahoo range (note "www" and "m" subdomains above). So I certainly can't infer that it's the same agent or the same people behind that agent.

No robots.txt fetch attempts from that "Mozilla" UA, at least not this month.

Also currently blocking a ton of fake Googlebots as well... Wrong IP ranges and other "incorrectnesses."

Jim

Pfui




msg:4150990
 4:52 am on Jun 11, 2010 (gmt 0)

Hit late last night (Pacific). Like Jim, I already block 'plain' Mozilla:

gsbl30203.blue.ygrid.yahoo.com
Mozilla

robots.txt? NO

caribguy




msg:4151006
 6:13 am on Jun 11, 2010 (gmt 0)

Seen the above too. Single hit on homepage.

This is getting ridiculous though:
67.195.111.41 - - [10/Jun/2010:00:00:005 -0000] "GET /folder/file HTTP/1.0" 301 290 "-" "Python-urllib/1.15"

NetRange: 67.195.0.0 - 67.195.255.255
CIDR: 67.195.0.0/16
NetName: A-YAHOO-US8

Started with single hits to the example.com homepage on June 2nd and 7th, followed by a handful to www.example.com content pages on the 8th and 10th

dstiles




msg:4151459
 10:11 pm on Jun 11, 2010 (gmt 0)

Jim, I also see a lot of Apnic-based Mozilla-only hits along with Mozilla/4.0 and Mozilla/5.0 -only UAs. Also a LOT of fake googles, from both DSL and servers.

Caribguy - haven't had any bad UAs from the bot range 67.195.110.0 - 67.195.115.255 (well, mostly bots). Can't say I've ever seen a python or similar from yahoo bot IPs at all.

caribguy




msg:4151472
 10:52 pm on Jun 11, 2010 (gmt 0)

Also saw:

67.195.50.nnn "Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; hxxp://developer.yahoo.com/searchmonkey/useragent)"

and a bunch from

67.195.112.nnn 67.195.115.nnn
"Mozilla/5.0 (YahooYSMcm/3.0.0; hxxp://help.yahoo.com)"

Not using any Yahoo! Search Marketing products, see previous thread here: [webmasterworld.com...]

keyplyr




msg:4151614
 9:23 am on Jun 12, 2010 (gmt 0)

I also block "Mozilla"

Checking just today, I see only one (403) hit from a Yahoo range w/ Mozilla:

gsbl31153.blue.ygrid.yahoo.com
98.137.126.221

dstiles




msg:4151783
 9:13 pm on Jun 12, 2010 (gmt 0)

Keyplyr - one hit on how many web sites? My set seemed to be one per site on a virtual server.

keyplyr




msg:4151796
 10:16 pm on Jun 12, 2010 (gmt 0)

@ dstiles

Yes, one hit on one site yesterday.

And one HEAD request on one site today:

204.180.153.**
ns1-auth.sprintlink.net

dstiles




msg:4152060
 9:32 pm on Jun 13, 2010 (gmt 0)

Looks like the 403s are stopping them - or they only want to see the home page anyway.

204.180.153.nn is sonicwall. They sell protection software but this range could be proxies.

Pfui




msg:4168912
 2:14 am on Jul 13, 2010 (gmt 0)

FWIW...

From the timing and the fact all hits were to the exact same file, Yahoo's joined the post-Tweet swarm. All requests but one were HEADS; no robots.txt for either 'session.' Sure is inbred out there.

llf531077.crawl.yahoo.net [rIP: 72.30.142.249]
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
07/12 17:45:23
07/12 17:47:23
07/12 17:48:01
07/12 17:48:28
07/12 17:48:58
07/12 17:50:19
07/12 17:55:58

llf320064.crawl.yahoo.net [rIP: 67.195.37.177]
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
07/12 17:40:52
07/12 17:45:22
07/12 17:47:58
07/12 17:54:25

Umbra




msg:4169171
 12:15 pm on Jul 13, 2010 (gmt 0)

RE: the 98.137. range, I see activity from 98.137.64.62 as Yahoo! Slurp ... it kept hitting a single page (not the home page) for a couple of months and then stopped. Then, from the same IP address, one request for favicon.ico as "YahooCacheSystem" [webmasterworld.com...]

dstiles




msg:4169515
 9:12 pm on Jul 13, 2010 (gmt 0)

Yahoo is by far the most prolific bot and always has been as far as I recall. In the first 9 days of this month it exceeded twice the total of google and msn put together.

If bing is going to feed yahoo search, do we really need yahoo slurp?

Pfui




msg:4169710
 5:03 am on Jul 14, 2010 (gmt 0)

Interesting stats, dstiles. Flip-side for me, for my biggest (half-mil docs; majors only okay to ~200 static), oldest (1997) specialty site. Yahoo's never been my top hitter, but when combined w/ MSN, they are now. Here's hoping their mashup will mean more efficient hits, more accurate indices, and more responsive webmaster tools.

----- Authorized Bots: July 1-13
3.33% Feedfetcher-Google; (~900 hits)
2.82% msnbot/2.0b
2.27% Yahoo! Slurp-3.0 spoofing as Mozilla/5.0
1.77% Googlebot-2.1 spoofing as Mozilla/5.0
0.43% Yahoo! Slurp spoofing as Mozilla/5.0
0.36% msnbot-media/1.1
0.02% Googlebot-Image
0.00% msnbot/1.1
0.00% Microsoft Bing Mobile SocialStreams Bot
-----

----- MSN/Yahoo: July 1-13
2.82% msnbot/2.0b
2.27% Yahoo! Slurp-3.0 spoofing as Mozilla/5.0
0.43% Yahoo! Slurp spoofing as Mozilla/5.0 (a.k.a. Slurp China*)
0.36% msnbot-media/1.1
0.00% msnbot/1.1
0.00% Microsoft Bing Mobile SocialStreams Bot
----- Total: 5.88%

----- Google: July 1-13
3.33% Feedfetcher-Google;
1.77% Googlebot-2.1 spoofing as Mozilla/5.0
0.02% Googlebot-Image
----- Total: 5.12%

Notes:
- 0.00% = <10 hits
- Slurp China's been 403'd for years for no-robots.txt. Still comes a'callin'.
- UAs not exact; e.g., "spoofing as" are stats program descriptions.

dstiles




msg:4170219
 9:45 pm on Jul 14, 2010 (gmt 0)

Google at its height was still less than yahoo, as was msn.

Strange reports from your stats analyzer! :)

I also block Chinese bots, including yahoo. Feedfetcher has little fodder on my sites and doesn't appear much. I block media and image bots for the most part as not being relevant.

Pfui




msg:4213619
 6:14 am on Oct 8, 2010 (gmt 0)

Whatever cloaking Yahoo's doing with .ygrid.yahoo.com, they're still doing it. This just in. Again:

gsbl31698.blue.ygrid.yahoo.com
Mozilla

robots.txt? NO

Pfui




msg:4217973
 4:51 pm on Oct 17, 2010 (gmt 0)

Last discussed here in 2006 -- [webmasterworld.com...] (wonder what happened to Yahoo_Mike?) -- the following bizarre oddity's still nosing around:

morgue2.corp.yahoo.com
Mozilla/4.05 [en]

robots.txt? NO

Mokita




msg:4218112
 3:20 am on Oct 18, 2010 (gmt 0)

Pfui, I see the morgue2.corp.yahoo.com bot visiting too. But it only seems to be asking for URLs that are long dead (and buried) but which have links to them on other current sites.

So I guess that is why it has morgue in the name - it is a dead link checker.

It might use a cached copy of robots.txt retrieved by the normal Yahoo bot - anyway I have never caught the bot violating it.

The frustrating part is not being able to get the dead links removed or changed - so many sites are either no longer maintained, or their contact address is dead, or they fail to respond to a request, or there is no way to get in touch.

Pfui




msg:4227818
 7:07 pm on Nov 7, 2010 (gmt 0)

1.) Up until now, multiple yahoo.net servers using a "YahooCacheSystem" UA have only requested favicons. See also Umbra's post above: [webmasterworld.com...] E.g.:

ec2.ycs.s2e.yahoo.net
YahooCacheSystem
/favicon.ico
robots.txt? NO

2.) A bit ago, YahooCacheSystem went straight for a file:

ycar1.mobile.re3.yahoo.com
YahooCacheSystem
/dir/filename.html
robots.txt? NO

3.) A few weeks ago, similarly-named servers with yet another Yahoo UA went for the exact same file, plus others:

ycar11.mobile.re3.yahoo.com
ycar2.mobile.re3.yahoo.com
ycar3.mobile.re3.yahoo.com
(using)
YahooMobile/1.0 (Resource; Server; 1.0.0)
robots.txt? NO

At one point, YahooMobile was linked to acceleration [webmasterworld.com...] Perhaps "YahooCacheSystem" is, too? Anyway...

4.) Per my robots.txt, neither YahooCacheSystem nor YahooMobile is allowed to read anything but robots.txt. And neither of them ever reads it -- or heeds it. Ditto:

Mozilla/5.0 (compatible; Yahoo! Slurp China; [misc.yahoo.com.cn...]

That one's ignored robots.txt for years.

keyplyr




msg:4230028
 7:49 pm on Nov 13, 2010 (gmt 0)

I'm also seeing YahooCacheSystem request not only favicon but now HTML files.

69.147.114.106 - - [12/Nov/2010:06:34:11 -0700] "GET www.example.com/page.html HTTP/1.1" 200 6934 "-" "YahooCacheSystem"

ycar10.mobile.re3.yahoo.com

MxAngel




msg:4242276
 1:51 pm on Dec 14, 2010 (gmt 0)

I had them for the first time in my logs today.

Host: 110.75.172.109
/
Http Code: 200 Date: Dec 14 05:41:44 Http Version: HTTP/1.1 Size in Bytes: 56260
Referer: -
Agent: Yahoo! Slurp China

I block all Chinese spiders so my intention was to apply it to this one too.

When looking up the IP I noticed that they are located in a Chinese Cloud. I already block any bot from the Amazon Cloud so I certainly won't allow a Chinese Cloud access to my website.

[linkedin.com...]

From: [en.wikipedia.org...]

China Yahoo
In October 2005, Alibaba Group formed a strategic partnership with Yahoo! Inc and acquired China Yahoo! (www.yahoo.com.cn), which is a Chinese portal offering search, email, and an enhanced focus on entertainment content.

# Alibaba Cloud Computing
deny from 110.75.160.0/19
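
A quick way to confirm that a deny range like this actually catches the address seen in the log is to test membership with Python's `ipaddress` module. This is only a sketch; the names `ALIBABA_CLOUD` and `is_denied` are made up for illustration.

```python
import ipaddress

# The network from the deny rule above.
ALIBABA_CLOUD = ipaddress.ip_network("110.75.160.0/19")

def is_denied(ip, networks=(ALIBABA_CLOUD,)):
    """True if ip falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# The host from the log entry quoted earlier in this post.
print(is_denied("110.75.172.109"))
```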

dstiles




msg:4307866
 3:38 pm on May 4, 2011 (gmt 0)

Getting a lot of hits again on ygrid including http_load UA, all at 98.137.104.nn (142 hits in under two months, all blocked). I have now extended the block to 98.137.64/19.

Also just found about 30 blocked IPs in the range 98.137.72.215 to 254 (yst.yahoo) with a bot UA but wrong rDNS (nnnnnnn.yst.yahoo.net).
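
When extending a block like this it can be worth checking exactly what the CIDR covers. The sketch below reads the shorthand "98.137.64/19" as 98.137.64.0/19, which is an assumption (the post does not spell it out), and the helper names are hypothetical. Under that reading the yst addresses in 98.137.72.x are covered, but the 98.137.104 subnet would fall outside it, so the rule actually in use may differ from the shorthand written here.

```python
import ipaddress

# Assumed expansion of the shorthand "98.137.64/19".
BLOCK = ipaddress.ip_network("98.137.64.0/19")

def bounds(net):
    """Return the first and last address of a network as strings."""
    return str(net[0]), str(net[-1])

def covers(block, other):
    """True if the network 'other' lies entirely inside 'block'."""
    return ipaddress.ip_network(other).subnet_of(block)

print(bounds(BLOCK))
print(covers(BLOCK, "98.137.72.0/24"))
print(covers(BLOCK, "98.137.104.0/24"))
```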

g1smd




msg:4322304
 8:51 pm on Jun 5, 2011 (gmt 0)

Recently seen a flurry of activity from
69.147.115.nnn IPs over a period of several days.

GET / HTTP/1.1

YahooCacheSystem

Fed it "410 Gone" and after a couple more visits it went away.
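
How the 410 was served here is not stated. Purely as an illustration of the idea, a small WSGI wrapper in Python could answer that one user-agent with "410 Gone" and pass everything else through; `gone_for_agent` and `demo_app` are hypothetical names.

```python
def gone_for_agent(app, blocked_ua="YahooCacheSystem"):
    """Wrap a WSGI app so requests whose UA is exactly blocked_ua get 410 Gone."""
    def middleware(environ, start_response):
        if environ.get("HTTP_USER_AGENT", "") == blocked_ua:
            body = b"Gone\n"
            start_response("410 Gone", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other client is handled by the wrapped application.
        return app(environ, start_response)
    return middleware

def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200 OK."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

application = gone_for_agent(demo_app)
```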

69.147.114.nnn has also been previously used according to other reports.

dstiles




msg:4322309
 9:22 pm on Jun 5, 2011 (gmt 0)

I have some of that range blocked as unwanted cache bots and some allowed as mobile proxies. Of the latter, I've blocked two IPs in the past few days for "illegal" activity. They are now showing as yahoocache as well and I've blocked a few more in that band.

Not sure why yahoo are using a cache unless for mobiles via their proxies but enough is enough.

I'm also not sure why yahoo are still sending bots round - they are still the top "legit" bot access. Presumably future-proofing themselves.

MxAngel




msg:4328565
 2:06 am on Jun 21, 2011 (gmt 0)

IP: 72.30.161.219
canonical name llf531027.crawl.yahoo.net

Robots.txt: NO

UA: Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

They got blocked for header 'Accept' missing
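
The filter that produced this block is not shown. As a minimal sketch of a "required header missing" test, assuming the request headers are available as a plain dict, something like the following would do; `missing_headers` is a hypothetical name. Header names are compared case-insensitively, since HTTP header names are not case-sensitive.

```python
def missing_headers(headers, required=("Accept",)):
    """Return the required header names that are absent from the request."""
    present = {name.lower() for name in headers}
    return [name for name in required if name.lower() not in present]

# A request carrying only Host and User-Agent would be flagged.
request = {
    "Host": "www.example.com",
    "User-Agent": "Mozilla/5.0 (compatible; Yahoo! Slurp)",
}
print(missing_headers(request))
```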

Did anyone notice bad stuff / bots coming from those ranges?

MxAngel




msg:4328568
 2:09 am on Jun 21, 2011 (gmt 0)

IP: 67.195.111.23

Robots.txt: NO

UA: Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

Same issue, header 'Accept' missing

dstiles




msg:4332765
 10:05 pm on Jun 29, 2011 (gmt 0)

Since last November I've been seeing a lot of bot UAs coming from the range 98.137.72.215 - 98.137.72.254 - almost every IP in the range, in fact. A lot of the activity has been in the past month or so.

Typical rDNS is b5101125.yst.yahoo.net (all seem to be yst) so it should not be a bot.

Usual UA for yahoo crawl is...
Yahoo! Slurp; [help.yahoo.com...]

UA for "bad" IP range is...
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; [help.yahoo.com...]

My guess is it's a new version or research bot. There is no mention of it in the help page (which needs yahoo's apps URL enabled for JS to permit viewing - naff!)

I'm leaving it blocked for now: like MSN it's using an incorrect rDNS so the hell with it.
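
For reference, the usual way to automate this kind of rDNS test is a forward-confirmed reverse lookup: resolve the IP to a hostname, check the hostname's suffix, then resolve the hostname back and make sure the original IP is among the results. Below is a minimal Python sketch. The suffix `crawl.yahoo.net` is an assumption taken from the Slurp hostnames quoted elsewhere in this thread, not an official list, and `verify_crawler` is a hypothetical name. The resolver functions are parameters so the logic can be tried without touching DNS.

```python
import socket

def verify_crawler(ip, allowed_suffixes,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not any(host.endswith("." + suffix) for suffix in allowed_suffixes):
        return False  # resolves, but not to an expected crawler domain
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    # The hostname must resolve back to the address that connected.
    return ip in addresses

# Offline demo with stand-in resolvers built from hostnames seen in this thread.
def fake_forward(host):
    return (host, [], ["72.30.161.219"])

def crawl_reverse(ip):
    return ("llf531027.crawl.yahoo.net", [], [ip])

def yst_reverse(ip):
    return ("b5101125.yst.yahoo.net", [], [ip])

SUFFIXES = ("crawl.yahoo.net",)  # assumed, based on this thread only
print(verify_crawler("72.30.161.219", SUFFIXES, crawl_reverse, fake_forward))
print(verify_crawler("98.137.72.215", SUFFIXES, yst_reverse, fake_forward))
```

With the default resolvers the function performs real DNS lookups, and `socket.gethostbyname_ex` only handles IPv4.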

Pfui




msg:4369626
 12:33 am on Oct 2, 2011 (gmt 0)

Slurp/3.0 [webmasterworld.com...] is not the only Yahoo bot (still) gunning for what it shouldn't:

ycar11.mobile.bf1.yahoo.com
YahooCacheSystem

10/01 13:n7:30 /
10/01 14:n9:50 /

robots.txt? NO (...and fully Disallowed therein)

IP: 98.139.241.250 [projecthoneypot.org...]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved