Yahoo's cloaked crawler(s)

crawler*.dls.srch.kr3.yahoo.com

         

Pfui

4:46 pm on Jan 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Korea-related? Dunno. But ALL cloaked, ALL no-robots.txt, and ALL using:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)

crawler2.dls.srch.kr3.yahoo.com
06:37:27

crawler11.dls.srch.kr3.yahoo.com
06:37:27

crawler7.dls.srch.kr3.yahoo.com
06:39:29

crawler10.dls.srch.kr3.yahoo.com
06:39:29
06:41:31
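For anyone who wants to flag these hits automatically, here is a minimal sketch, not a definitive filter. It assumes the hostname has already been reverse-resolved (and ideally forward-confirmed, since PTR records can be set to anything by whoever controls the IP). The function name and the list of bot tokens are made up for illustration; only the hostname pattern comes from this thread.

```python
import fnmatch

# Hostname pattern taken from the thread title.
CRAWLER_HOST_PATTERN = "crawler*.dls.srch.kr3.yahoo.com"

# Hypothetical list of tokens that self-identifying robots tend to put in their UA.
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")


def is_cloaked_yahoo_crawler(hostname: str, user_agent: str) -> bool:
    """Return True when a host matching the crawler pattern sends a browser-style UA."""
    # Case-sensitive match on a lowercased hostname, so behavior is the same on every OS.
    host_matches = fnmatch.fnmatchcase(hostname.lower(), CRAWLER_HOST_PATTERN)
    ua = user_agent.lower()
    # "Browser-style" here just means: starts with Mozilla/ and names no robot token.
    claims_browser = ua.startswith("mozilla/") and not any(t in ua for t in BOT_TOKENS)
    return host_matches and claims_browser
```

Feeding it crawler2.dls.srch.kr3.yahoo.com together with the MSIE 7.0 string above would flag the hit; the same host sending a UA that names Slurp would not.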

Pfui

11:13 pm on Jan 16, 2010 (gmt 0)

Here's a variation on the Yahoo cloaked-crawler theme (emphasis mine):

ec2-174-129-120-104.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)

robots.txt? NO
URI: /favicon.ico
Related: amazonaws.com plays host to wide variety of bad bots [webmasterworld.com]
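The "robots.txt? NO" check can be sketched roughly as below. This assumes the access log has already been parsed into (host, path) pairs; the function name and that input shape are hypothetical, not anything a particular log tool provides.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def hosts_without_robots_fetch(hits: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return every host that requested something but never asked for /robots.txt."""
    paths_by_host = defaultdict(set)
    for host, path in hits:
        paths_by_host[host].add(path)
    # A host is reported only if /robots.txt is absent from everything it fetched.
    return {host for host, paths in paths_by_host.items() if "/robots.txt" not in paths}
```

A host that only ever pulls /favicon.ico, like the one above, would land in the returned set. Plenty of real browsers never fetch robots.txt either, so this is only meaningful combined with other signals such as the hostname.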

FYI:

1.) Searching for the entire string, above, turned up a whopping four results before this posting: three reside on developer.yahoo.com and relate to Yahoo's shopping API, and the fourth is a blog entry about Yahoo's shopping web services/merchant search API.

2.) As far as the above UA goes, apparently someone used Yahoo's EXAMPLE string for faking IE (again, emphasis mine):

When calling the shopping APIs, you must set the HTTP user agent to a valid web browser string. Bot and spider strings are not valid. The user agent can be set to some default value - it does not have to be changed based on the user's browser. Some examples are as follows:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1 (for FireFox)

Sources: TERMS OF SERVICE
[developer.yahoo.com...]
[developer.yahoo.com...]

FWIW:

My main concern is not about a semi-clueless cloud-based bot-runner copy-paste mess-up fake-out. Rather, it's Yahoo telling people to bypass no-robots kinds of requirements on OUR servers -- which is, of course, something we're not supposed to do on THEIR servers:

6. MEMBER CONDUCT
You agree to not use the Yahoo! Services to: ...

"j. interfere with or disrupt the Yahoo! Services or servers or networks connected to the Yahoo! Services, or disobey any requirements, procedures, policies or regulations of networks connected to the Yahoo! Services, including using any device, software or routine to bypass our robot exclusion headers;"

Source: Yahoo! Terms of Service
[info.yahoo.com...]

/mini-rant :)

Pfui

11:30 pm on Jan 16, 2010 (gmt 0)

P.S. While I was typing the above: Clueless came 'round again today, ~4 hours later:

ec2-174-129-120-104.compute-1.amazonaws.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) (for IE)

robots.txt? NO
URI: /favicon.ico

I won't keep posting "(for IE)" activity, but it'll be interesting to know if/when you see this fake, too.
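If you'd rather have your logs tell you when this fake shows up, a rough sketch follows. It only looks for the trailing "(for IE)" or "(for FireFox)" note from the examples quoted earlier in the thread; the function name is made up for illustration.

```python
import re

# Trailing note copied from the quoted examples: "(for IE)" or "(for FireFox)".
DOC_NOTE = re.compile(r"\s*\(for (?:IE|FireFox)\)\s*$")


def has_copied_doc_note(user_agent: str) -> bool:
    """Return True when the UA ends with the documentation's parenthetical note."""
    return DOC_NOTE.search(user_agent) is not None
```

A genuine MSIE 6.0 string stops at the closing parenthesis after "Windows NT 5.1", so it would not match.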