Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 152 message thread spans 6 pages; this is page 2.
MSN's many cloaked bots. Again.
Pfui




msg:4182832
 11:44 pm on Aug 5, 2010 (gmt 0)

Previously... [webmasterworld.com]

Currently, straight out of my logs...

65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.33.73
-
08/05 15:45:09 /dir/filename.html
08/05 15:45:20 /dir/filename.html
08/05 15:45:31 /dir/filename.html
08/05 15:45:42 /dir/filename.html
08/05 15:45:53 /dir/filename.html
08/05 15:46:03 /dir/filename.html
08/05 15:46:14 /dir/filename.html
08/05 15:46:25 /dir/filename.html
08/05 15:46:35 /dir/filename.html
08/05 15:46:46 /dir/filename.html
08/05 15:46:57 /dir/filename.html

Same poor file. All hits 403'd because no UA; also because bare MSN IP and not a bona fide MSN bot.
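
For anyone who wants to flag hits like these automatically, here is a minimal sketch. It assumes the log is in Apache "combined" format, as the sample line above appears to be. The regular expression and the names `parse_hit` and `is_blank` are illustrative only, not part of anyone's actual setup.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_hit(line):
    """Return the named fields of one combined-format log line, or None."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def is_blank(field):
    """Apache logs a missing header as "-"; treat that and "" as absent."""
    return field in ("", "-")


line = ('65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] '
        '"GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"')
hit = parse_hit(line)
print(hit["host"], hit["status"],
      "no UA" if is_blank(hit["agent"]) else hit["agent"],
      "no REF" if is_blank(hit["referer"]) else hit["referer"])
```

Lines that do not fit the combined layout come back as `None`, so a differently configured log would need its own pattern.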

 

Pfui




msg:4196232
 7:32 pm on Sep 3, 2010 (gmt 0)

I hate cloaking. Alas, most majors and former majors cloak like crazy. Here's a bunch of bot-spotter minutiae, in no particular order...

-----
GOOGLE
-----
The empire-builder uses bare IPs and no UAs to hit (& hit & hit) favicons for sites added to its Webmaster Tools pages (Home list, site Dashboards, etc.). For example --

72.14.192.68 - - [03/Sep/2010:10:11:39 -0700] "GET /favicon.ico HTTP/1.1" 200 5430 "-" "-"

Additional IPs used the exact same way:

72.14.192.68
72.14.212.81
72.14.212.82
72.14.212.85
72.14.212.87
74.125.154.81
74.125.154.85

All bare IPs, no UAs, no robots.txt, no REF, no nothing. (sighs) And don't get me started on G's Code and Labs creations, a la:

74.125.154.85
AppEngine-Google; (+http://code.google.com/appengine; appid: linksalpha)
robots.txt? NO

2010 [webmasterworld.com...]

-----
IBM
-----
Like the Energizer Bunny, .watson.ibm.com just keeps going, and going, and going... For what purpose? Beats me.

2010 [webmasterworld.com...] [webmasterworld.com...]
2009 [webmasterworld.com...]

-----
MSN
-----
This thread and its predecessor [webmasterworld.com...] aren't the only reports of MSN's cloaking:

2009 [webmasterworld.com...]

And here's a little oddity from last January: Microsoft's portal domain came a'crawlin':

gig4-2.tuk2f-gsr-a.us.msn.net
Microsoft MSN SocialStreams Bot
robots.txt? Yes

Hmm. I guess "Microsoft MSN SocialStreams Bot" is "Microsoft Bing Mobile SocialStreams Bot" now?

2010 [webmasterworld.com...]

-----
DOW JONES
-----
Multiple IPs/server farms... Again for what purpose? Dunno.

2009-2010 [webmasterworld.com...]

-----
YAHOO
-----
Too many years, too many probs. Just last month:

research-mm10.corp.sp1.yahoo.com
Firefox 4.0
robots.txt? NO

Just today, no UA:

ycar3.mobile.sp1.yahoo.com - - [02/Sep/2010:07:47:55 -0700] "GET / HTTP/1.1" 403 702 "-" "-"

Oh, and HEAD requests, too. Newly atypical for Slurp on my sites. And redundant: This file didn't change in 30 secs:

llf531077.crawl.yahoo.net - - [21/Jul/2010:17:08:38 -0700] "HEAD /dirA/filenameA.html [snip]"
llf531077.crawl.yahoo.net - - [21/Jul/2010:17:09:08 -0700] "HEAD /dirA/filenameA.html [snip]"

'dev' subdomains from .corp.yahoo.com historically problematic, too:

18ndev96.yst.corp.yahoo.com
sedev1039.yst.corp.yahoo.com

Re both of those:
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
robots.txt? NO

But wait! There's more!

2009-2010 [webmasterworld.com...]

-----
BAIDU
-----
Here's a thread from 2009 and they're still at it. 'Nuff said: [webmasterworld.com...]

-----
Okay, that's enough. But that's not all... All of the above are but a minuscule fraction of hits from cloaked start-ups, wanna-bes, Twitter swarmers, student projects, semi-clueless individuals and denizens of the cesspool that is AmazonAWS [webmasterworld.com...]

Solution? I err on the restrictive side when it comes to anyone wasting my and/or my clients' bandwidth, even more so when it comes to crawling for unknown reasons. No read/heed robots.txt? 403

Thoughts?
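
One way the "no read/heed robots.txt? 403" rule could be sketched, purely as an illustration: remember which client IPs have asked for /robots.txt and refuse crawler traffic from the rest. The class name `RobotsGate` and the `is_crawler` flag are hypothetical. Deciding what counts as a crawler is left to the caller, since ordinary browsers never fetch robots.txt and should not be caught by this.

```python
class RobotsGate:
    """Track which clients fetched /robots.txt and 403 crawlers that did not."""

    def __init__(self):
        self.seen_robots = set()

    def status_for(self, client_ip, path, is_crawler):
        """Return the HTTP status this policy would assign to the request."""
        if path == "/robots.txt":
            self.seen_robots.add(client_ip)
            return 200
        if is_crawler and client_ip not in self.seen_robots:
            return 403  # crawling without ever reading robots.txt
        return 200


gate = RobotsGate()
print(gate.status_for("65.52.33.73", "/dir/filename.html", is_crawler=True))
print(gate.status_for("207.46.199.46", "/robots.txt", is_crawler=True))
print(gate.status_for("207.46.199.46", "/filename.html", is_crawler=True))
```

This only checks that robots.txt was read, not that it was heeded. Checking the latter would mean comparing each request against the parsed rules.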

incrediBILL




msg:4196421
 7:31 am on Sep 4, 2010 (gmt 0)

I hate cloaking


When there is no UA or some obvious garbage UA, it's not cloaking, it's the white elephant in the room, and it would never hit a server that whitelists bot access in the first place.

Bot Blocking 101: No shirt, No Shoes, No UA, No Service.

When it's a bot using a browser UA coming from a known bot location, now THAT's cloaking!
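
The distinction drawn above (no UA is not cloaking, while a browser UA from a known bot location is) can be sketched as a small classifier. Everything here is an assumption for illustration: the two /24 networks are simply built from addresses reported in this thread rather than any authoritative list, and the browser tokens are a crude stand-in for real UA parsing.

```python
import ipaddress

# Illustrative only: /24s derived from addresses mentioned in this thread.
KNOWN_BOT_NETWORKS = [
    ipaddress.ip_network("207.46.12.0/24"),
    ipaddress.ip_network("65.55.3.0/24"),
]

# Crude, assumed markers of a browser-style user agent.
BROWSER_TOKENS = ("MSIE", "Firefox")


def classify(ip, user_agent):
    """Label a hit as 'no-ua', 'cloaked', or 'unclassified'."""
    ua = (user_agent or "").strip()
    if ua in ("", "-"):
        return "no-ua"
    address = ipaddress.ip_address(ip)
    from_bot_network = any(address in net for net in KNOWN_BOT_NETWORKS)
    looks_like_browser = any(token in ua for token in BROWSER_TOKENS)
    if from_bot_network and looks_like_browser:
        return "cloaked"
    return "unclassified"


print(classify("65.52.33.73", "-"))
print(classify("207.46.12.27",
               "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)"))
```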

dstiles




msg:4196550
 4:56 pm on Sep 4, 2010 (gmt 0)

I'm seeing several hits during the past fortnight (approx. 130, earliest 14th August) from the range 65.55.25.0/24, with rDNS on the order of ns1.msft.net.(space here)msnhst.microsoft.com

All have the UA of:
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

Needless to say, all blocked.

Pfui




msg:4196601
 6:57 pm on Sep 4, 2010 (gmt 0)

@incrediBILL: I guess I don't think allowing, say, googlebot from .googlebot.com or msnbot from .search.msn.com is a blanket invitation to all non-search spawn from the companies' bare (no rDNS) IPs.

@dstiles: A while back, that UA came from .search.msn.com. [webmasterworld.com...] It's still a stumper (or a tpyo:)
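
Allowing a crawler only when it comes from the company's crawler hostnames is usually done with a forward-confirmed reverse DNS check. What follows is a hedged sketch of that technique, not anyone's production rule. The function name `verify_crawler` and the injectable `reverse`/`forward` parameters are made up for the example so it can be run offline, and the fake lookup tables are stand-ins for real DNS.

```python
import socket

ALLOWED_SUFFIXES = (".search.msn.com", ".googlebot.com")


def verify_crawler(ip, allowed_suffixes=ALLOWED_SUFFIXES, reverse=None, forward=None):
    """True only if the PTR name has an allowed suffix AND resolves back to ip."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse(ip)
    except OSError:  # bare IP: no rDNS at all
        return False
    if not host.lower().endswith(tuple(allowed_suffixes)):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False


# Offline stand-ins for DNS (hypothetical data for the demo only).
FAKE_PTR = {"207.46.12.27": "msnbot-207-46-12-27.search.msn.com"}
FAKE_A = {"msnbot-207-46-12-27.search.msn.com": ["207.46.12.27"]}


def fake_reverse(addr):
    if addr not in FAKE_PTR:
        raise OSError("no PTR record")
    return FAKE_PTR[addr]


def fake_forward(name):
    return FAKE_A.get(name, [])


print(verify_crawler("207.46.12.27", reverse=fake_reverse, forward=fake_forward))
print(verify_crawler("65.52.33.73", reverse=fake_reverse, forward=fake_forward))
```

Note that passing this check says nothing about the user agent. As later posts in this thread show, a verified .search.msn.com host can still arrive with a browser UA.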

dstiles




msg:4196621
 8:03 pm on Sep 4, 2010 (gmt 0)

I remember the tail-end but couldn't recall if it was msn or google. :)

Possibly a (b for) beta? Doubt it, though.

Interesting aside: both msnhst.microsoft.com and msft.net (and also microsoft.com itself) are listed in spam databases, albeit mostly for being "rfc-ignorant". :)

dstiles




msg:4196662
 10:41 pm on Sep 4, 2010 (gmt 0)

Just seen the ._ bot on 65.55.3.141 - a genuine bot IP.

Pfui




msg:4200724
 8:33 am on Sep 12, 2010 (gmt 0)

And now, from Microsoft in the UK:

94.245.105.68
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)

robots.txt? NO

Apparently right out of the gate, that IP's plopped into 3 honey pots within a week using the same UA. [projecthoneypot.org...]

dstiles




msg:4200865
 7:01 pm on Sep 12, 2010 (gmt 0)

Popped up here 10th Sept and hit 15 times in the next two days.

It's another of those msft.net/msnhst.microsoft.com rDNS entries. Beginning to think that anything with that rDNS should be blocked.

keyplyr




msg:4200917
 10:37 pm on Sep 12, 2010 (gmt 0)

94.245.105.68 just hitting index page, no robots.txt.

Pfui




msg:4200998
 3:58 am on Sep 13, 2010 (gmt 0)

dstiles & keyplyr -- Did your hits use the same UA? If not, which one(s), please?

dstiles: We do rDNS on the server and 94.245.105.68 hit bare. Upthread, Dijkgraaf reported 94.245.108.194 as suspect, and a smattering of WHOIS checks indicates that similar Microsoft UK IPs are no-rDNS. What am I missing? Where's the "msft.net/msnhst.microsoft.com" connection, plz?

keyplyr




msg:4201003
 4:17 am on Sep 13, 2010 (gmt 0)

94.245.105.68
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)"

Pfui




msg:4201247
 5:12 pm on Sep 13, 2010 (gmt 0)

Yep. Exact same cloak. Thanks, keyplyr.

dstiles




msg:4201434
 11:25 pm on Sep 13, 2010 (gmt 0)

Pfui - same UA.

I'm getting the rDNS through my Linux/Ubuntu Network Tools lookup. Robtex and blacklistalert.org do not give any rDNS. I've just run a /24 scan of rDNS through my usual MS DNS server (which is in USA) and got absolutely nothing.

I'm guessing that the rDNS Network Tools is returning is through some higher source but have no idea how/why/what.

I began to trace the IP range through robtex but ran out of time.

Pfui




msg:4201447
 12:25 am on Sep 14, 2010 (gmt 0)

dstiles, thanks for the info. You know, when it comes to Things MSN, I tend to err on the 403 side if the IP's bare and the UA isn't search-specific. Alas, as of 30 minutes ago, it looks like I need to batten down still more hatches...

msnbot-207-46-12-27.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)

robots.txt? NO
graphics? NO
.js? YES
.css? YES

What the --?

How silly of me to think I could 'trust' msnbot-[yada-yada].search.msn.com. No more.

dstiles




msg:4201795
 8:35 pm on Sep 14, 2010 (gmt 0)

Looking more closely, with brain turned on, I think what I'm seeing as rDNS is the SOA for the domains, not rDNS at all. Got the lookup results turned up too high. Sorry for the confusion. :(
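
For readers untangling the same confusion: a reverse lookup asks for a PTR record at a name under in-addr.arpa. When no PTR exists, a verbose tool may instead display the SOA record of the enclosing zone, whose nameserver and mailbox fields would be consistent with the "ns1.msft.net. msnhst.microsoft.com" pair reported earlier. A small sketch of the name that gets queried, using only the standard library (the helper name `ptr_query_name` is made up):

```python
import ipaddress


def ptr_query_name(ip):
    """Return the DNS name a resolver queries for a reverse (PTR) lookup of ip."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("94.245.105.68"))
```

If a lookup of that name returns no PTR answer, the IP is "bare" in the sense used in this thread, whatever else the tool prints alongside.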

Pfui




msg:4203105
 2:48 am on Sep 17, 2010 (gmt 0)

No prob:) But speaking of problems, it's déjà vu all over again... Just like the OP but a slightly different IP and file. Alas, still:

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.32.195
-
09/16 19:09:35
09/16 19:09:45
09/16 19:09:56
09/16 19:10:07
09/16 19:10:18
09/16 19:10:29
09/16 19:10:39
09/16 19:10:50
09/16 19:11:01
09/16 19:11:11
09/16 19:11:22

Eleven hits to yet another file? Still no robots.txt? MS is more suspect, and sloppy, than ever before.
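
The timestamps above are suspiciously regular. A quick sketch for measuring the gaps between them, using the eleven times exactly as listed. The helper names and the one-second tolerance are assumptions for illustration.

```python
from datetime import datetime

STAMPS = [
    "19:09:35", "19:09:45", "19:09:56", "19:10:07", "19:10:18", "19:10:29",
    "19:10:39", "19:10:50", "19:11:01", "19:11:11", "19:11:22",
]


def gaps_in_seconds(stamps):
    """Seconds between consecutive HH:MM:SS timestamps on the same day."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


def looks_scheduled(gaps, tolerance=1):
    """True when every gap is within `tolerance` seconds of every other gap."""
    return bool(gaps) and max(gaps) - min(gaps) <= tolerance


gaps = gaps_in_seconds(STAMPS)
print(gaps, looks_scheduled(gaps))
```

Every gap comes out at 10 or 11 seconds, which looks more like a timer or retry loop than like anything a person would do.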

Pfui




msg:4207303
 3:35 am on Sep 27, 2010 (gmt 0)

UA reported up-thread (message #4191289), where files requested appeared authentication-related. This time, just root:

msnbot-65-54-247-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

robots.txt? NO

Pfui




msg:4210677
 9:02 am on Oct 3, 2010 (gmt 0)

Addendum re preceding: See also Now seeing Bingbot [webmasterworld.com...]
-----

Today: Mashup time with MSN Hosts, IPs and UAs. (Why? WHY?) All hits from .search.msn.com except for one bare IP. All hits by "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" except for one 'new' one in the herd...

msnbot-65-52-49-139.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)
10/02 15:53:20 /robots.txt

msnbot-207-46-199-24.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 16:03:14 /

207.46.193.50
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 16:53:13 /filename.html

msnbot-207-46-12-202.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 17:37:53 /filename.html

msnbot-207-46-199-46.search.msn.com
10/02 18:05:40 /robots.txt

msnbot-207-46-12-70.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 18:06:18 /filename.html

msnbot-207-46-199-46.search.msn.com
10/02 18:10:59 /filename.html

msnbot-207-46-199-199.search.msn.com
10/02 19:14:28 /robots.txt

msnbot-207-46-204-231.search.msn.com
10/02 19:43:05 /robots.txt

Dijkgraaf




msg:4210899
 3:02 am on Oct 4, 2010 (gmt 0)

The stealth msnbot will also break at least one disallow rule in robots.txt.
I disallowed a JavaScript file. But it will fetch it anyway if it fetched a page that references it.

IP addresses
94.245.127.nnn
94.245.108.nnn
94.245.105.nnn
65.55.24.nnn
65.52.104.nnn
207.46.92.nnn
207.46.204.nnn
207.46.199.nnn
207.46.195.nnn
207.46.193.nnn
207.46.12.nnn

UAs seen
-
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618; InfoPath.2; HYVES)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; MS-OC 4.0.7341.0508; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MS-RTC LM 8; InfoPath.3)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8

Pfui




msg:4213589
 4:22 am on Oct 8, 2010 (gmt 0)

Not a new cloaked UA, but alas, still at it:

msnbot-207-46-12-212.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)

10/07 20:44:40 /dir/filename.html

robots.txt? NO

Pfui




msg:4218387
 6:30 pm on Oct 18, 2010 (gmt 0)

For more tales of simultaneous cloaking and crawling, see:

Now seeing Bingbot
[webmasterworld.com...]

caribguy




msg:4221479
 7:43 am on Oct 25, 2010 (gmt 0)

As discussed here [webmasterworld.com] and here [webmasterworld.com]

It just does not seem to be the garden-variety bad bot:

IPs: mostly 65.55.3.0/24 and sporadically 65.55.25.nnn or 207.68.164.nnn

msnbot-65-55-3-175.search.msn.com
no rDNS for 65.55.25.nnn
gig4-2.tuk2f-gsr-a.us.msn.net

Disregards robots.txt

Interesting attempt today:

URL 'http://www.example.com/access_logs'
AcceptLanguage {}
HTTP_ACCEPT '*/*'
CONNECTION_TYPE 'Keep-Alive'
HTTP_USER_AGENT 'msnbot/2.0b (+http://search.msn.com/msnbot.htm)._'
HTTP_FROM 'msnbotf(at)microsoft.com'
SERVER_PROTOCOL 'HTTP/1.1'

Are those two deliberate misspellings? And going for non-existent 'access_logs' - hmmm...

jdMorgan




msg:4221846
 9:25 pm on Oct 25, 2010 (gmt 0)

I saw a similar request from 65.55.25.141 for just "/logs" very early today.

Two weeks ago, I saw several previous requests for the (valid) home-page URL from 65.55.3.170

All blocked though, since the rDNS for 65.55.25.141 is not a "msnbot" hostname, and if they can't spell the msnbot UA correctly or get the other headers right, they're not allowed in.

Jim
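
A rough sketch of the kind of test described above, under stated assumptions. The expected UA string is the one quoted elsewhere in this thread, the hostname pattern is inferred from names like msnbot-65-52-49-139.search.msn.com, and the function names are invented. It does not do a forward DNS confirmation, so it is a string-level check only.

```python
import re

EXPECTED_MSNBOT_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

# Pattern inferred from hostnames reported in this thread.
MSNBOT_HOST_RE = re.compile(
    r"^msnbot-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.search\.msn\.com$"
)


def host_matches_ip(hostname, ip):
    """True if hostname is an msnbot-a-b-c-d name that encodes the same ip."""
    match = MSNBOT_HOST_RE.match(hostname or "")
    return bool(match) and ".".join(match.groups()) == ip


def allow(ip, hostname, user_agent):
    """Admit only an exact UA arriving from a matching msnbot hostname."""
    return host_matches_ip(hostname, ip) and user_agent == EXPECTED_MSNBOT_UA


print(allow("65.52.49.139", "msnbot-65-52-49-139.search.msn.com", EXPECTED_MSNBOT_UA))
print(allow("65.55.25.141", None, EXPECTED_MSNBOT_UA + "._"))
```

Because the UA comparison is exact, the variant with the trailing "._" fails even when it arrives from a matching hostname.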

caribguy




msg:4221859
 9:45 pm on Oct 25, 2010 (gmt 0)

Yep, saw a request for robots.txt and subsequent /logs on a handful of domains just minutes ago.

In the past 18 hours, it's been hitting from:

65.55.3.136
65.55.3.168
65.55.3.170
65.55.3.175
65.55.3.177
65.55.3.198
65.55.25.150
65.55.55.206
65.55.55.214
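
To see how a list like this clusters, the addresses can be bucketed by /24. A minimal sketch using the nine IPs above; the /24 grouping is just a convenient assumption, not a statement about how the ranges are actually allocated.

```python
import ipaddress
from collections import Counter

OBSERVED = [
    "65.55.3.136", "65.55.3.168", "65.55.3.170", "65.55.3.175", "65.55.3.177",
    "65.55.3.198", "65.55.25.150", "65.55.55.206", "65.55.55.214",
]


def count_by_slash24(addresses):
    """Count how many of the addresses fall in each /24 network."""
    counts = Counter()
    for addr in addresses:
        # strict=False lets a host address stand in for its network.
        network = ipaddress.ip_network(f"{addr}/24", strict=False)
        counts[str(network)] += 1
    return counts


for network, hits in count_by_slash24(OBSERVED).most_common():
    print(network, hits)
```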

Staffa




msg:4222466
 10:17 pm on Oct 26, 2010 (gmt 0)

Saw one today as well :

GET /access_logs/ - 80 - 65.55.25.138 HTTP/1.1 msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

The UA with the ._ at the end, they started that quite some time ago. Anyway they got a 404 for their effort ;)

Samizdata




msg:4222494
 11:29 pm on Oct 26, 2010 (gmt 0)

Consecutive requests for /logs/ and /access_logs/ this morning from msnbot.

Unreasonable and unwelcome behaviour.

...

Dijkgraaf




msg:4222498
 11:38 pm on Oct 26, 2010 (gmt 0)

Have you disallowed those in your robots.txt?

Pfui




msg:4222505
 12:00 am on Oct 27, 2010 (gmt 0)

On my Apache server, all /logs directories and /access_log files are 'above' web spaces. Thus there's no way anyone, or anything, should be able to access them from the web, nor is there any reason to include them in robots.txt. That MSN is even looking for /logs or /access_logs/ is outrageous.

Dijkgraaf




msg:4222517
 12:29 am on Oct 27, 2010 (gmt 0)

No it is not outrageous.
It could be that someone interested in finding logs open to bots has created a page of links pointing to the domains they are interested in.
If the bot finds those links to that URL and you haven't disallowed it in robots.txt the bot will try and fetch it.
So simply, add /logs and /access_log to robots.txt and you won't get any more hits on those.

In fact I know one botnet which has actually compromised the logs folder on several sites, as it is trying to serve up compromised files via query string injection methods.
So I wouldn't be surprised if it were them.
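
A quick way to check what those two rules would cover is Python's standard `urllib.robotparser`. The robots.txt content below is an assumed example of the suggestion above, not anyone's real file.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /logs
Disallow: /access_log
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for path in ("/logs/", "/access_logs/", "/index.html"):
    verdict = "allowed" if parser.can_fetch("msnbot", path) else "disallowed"
    print(path, verdict)
```

Rules are matched by prefix, so `Disallow: /access_log` also covers `/access_logs/`. This only helps with bots that actually honor robots.txt, of course.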

Samizdata




msg:4222529
 1:08 am on Oct 27, 2010 (gmt 0)

That MSN is even looking for /logs or /access_logs/ is outrageous

Four independent reports here in 24 hours suggest a deliberate phishing exercise.

Have you disallowed those in your robots.txt?

I am not in the habit of disallowing non-existent directories.

The list would be infinite.

...

Dijkgraaf




msg:4222537
 1:27 am on Oct 27, 2010 (gmt 0)

You don't need to disallow all non-existent URLs, only the ones you are getting hits on and don't want to get hits on anymore.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved