Search Engine Spider and User Agent Identification Forum

MSN's many cloaked bots. Again.
Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 11:44 pm on Aug 5, 2010 (gmt 0)

Previously... [webmasterworld.com]

Currently, straight out of my logs...

65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.33.73
-
08/05 15:45:09 /dir/filename.html
08/05 15:45:20 /dir/filename.html
08/05 15:45:31 /dir/filename.html
08/05 15:45:42 /dir/filename.html
08/05 15:45:53 /dir/filename.html
08/05 15:46:03 /dir/filename.html
08/05 15:46:14 /dir/filename.html
08/05 15:46:25 /dir/filename.html
08/05 15:46:35 /dir/filename.html
08/05 15:46:46 /dir/filename.html
08/05 15:46:57 /dir/filename.html

Same poor file. All hits 403'd because no UA; also because bare MSN IP and not a bona fide MSN bot.
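For anyone who wants to tally this kind of traffic from raw logs, here is a minimal Python sketch. It assumes Apache's "combined" log format, as in the line quoted above; the function name blank_ua_hits and the regular expression are illustrative and not taken from the poster's setup.

```python
import re
from collections import Counter

# Apache "combined" log format: host, ident, user, [time], "request",
# status, size, "referer", "user-agent".
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def blank_ua_hits(lines):
    """Count requests per host whose User-Agent field is empty or '-'."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("agent") in ("", "-"):
            counts[m.group("host")] += 1
    return counts

# The first line is the one quoted in the post; the second is a made-up
# line with a real-looking UA, included only for contrast.
sample = [
    '65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"',
    '192.0.2.10 - - [05/Aug/2010:15:45:10 -0700] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
]
print(blank_ua_hits(sample))
```

Feeding a whole access log through blank_ua_hits gives a per-host count, which makes repeat offenders like the one above easy to spot.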

 

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 7:32 pm on Sep 3, 2010 (gmt 0)

I hate cloaking. Alas, most majors and former majors cloak like crazy. Here's a bunch of bot-spotter minutiae, in no particular order...

-----
GOOGLE
-----
The empire-builder uses bare IPs and no UAs to hit (& hit & hit) favicons for sites added to its Webmaster Tools pages (Home list, site Dashboards, etc.). For example --

72.14.192.68 - - [03/Sep/2010:10:11:39 -0700] "GET /favicon.ico HTTP/1.1" 200 5430 "-" "-"

Additional IPs used the exact same way:

72.14.192.68
72.14.212.81
72.14.212.82
72.14.212.85
72.14.212.87
74.125.154.81
74.125.154.85

All bare IPs, no UAs, no robots.txt, no REF, no nothing. (sighs) And don't get me started on G's Code and Labs creations, a la:

74.125.154.85
AppEngine-Google; (+http://code.google.com/appengine; appid: linksalpha)
robots.txt? NO

2010 [webmasterworld.com...]

-----
IBM
-----
Like the Energizer Bunny, .watson.ibm.com just keeps going, and going, and going... For what purpose? Beats me.

2010 [webmasterworld.com...] [webmasterworld.com...]
2009 [webmasterworld.com...]

-----
MSN
-----
This thread and its predecessor [webmasterworld.com...] aren't the only reports of MSN's cloaking:

2009 [webmasterworld.com...]

And here's a little oddity from last January: Microsoft's portal domain came a'crawlin':

gig4-2.tuk2f-gsr-a.us.msn.net
Microsoft MSN SocialStreams Bot
robots.txt? Yes

Hmm. I guess "Microsoft MSN SocialStreams Bot" is "Microsoft Bing Mobile SocialStreams Bot" now?

2010 [webmasterworld.com...]

-----
DOW JONES
-----
Multiple IPs/server farms... Again for what purpose? Dunno.

2009-2010 [webmasterworld.com...]

-----
YAHOO
-----
Too many years, too many probs. Just last month:

research-mm10.corp.sp1.yahoo.com
Firefox 4.0
robots.txt? NO

Just today, no UA:

ycar3.mobile.sp1.yahoo.com - - [02/Sep/2010:07:47:55 -0700] "GET / HTTP/1.1" 403 702 "-" "-"

Oh, and HEAD requests, too. Newly atypical for Slurp on my sites. And redundant: This file didn't change in 30 secs:

llf531077.crawl.yahoo.net - - [21/Jul/2010:17:08:38 -0700] "HEAD /dirA/filenameA.html [snip]"
llf531077.crawl.yahoo.net - - [21/Jul/2010:17:09:08 -0700] "HEAD /dirA/filenameA.html [snip]"

'dev' subdomains from .corp.yahoo.com historically problematic, too:

18ndev96.yst.corp.yahoo.com
sedev1039.yst.corp.yahoo.com

Re both of those:
Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
robots.txt? NO

But wait! There's more!

2009-2010 [webmasterworld.com...]

-----
BAIDU
-----
Here's a thread from 2009 and they're still at it. 'Nuff said: [webmasterworld.com...]

-----
Okay, that's enough. But that's not all... All of the above are but a minuscule fraction of hits from cloaked start-ups, wanna-bes, Twitter swarmers, student projects, semi-clueless individuals and denizens of the cesspool that is AmazonAWS [webmasterworld.com...]

Solution? I err on the restrictive side when it comes to anyone wasting my and/or my clients' bandwidth, even more so when it comes to crawling for unknown reasons. No read/heed robots.txt? 403

Thoughts?

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4182830 posted 7:31 am on Sep 4, 2010 (gmt 0)

"I hate cloaking"


When there is no UA or some obvious garbage UA, it's not cloaking, it's the white elephant in the room, and it would never hit a server that whitelists bot access in the first place.

Bot Blocking 101: No shirt, No Shoes, No UA, No Service.

When it's a bot using a browser UA coming from a known bot location, now THAT's cloaking!
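The distinction drawn here (no UA is simply blocked; a browser UA from a known bot location is cloaking) can be sketched in a few lines of Python. The hostname suffixes and browser tokens below are illustrative assumptions, not a vetted list, and classify is a hypothetical helper name.

```python
# Hypothetical inputs: both tuples are examples only, not a complete list.
BOT_HOST_SUFFIXES = (".search.msn.com", ".googlebot.com", ".crawl.yahoo.net")
BROWSER_TOKENS = ("MSIE", "Firefox")

def classify(hostname, user_agent):
    """Label a request as 'no-ua', 'cloaked', or 'other'."""
    ua = (user_agent or "").strip()
    if ua in ("", "-"):
        # "No UA, No Service": block it, but it is not cloaking.
        return "no-ua"
    from_bot_host = hostname.lower().endswith(BOT_HOST_SUFFIXES)
    looks_like_browser = any(token in ua for token in BROWSER_TOKENS)
    if from_bot_host and looks_like_browser:
        # Browser UA coming from a known crawler host: cloaking.
        return "cloaked"
    return "other"

print(classify("msnbot-207-46-12-27.search.msn.com",
               "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)"))
print(classify("65.52.33.73", "-"))
```

Anything labelled "other" still needs the usual checks; this only separates the two cases being argued about.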

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4182830 posted 4:56 pm on Sep 4, 2010 (gmt 0)

I'm seeing several hits during the past fortnight (approx 130, earliest 14th August) from the range 65.55.25.0/24, with rDNS on the order of ns1.msft.net.(space here)msnhst.microsoft.com

All have the UA of:
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

Needless to say, all blocked.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 6:57 pm on Sep 4, 2010 (gmt 0)

@incrediBILL: I guess I don't think allowing, say, googlebot from .googlebot.com or msnbot from .search.msn.com is a blanket invitation to all non-search spawn from the companies' bare (no rDNS) IPs.
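One common way to enforce "msnbot only from .search.msn.com" is a forward-confirmed reverse DNS check: look up the PTR name for the IP, require an allowed suffix, then resolve that name and confirm it points back to the same IP. The Python sketch below assumes that approach; verify_crawler and the stand-in lookup tables are hypothetical, and the demonstration uses fake data so it runs without touching real DNS.

```python
import socket

def verify_crawler(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    # Default to real DNS lookups; callers may inject their own resolvers.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        host = reverse(ip)
    except OSError:
        return False  # bare IP: no rDNS at all
    if not host.lower().endswith(tuple(allowed_suffixes)):
        return False  # rDNS exists but is not a crawler hostname
    try:
        return ip in forward(host)
    except OSError:
        return False

# Offline demonstration with stand-in lookup tables (hypothetical data).
FAKE_PTR = {"207.46.12.27": "msnbot-207-46-12-27.search.msn.com"}
FAKE_A = {"msnbot-207-46-12-27.search.msn.com": ["207.46.12.27"]}

def fake_reverse(addr):
    if addr not in FAKE_PTR:
        raise OSError("no PTR record")
    return FAKE_PTR[addr]

def fake_forward(name):
    return FAKE_A.get(name, [])

print(verify_crawler("207.46.12.27", [".search.msn.com"], fake_reverse, fake_forward))
print(verify_crawler("65.52.33.73", [".search.msn.com"], fake_reverse, fake_forward))
```

Passing this check only says the hostname is genuine. It says nothing about whether the UA sent from that host is honest, so it is best paired with a UA check.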

@dstiles: A while back, that UA came from .search.msn.com. [webmasterworld.com...] It's still a stumper (or a tpyo:)

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4182830 posted 8:03 pm on Sep 4, 2010 (gmt 0)

I remember the tail-end but couldn't recall if it was msn or google. :)

Possibly a (b for) beta? Doubt it, though.

Interesting aside: both msnhst.microsoft.com and msft.net (and also microsoft.com itself) are listed in spam databases, albeit mostly for being "rfc-ignorant". :)

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4182830 posted 10:41 pm on Sep 4, 2010 (gmt 0)

Just seen the ._ bot on 65.55.3.141 - a genuine bot IP.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 8:33 am on Sep 12, 2010 (gmt 0)

And now, from Microsoft in the UK:

94.245.105.68
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)

robots.txt? NO

Apparently right out of the gate, that IP's plopped into 3 honey pots within a week using the same UA. [projecthoneypot.org...]

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4182830 posted 7:01 pm on Sep 12, 2010 (gmt 0)

Popped up here 10th Sept and hit 15 times in the next two days.

It's another of those msft.net/msnhst.microsoft.com rDNS entries. Beginning to think that anything with that rDNS should be blocked.

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4182830 posted 10:37 pm on Sep 12, 2010 (gmt 0)

94.245.105.68 just hitting index page, no robots.txt.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 3:58 am on Sep 13, 2010 (gmt 0)

dstiles & keyplyr -- Did your hits use the same UA? If not, which one(s), please?

dstiles: We do rDNS on the server and 94.245.105.68 hit bare. Upthread, Dijkgraaf reported 94.245.108.194 as suspect, and a smattering of WHOIS checks indicates that similar Microsoft UK IPs are no-rDNS. What am I missing? Where's the "msft.net/msnhst.microsoft.com" connection, plz?

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4182830 posted 4:17 am on Sep 13, 2010 (gmt 0)

94.245.105.68
"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)"

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 5:12 pm on Sep 13, 2010 (gmt 0)

Yep. Exact same cloak. Thanks, keyplyr.

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4182830 posted 11:25 pm on Sep 13, 2010 (gmt 0)

Pfui - same UA.

I'm getting the rDNS through my Linux/Ubuntu Network Tools lookup. Robtex and blacklistalert.org do not give any rDNS. I've just run a /24 scan of rDNS through my usual MS DNS server (which is in USA) and got absolutely nothing.

I'm guessing that the rDNS that Network Tools is returning comes through some higher source, but I have no idea how/why/what.

I began to trace the IP range through robtex but ran out of time.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 12:25 am on Sep 14, 2010 (gmt 0)

dstiles, thanks for the info. You know, when it comes to Things MSN, I tend to err on the 403 side if the IP's bare and the UA isn't search-specific. Alas, as of 30 minutes ago, it looks like I need to batten down still more hatches...

msnbot-207-46-12-27.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)

robots.txt? NO
graphics? NO
.js? YES
.css? YES

What the --?

How silly of me to think I could 'trust' msnbot-[yada-yada].search.msn.com. No more.

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4182830 posted 8:35 pm on Sep 14, 2010 (gmt 0)

Looking more closely, with brain turned on, I think what I'm seeing as rDNS is the SOA for the domains, not rDNS at all. Got the lookup results turned up too high. Sorry for the confusion. :(

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 2:48 am on Sep 17, 2010 (gmt 0)

No prob:) But speaking of problems, it's déjà vu all over again... Just like the OP but a slightly different IP and file. Alas, still:

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.32.195
-
09/16 19:09:35
09/16 19:09:45
09/16 19:09:56
09/16 19:10:07
09/16 19:10:18
09/16 19:10:29
09/16 19:10:39
09/16 19:10:50
09/16 19:11:01
09/16 19:11:11
09/16 19:11:22

Eleven hits to yet another file? Still no robots.txt? MS is more suspect, and sloppy, than ever before.
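The timestamps above are machine-regular, and a short Python sketch makes that visible by computing the gap between consecutive hits. The year is an assumption (the log lines omit it; 2010 is taken from the post date), and gaps_seconds is a hypothetical helper name.

```python
from datetime import datetime

def gaps_seconds(stamps, year=2010):
    """Seconds between consecutive 'MM/DD HH:MM:SS' stamps (year assumed)."""
    times = [datetime.strptime(f"{year} {s}", "%Y %m/%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# The eleven hits from 65.52.32.195 listed in the post.
hits = [
    "09/16 19:09:35", "09/16 19:09:45", "09/16 19:09:56", "09/16 19:10:07",
    "09/16 19:10:18", "09/16 19:10:29", "09/16 19:10:39", "09/16 19:10:50",
    "09/16 19:11:01", "09/16 19:11:11", "09/16 19:11:22",
]
gaps = gaps_seconds(hits)
print(gaps)
# True when every gap is within one second of every other gap.
print(max(gaps) - min(gaps) <= 1)
```

Every gap comes out at 10 or 11 seconds, which looks like a fixed retry timer rather than a person or a normal crawl schedule.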

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 3:35 am on Sep 27, 2010 (gmt 0)

UA reported up-thread (message #4191289), where files requested appeared authentication-related. This time, just root:

msnbot-65-54-247-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 9:02 am on Oct 3, 2010 (gmt 0)

Addendum re preceding: See also Now seeing Bingbot [webmasterworld.com...]
-----

Today: Mashup time with MSN Hosts, IPs and UAs. (Why? WHY?) All hits from .search.msn.com except for one bare IP. All hits by "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" except for one 'new' one in the herd...

msnbot-65-52-49-139.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)
10/02 15:53:20 /robots.txt

msnbot-207-46-199-24.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 16:03:14 /

207.46.193.50
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 16:53:13 /filename.html

msnbot-207-46-12-202.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 17:37:53 /filename.html

msnbot-207-46-199-46.search.msn.com
10/02 18:05:40 /robots.txt

msnbot-207-46-12-70.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
10/02 18:06:18 /filename.html

msnbot-207-46-199-46.search.msn.com
10/02 18:10:59 /filename.html

msnbot-207-46-199-199.search.msn.com
10/02 19:14:28 /robots.txt

msnbot-207-46-204-231.search.msn.com
10/02 19:43:05 /robots.txt

Dijkgraaf

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 3:02 am on Oct 4, 2010 (gmt 0)

The stealth msnbot will also break at least one disallow rule in robots.txt.
I disallowed a JavaScript file. But it will fetch it anyway if it fetched a page that references it.

IP addresses
94.245.127.nnn
94.245.108.nnn
94.245.105.nnn
65.55.24.nnn
65.52.104.nnn
207.46.92.nnn
207.46.204.nnn
207.46.199.nnn
207.46.195.nnn
207.46.193.nnn
207.46.12.nnn

UAs seen
-
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.30729)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.5.30729; .NET CLR 3.0.30618; InfoPath.2; HYVES)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; MS-OC 4.0.7341.0508; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MS-RTC LM 8; InfoPath.3)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Zune 4.0; MS-RTC LM 8; .NET4.0C; .NET4.0E; InfoPath.3)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
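
To check whether a given address falls in the ranges listed in this post, here is a minimal Python sketch using the standard ipaddress module. Treating each "nnn" entry as a whole /24 is an assumption about what the list means, and in_reported_ranges is a hypothetical helper name.

```python
import ipaddress

# First three octets as listed in the post; "nnn" is read as "any host",
# i.e. the full /24 (an assumption, not something the post states).
PREFIXES = [
    "94.245.127", "94.245.108", "94.245.105", "65.55.24", "65.52.104",
    "207.46.92", "207.46.204", "207.46.199", "207.46.195", "207.46.193",
    "207.46.12",
]
NETWORKS = [ipaddress.ip_network(prefix + ".0/24") for prefix in PREFIXES]

def in_reported_ranges(ip):
    """Return True if ip belongs to any of the reported /24 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)

print(in_reported_ranges("94.245.105.68"))
print(in_reported_ranges("65.52.33.73"))
```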

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 4:22 am on Oct 8, 2010 (gmt 0)

Not a new cloaked UA, but alas, still at it:

msnbot-207-46-12-212.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729)

10/07 20:44:40 /dir/filename.html

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 6:30 pm on Oct 18, 2010 (gmt 0)

For more tales of simultaneous cloaking and crawling, see:

Now seeing Bingbot
[webmasterworld.com...]

caribguy

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 7:43 am on Oct 25, 2010 (gmt 0)

As discussed here [webmasterworld.com] and here [webmasterworld.com]

It just does not seem to be the garden-variety bad bot:

IPs: mostly 65.55.3.0/24 and sporadically 65.55.25.nnn or 207.68.164.nnn

msnbot-65-55-3-175.search.msn.com
no rDNS for 65.55.25.nnn
gig4-2.tuk2f-gsr-a.us.msn.net

Disregards robots.txt

Interesting attempt today:

URL 'http://www.example.com/access_logs'
AcceptLanguage {}
HTTP_ACCEPT '*/*'
CONNECTION_TYPE 'Keep-Alive'
HTTP_USER_AGENT 'msnbot/2.0b (+http://search.msn.com/msnbot.htm)._'
HTTP_FROM 'msnbotf(at)microsoft.com'
SERVER_PROTOCOL 'HTTP/1.1'

Are those two deliberate misspellings? And going for nonexistent 'access_logs' - hmmm...

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4182830 posted 9:25 pm on Oct 25, 2010 (gmt 0)

I saw a similar request from 65.55.25.141 for just "/logs" very early today.

Two weeks ago, I saw several previous requests for the (valid) home-page URL from 65.55.3.170

All blocked though, since the rDNS for 65.55.25.141 is not a "msnbot" hostname, and if they can't spell the msnbot UA correctly or get the other headers right, they're not allowed in.

Jim
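
The "spell the UA correctly or stay out" rule above can be sketched as an exact string comparison in Python. The expected string is the msnbot UA as quoted elsewhere in this thread; it has not been checked against Microsoft's own documentation, so treat it as an assumption. The helper names are hypothetical.

```python
# Assumed-correct UA, copied from the thread rather than from an official source.
EXPECTED_UA = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

def claims_msnbot(user_agent):
    """True if the UA presents itself as msnbot at all."""
    return user_agent.lower().startswith("msnbot/")

def malformed_msnbot(user_agent):
    """True if the UA claims to be msnbot but is not an exact match."""
    return claims_msnbot(user_agent) and user_agent != EXPECTED_UA

print(malformed_msnbot("msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"))
print(malformed_msnbot(EXPECTED_UA))
```

An exact match is deliberately strict: the trailing "._" variant fails it even though it differs by only two characters.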

caribguy

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 9:45 pm on Oct 25, 2010 (gmt 0)

Yep, saw a request for robots.txt and subsequent /logs on a handful of domains just minutes ago.

In the past 18 hours, it's been hitting from:

65.55.3.136
65.55.3.168
65.55.3.170
65.55.3.175
65.55.3.177
65.55.3.198
65.55.25.150
65.55.55.206
65.55.55.214

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4182830 posted 10:17 pm on Oct 26, 2010 (gmt 0)

Saw one today as well :

GET /access_logs/ - 80 - 65.55.25.138 HTTP/1.1 msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

The UA with the ._ at the end, they started that quite some time ago. Anyway they got a 404 for their effort ;)

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 11:29 pm on Oct 26, 2010 (gmt 0)

Consecutive requests for /logs/ and /access_logs/ this morning from msnbot.

Unreasonable and unwelcome behaviour.

...

Dijkgraaf

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 11:38 pm on Oct 26, 2010 (gmt 0)

Have you disallowed those in your robots.txt?

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 12:00 am on Oct 27, 2010 (gmt 0)

On my Apache server, all /logs directories and /access_log files are 'above' web spaces. Thus there's no way anyone, or anything, should be able to access them from the web, nor is there any reason to include them in robots.txt. That MSN is even looking for /logs or /access_logs/ is outrageous.

Dijkgraaf

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 12:29 am on Oct 27, 2010 (gmt 0)

No it is not outrageous.
It could be that someone interested in finding logs open to bots has created a page of links pointing to the domains they are interested in.
If the bot finds those links to that URL and you haven't disallowed it in robots.txt, the bot will try to fetch it.
So simply add /logs and /access_log to robots.txt and you won't get any more hits on those.

In fact, I know of one botnet that has actually compromised the logs folder on several sites, as it is trying to serve up compromised files via query-string injection methods.
So I wouldn't be surprised if it were them.
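
The robots.txt suggestion above can be tried out with Python's standard urllib.robotparser before deploying anything. The rules below are a minimal sketch, assuming a catch-all "User-agent: *" section; the allowed helper is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser

# The two paths suggested in the post, applied to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs
Disallow: /access_log
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="msnbot"):
    """Ask whether a rule-abiding crawler may fetch the given path."""
    return parser.can_fetch(agent, path)

for path in ("/logs/", "/access_logs/", "/index.html"):
    print(path, allowed(path))
```

Because Disallow rules are prefix matches, "/access_log" also covers "/access_logs/". This only helps with crawlers that actually honor robots.txt.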

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 1:08 am on Oct 27, 2010 (gmt 0)

"That MSN is even looking for /logs or /access_logs/ is outrageous"

Four independent reports here in 24 hours suggest a deliberate phishing exercise.

"Have you disallowed those in your robots.txt?"

I am not in the habit of disallowing non-existent directories.

The list would be infinite.

...

Dijkgraaf

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4182830 posted 1:27 am on Oct 27, 2010 (gmt 0)

You don't need to disallow all non-existent URLs, only the ones you are getting hits on and don't want to get hits on anymore.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved