Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 152 message thread spans 6 pages.
MSN's many cloaked bots. Again.
Pfui




msg:4182832
 11:44 pm on Aug 5, 2010 (gmt 0)

Previously... [webmasterworld.com]

Currently, straight out of my logs...

65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.33.73
-
08/05 15:45:09 /dir/filename.html
08/05 15:45:20 /dir/filename.html
08/05 15:45:31 /dir/filename.html
08/05 15:45:42 /dir/filename.html
08/05 15:45:53 /dir/filename.html
08/05 15:46:03 /dir/filename.html
08/05 15:46:14 /dir/filename.html
08/05 15:46:25 /dir/filename.html
08/05 15:46:35 /dir/filename.html
08/05 15:46:46 /dir/filename.html
08/05 15:46:57 /dir/filename.html

Same poor file. All hits 403'd because no UA; also because bare MSN IP and not a bona fide MSN bot.
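
For anyone who wants to pull hits like these out of their own logs, here is a minimal Python sketch. It assumes the Apache "combined" log format seen in the line quoted above; the names `parse_line` and `blank_ua_hits` are made up for illustration, and the second sample line is reconstructed from the timestamp list rather than copied from the raw log.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache "combined" format: ip ident user [time] "request" status size "referer" "ua"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one combined-format line, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    hit = m.groupdict()
    hit["time"] = datetime.strptime(hit["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return hit

def blank_ua_hits(lines):
    """Group hits that sent no User-Agent ("-" or empty) by client IP."""
    by_ip = defaultdict(list)
    for line in lines:
        hit = parse_line(line)
        if hit and hit["ua"] in ("", "-"):
            by_ip[hit["ip"]].append(hit)
    return dict(by_ip)

sample = [
    '65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"',
    # Reconstructed from the timestamp list, not a verbatim log line:
    '65.52.33.73 - - [05/Aug/2010:15:45:20 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"',
]
hits = blank_ua_hits(sample)
gap = hits["65.52.33.73"][1]["time"] - hits["65.52.33.73"][0]["time"]
print(len(hits["65.52.33.73"]), gap.total_seconds())  # 2 11.0
```

Feeding it a whole access log gives a per-IP list of UA-less requests, from which the repeat interval (roughly 11 seconds here) can be read off.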

 

Megaclinium




msg:4183134
 2:48 pm on Aug 6, 2010 (gmt 0)

I've got msn from slightly different 65.52 range
has msn UA

and all it does is occasionally hit robots.txt
rarely takes anything else even tho not banned

AlexK




msg:4183145
 3:23 pm on Aug 6, 2010 (gmt 0)

(reported by myself [webmasterworld.com] on Aug 2 in Bing):
What appears to be a human scraper on an MSN-Bot IP; took more than 14 pages in 7 seconds, and thus tripped the site fast scraper block [webmasterworld.com]. This is from the subsequent log:
    IP: 65.52.108.165
    Host lookup: msnbot-65-52-108-165.search.msn.com
    Timing: 2010-08-02 02:09:33 +0100 (2 pages)
    UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)
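
The linked fast scraper block is not reproduced in this thread, so the following is only a sketch of the general idea in Python, not that script: a per-IP sliding window that trips once an address takes more than 14 pages inside 7 seconds. The class name `FastScraperBlock` and its parameters are assumptions chosen to match the figures quoted above.

```python
from collections import defaultdict, deque

class FastScraperBlock:
    """Flag an IP once it requests more than max_pages within window seconds."""

    def __init__(self, max_pages=14, window=7.0):
        self.max_pages = max_pages
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent page hits

    def hit(self, ip, now):
        """Record a page request at time `now` (seconds); return True if tripped."""
        q = self._hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_pages

block = FastScraperBlock()
# 15 requests spaced 0.4 s apart (5.6 s in total): the 15th one trips the block.
results = [block.hit("65.52.108.165", i * 0.4) for i in range(15)]
print(results.index(True))  # 14
```

A real deployment would feed `now` from the request time and decide separately what "tripped" means (403, delay, log only); none of that is specified in the report.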

Pfui




msg:4185391
 2:50 pm on Aug 11, 2010 (gmt 0)

65.52 appears to be a low-key but consistent cloaked source:

65.52.6.206
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URI: Single dynamic file posted a mere 12 hours prior. No ref. File suffix and dir verboten to all bots generally, and to majors' bots and IPs specifically. Did not follow real-people link in 403.

Mid-March, 2010, from my notes.

65.52.26.149
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URI: Single html file that would've been okay if UA was msnbot. Link was in a tweet; hit came in a post-Twitter swarm. Did not follow real-people link in 403.

Pfui




msg:4185394
 3:01 pm on Aug 11, 2010 (gmt 0)

Just noted the following 65.52 hit x2 in a post-Twitter swarm yesterday:

65.52.2.10
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URIs: Again both of these would've been okay for msnbot, but not this UA:

08/10 17:01:23 /
08/10 17:24:57 /dir/filename.html
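
The "robots.txt? NO" check used in these reports can be automated once hits are reduced to (IP, path) pairs. A small Python sketch, with `fetched_robots` as a made-up helper name and 192.0.2.7 as a purely hypothetical well-behaved crawler added for contrast:

```python
from collections import defaultdict

def fetched_robots(hits):
    """Map each IP to True/False: did it request /robots.txt at any point?"""
    seen = defaultdict(bool)
    for ip, path in hits:
        # Ignore any query string when comparing the path.
        seen[ip] = seen[ip] or path.split("?", 1)[0] == "/robots.txt"
    return dict(seen)

hits = [
    ("65.52.2.10", "/"),
    ("65.52.2.10", "/dir/filename.html"),
    ("192.0.2.7", "/robots.txt"),   # hypothetical crawler, not from the logs
    ("192.0.2.7", "/"),
]
print(fetched_robots(hits))  # {'65.52.2.10': False, '192.0.2.7': True}
```

Note this only says whether robots.txt was requested within the log being examined; a crawler may have cached it from an earlier visit.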

keyplyr




msg:4185482
 5:38 pm on Aug 11, 2010 (gmt 0)

For several years 65.52 has scraped image files using no UA. I've blocked it without any negative effect.

However, I started blocking the Yahoo equivalent (image scraper, no UA) and my image listings dropped from Yahoo's image search.

I just got tired of micro-managing these endless bots so I take the loss.

dstiles




msg:4185500
 6:49 pm on Aug 11, 2010 (gmt 0)

65.52.2.10 and 65.52.26.149 would have been refused here as bots, as they don't seem to have proper rDNS. On the other hand, depending on headers, the UAs may or may not have been accepted on those IPs. I'll keep an eye open for them.
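
One common way to test for "proper rDNS" is a forward-confirmed reverse lookup: resolve the IP to a hostname, check the hostname's suffix, then resolve the hostname back and confirm it returns the same IP. The Python sketch below is an assumption about how such a check might look, not dstiles's actual setup. `verified_bot_host` is a made-up name, the `.search.msn.com` suffix is taken from the hostnames quoted in this thread, and the demo uses stub lookup tables instead of live DNS so it runs offline.

```python
import socket

def _reverse(ip):
    """PTR lookup: IP -> hostname (raises OSError if there is no record)."""
    return socket.gethostbyaddr(ip)[0]

def _forward(host):
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]

def verified_bot_host(ip, suffixes=(".search.msn.com",),
                      reverse=_reverse, forward=_forward):
    """Return the PTR hostname if ip passes forward-confirmed rDNS, else None."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return None                      # no PTR record at all
    if not host.endswith(tuple(suffixes)):
        return None                      # PTR exists but is not a bot hostname
    try:
        addrs = forward(host)
    except OSError:
        return None
    return host if ip in addrs else None  # hostname must map back to the IP

# Stub data for an offline demo. The first pair mirrors AlexK's report;
# 203.0.113.9 is a hypothetical spoofer whose PTR merely claims an MSN name.
PTR = {
    "65.52.108.165": "msnbot-65-52-108-165.search.msn.com",
    "203.0.113.9": "msnbot-65-52-108-165.search.msn.com",
}
A = {"msnbot-65-52-108-165.search.msn.com": ["65.52.108.165"]}

def stub_reverse(ip):
    if ip not in PTR:
        raise OSError("no PTR record")
    return PTR[ip]

def stub_forward(host):
    if host not in A:
        raise OSError("no A record")
    return A[host]

for ip in ("65.52.108.165", "65.52.2.10", "203.0.113.9"):
    print(ip, verified_bot_host(ip, reverse=stub_reverse, forward=stub_forward))
```

Called without the `reverse` and `forward` arguments, the function uses the real resolver. Passing this check only says the IP belongs to the named host; as the rest of the thread shows, it says nothing about which UA that host sends.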

Pfui




msg:4185762
 3:56 am on Aug 12, 2010 (gmt 0)

From another site -- no UA, no ref, no GET, no nothing. FWIW:

65.52.192.56 - - [07/Aug/2010:12:04:10 -0700] "HEAD / HTTP/1.1" 403 0 "-" "-"
65.52.192.70 - - [08/Aug/2010:16:35:53 -0700] "HEAD / HTTP/1.1" 403 0 "-" "-"

dstiles




msg:4186176
 9:36 pm on Aug 12, 2010 (gmt 0)

I suppose it might be one of their public DSL/Proxy blocks?

AlexK




msg:4186218
 11:48 pm on Aug 12, 2010 (gmt 0)

dstiles:
I suppose it might be one of their public DSL/Proxy blocks?

The one I caught wasn't. 65.52.108.165 is one of their bot-IPs (also check the rDNS).

dstiles




msg:4187118
 7:40 pm on Aug 14, 2010 (gmt 0)

Yes, most of 65.52.108/24 is.

Your report is an odd one, not for a browser UA coming from a bot IP but because of the Crazy Browser UA, which I associate with very aggressive browsing. I agree with you: why would google do that? Are they testing this type of browser? Or just being stupid? I don't think it's a human, although I could be wrong.

AlexK




msg:4187165
 11:26 pm on Aug 14, 2010 (gmt 0)

dstiles:
why would google do that?

No no no, this is Microsoft!

Your report is an odd one, not for a browser UA coming from a bot IP ...

That's the worst feature of my report IMO. If a Webmaster cannot rely on a `stable' bot-IP to only be utilised by bots, then the trust-factor falls through the floor. The UA employed ramps up that concern, as it means that any non-tech employee of Microsoft (I'm making an optimistic assumption here) is allowed to use that IP.

dstiles




msg:4187408
 9:33 pm on Aug 15, 2010 (gmt 0)

Of course it's MS! Oops! :)

As I said, I don't think it's a human. If it is human then, as you say, the trust factor is on the skids.

AlexK




msg:4187453
 12:09 am on Aug 16, 2010 (gmt 0)

dstiles:
I don't think it's a human

Then the trust factor is still on the skids - what on earth are they doing using `Crazy Browser'? Plus, why are they pulling more than 14 pages in 2 secs?

dstiles




msg:4187964
 9:29 pm on Aug 16, 2010 (gmt 0)

That's probably what Crazy Browser does, although I haven't confirmed it recently. Lots of browsers have plug-ins etc to hike up scraping speeds and "high speed" bandwidth is helping this along.

Might be worth asking MS what they are doing. :)

AlexK




msg:4188089
 3:46 am on Aug 17, 2010 (gmt 0)

dstiles:
"high speed" bandwidth is helping this along

The record on my site so far is DHL at >300 hits / sec (no kidding).

dstiles




msg:4188424
 11:21 pm on Aug 17, 2010 (gmt 0)

Don't think I've had anything quite that fast. Nearest was *&^%$ securitymetrics who bombed one of my sites for 45 minutes. They are totally IP banned now!

Re: google mistake above - just found a dozen or so hits from one of their bot IPs asking for favicon. With no UA at all AND looking in the sites' roots, where it ain't! :)

Dijkgraaf




msg:4188467
 2:28 am on Aug 18, 2010 (gmt 0)

Some more IPs of MSN cloaked bots.
207.46.12.NNN
207.46.199.NNN
207.46.204.NNN
94.245.108.194
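
To test an incoming address against a list like this, Python's standard `ipaddress` module is enough. The sketch below reads each `NNN` entry as the whole /24, which is an assumption on my part; the list does not say how much of each block is affected. `in_reported_range` is a made-up helper name.

```python
import ipaddress

# The entries listed above, with "NNN" read as a full /24 (assumption)
# and the single address kept as a /32.
REPORTED = [ipaddress.ip_network(n) for n in (
    "207.46.12.0/24",
    "207.46.199.0/24",
    "207.46.204.0/24",
    "94.245.108.194/32",
)]

def in_reported_range(ip):
    """True if ip falls inside any of the reported networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in REPORTED)

for ip in ("207.46.204.102", "94.245.108.194", "94.245.108.195"):
    print(ip, in_reported_range(ip))
```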

MxAngel




msg:4188841
 8:59 pm on Aug 18, 2010 (gmt 0)

Agent: Mediapartners-Google

207.46.204.102
207.46.204.96
207.46.204.51
207.46.204.103

dstiles




msg:4188862
 9:28 pm on Aug 18, 2010 (gmt 0)

Basically, if it ain't a known bot UA it's dumped, even if it has a bot IP.

Apropos which, I read somewhere that msnbot is being changed to bingbot soon, wrapped up in a basic mozilla UA. So, no more quick and dirty ^msnbot tests. :(
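
To illustrate why an anchored `^msnbot` test stops working once the bot name sits inside a Mozilla-style string, here is a short Python sketch. The bingbot UA string is not quoted anywhere in this thread; it is included as an assumed example of the new format, and `allow_on_bot_ip` is a made-up name for the "known bot UA or it's dumped" rule described above.

```python
import re

ANCHORED = re.compile(r"^msnbot", re.IGNORECASE)               # the quick and dirty test
BOT_TOKEN = re.compile(r"\b(msnbot|bingbot)\b", re.IGNORECASE)  # token anywhere in the UA

def allow_on_bot_ip(ua):
    """On a known bot IP, accept only a UA that carries a known bot token."""
    return bool(ua) and ua != "-" and BOT_TOKEN.search(ua) is not None

old_ua = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
# Assumed example of the Mozilla-wrapped format; not taken from the thread.
new_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
cloaked = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

for ua in (old_ua, new_ua, cloaked, "-"):
    print(bool(ANCHORED.match(ua)), allow_on_bot_ip(ua), ua)
```

The anchored pattern matches only the first string, while the token search accepts both bot UAs and still rejects the MSIE string and the empty UA. A UA match is trivially spoofable, so it only makes sense combined with an IP or rDNS check.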

blend27




msg:4188875
 10:21 pm on Aug 18, 2010 (gmt 0)

207.46.12. range from MSN pulls CSS and JS files with a non-bot UA, skips the images.

Pfui




msg:4191289
 5:28 am on Aug 24, 2010 (gmt 0)

msnbot-65-54-247-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

18:58:11 /BingSiteAuth.xml
18:58:11 /LiveSearchSiteAuth.xml

robots.txt? NO

A long time ago, the latter file was looked-for by an MSN Webmaster Tools'esque UA for confirmation purposes:

msnbot-webmaster/1.0 (+http://search.msn.com/msnbot.htm)

Apparently, "BingSiteAuth.xml" is the New Thing:

"Cool Tips And Hot Tricks For The New Bing Webmaster Tools, Part 1" [bing.com...]

(Aside: That's news to me. Then again, I rarely check MSN/Bing's tools because they never, EVER honored multiple form-detailed requests to remove files denied in robots.txt but accessible in their results.)

Anyway. The new Bingahoo thing doesn't excuse .search.msn.com looking for Auth.xml files using:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

Pfui




msg:4194845
 2:45 am on Sep 1, 2010 (gmt 0)

A mere 145 characters+spaces in this cloaked UA seen this evening. Makes "msnbot/2.0b" look downright pithy.

msnbot-207-46-204-219.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)

robots.txt? NO
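
The hostnames in these reports embed the IP in a fixed `msnbot-A-B-C-D.search.msn.com` pattern, so the address can be recovered from the name for log cross-checking. A small Python sketch, with `ip_from_msnbot_host` as a made-up helper name; the pattern is inferred only from the hostnames quoted in this thread.

```python
import re

HOST_RE = re.compile(
    r"^msnbot-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.search\.msn\.com$",
    re.IGNORECASE,
)

def ip_from_msnbot_host(host):
    """Return the dotted IP embedded in an msnbot hostname, or None."""
    m = HOST_RE.match(host.strip().rstrip("."))
    if m is None:
        return None
    octets = [int(o) for o in m.groups()]
    if any(o > 255 for o in octets):
        return None
    return ".".join(str(o) for o in octets)

print(ip_from_msnbot_host("msnbot-207-46-204-219.search.msn.com"))  # 207.46.204.219
print(ip_from_msnbot_host("msnbot-65-54-247-157.search.msn.com"))   # 65.54.247.157
```

This is a string convenience only. A PTR record is set by whoever controls the reverse zone for an address, so a matching name proves nothing until the forward lookup is confirmed as well.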

AlexK




msg:4195020
 2:21 pm on Sep 1, 2010 (gmt 0)

I cannot believe the lack of response to this thread.

Microsoft bot IP is compromised and being used by non-bot UAs, possibly even has been hacked, and NO response. Amazing.

keyplyr




msg:4195215
 7:28 pm on Sep 1, 2010 (gmt 0)


Microsoft bot IP is compromised and being used by non-bot UAs, possibly even has been hacked

Highly unlikely...

These types of hits have been coming from MS ranges for years.

dstiles




msg:4195291
 9:38 pm on Sep 1, 2010 (gmt 0)

I agree with keyplyr. Highly unlikely that an MS bot IP is compromised, and from observation over several years MS DO drive non-bot UAs from their bot lines. As do google, yahoo, yandex... on theirs.

AlexK




msg:4195673
 5:27 pm on Sep 2, 2010 (gmt 0)

dstiles:
from observation ... non-bot UAs from their bot lines. As do google

Cannot disagree more. Of all the bot IPs, G is as clean as a whistle, which is more than can be said of any others (and I am the opposite of a G-Fanboy). Sure, G employees try to hack from G netblocks but, I say again, the *bot* IP is as clean as a whistle and, in my experience, always has been.

keyplyr:
These types of hits have been coming ... for years

That makes it OK?

keyplyr




msg:4195757
 7:59 pm on Sep 2, 2010 (gmt 0)

keyplyr - These types of hits have been coming from MS ranges for years.
AlexK - That makes it OK?

Call them up and tell 'em you don't like it.

AlexK




msg:4195830
 11:16 pm on Sep 2, 2010 (gmt 0)

keyplyr:
Call them up and tell 'em you don't like it.

Your attitude gives the signal that it's OK. Mine is that it's not. We will need to agree to disagree.

keyplyr




msg:4196196
 6:09 pm on Sep 3, 2010 (gmt 0)

@ AlexK

You're missing the point here. The MS range being discussed has for years been engaged with questionable activity, hence we do not think it has recently been "compromised and being used by non-bot UAs, possibly even has been hacked" as you say.

This does not imply that "it's OK." My point is: what are you going to do about it? Ban that entire MS range? Good luck. Please come back after a couple of months and post the resulting impact. I know I would be interested.

As for my "attitude": well, that's another discussion, and one my GF would gladly participate in.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved