Forum Moderators: Ocean10000 & incrediBILL

MSN's many cloaked bots. Again.

     
11:44 pm on Aug 5, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Previously... [webmasterworld.com]

Currently, straight out of my logs...

65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.33.73
-
08/05 15:45:09/dir/filename.html
08/05 15:45:20/dir/filename.html
08/05 15:45:31/dir/filename.html
08/05 15:45:42/dir/filename.html
08/05 15:45:53/dir/filename.html
08/05 15:46:03/dir/filename.html
08/05 15:46:14/dir/filename.html
08/05 15:46:25/dir/filename.html
08/05 15:46:35/dir/filename.html
08/05 15:46:46/dir/filename.html
08/05 15:46:57/dir/filename.html

Same poor file. All hits 403'd because no UA; also because bare MSN IP and not a bona fide MSN bot.
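For anyone wanting to pull the same pattern out of their own logs, here is a minimal sketch that parses lines in the Apache combined format quoted above and counts blank-UA hits per IP. The function names are made up for illustration, and the regex assumes the log format matches the sample line exactly.

```python
import re

# Apache "combined" log format, matching the sample line quoted above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def has_no_user_agent(entry):
    """True when the UA field is empty or logged as a bare dash."""
    return entry["agent"] in ("", "-")


def blank_ua_hits_by_ip(lines):
    """Count requests per IP that arrived without any User-Agent."""
    counts = {}
    for line in lines:
        entry = parse_line(line)
        if entry and has_no_user_agent(entry):
            counts[entry["ip"]] = counts.get(entry["ip"], 0) + 1
    return counts
```

Feeding it the eleven hits listed above would give a single entry for 65.52.33.73 with a count of 11.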
2:48 pm on Aug 6, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Feb 20, 2008
posts:94
votes: 0


I've got MSN coming from a slightly different 65.52 range, and it has an MSN UA.

All it does is occasionally hit robots.txt. It rarely takes anything else, even though it's not banned.
3:23 pm on Aug 6, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


(reported by myself [webmasterworld.com] on Aug 2 in Bing):
What appears to be a human scraper on an MSN-Bot IP; took more than 14 pages in 7 seconds, and thus tripped the site fast scraper block [webmasterworld.com]. This is from the subsequent log:
    IP: 65.52.108.165
    Host lookup: msnbot-65-52-108-165.search.msn.com
    Timing: 2010-08-02 02:09:33 +0100 (2 pages)
    UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)
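A fast-scraper block of this kind can be approximated with a sliding window per IP. The sketch below is not the block linked above; it is a hypothetical version that only borrows the figures from this post (more than 14 pages inside 7 seconds). The class name and defaults are assumptions.

```python
from collections import defaultdict, deque


class FastScraperDetector:
    """Flag an IP once it requests more than max_pages within window_seconds."""

    def __init__(self, max_pages=14, window_seconds=7.0):
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent page hits

    def record(self, ip, timestamp):
        """Record one page request; return True if the IP is over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Discard hits that have aged out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_pages
```

Timestamps are plain seconds (for example from `time.time()`), so the same object can be driven from live requests or replayed from a log.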
2:50 pm on Aug 11, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


65.52 appears to be a low-key but consistent cloaked source:

65.52.6.206
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URI: Single dynamic file posted a mere 12 hours prior. No ref. File suffix and dir verboten to all bots generally, and to majors' bots and IPs specifically. Did not follow real-people link in 403.

Mid-March, 2010, from my notes.

65.52.26.149
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URI: Single html file that would've been okay if UA was msnbot. Link was in a tweet; hit came in a post-Twitter swarm. Did not follow real-people link in 403.
3:01 pm on Aug 11, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Just noted the following 65.52 hit x2 in a post-Twitter swarm yesterday:

65.52.2.10
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URIs: Again both of these would've been okay for msnbot, but not this UA:

08/10 17:01:23 /
08/10 17:24:57 /dir/filename.html
5:38 pm on Aug 11, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5815
votes: 64


For several years 65.52 has scraped image files using no UA. I've blocked it without any negative effect.

However, I started blocking the Yahoo equivalent (image scraper, no UA) and my image listings dropped from Yahoo's image search.

I just got tired of micro-managing these endless bots so I take the loss.
6:49 pm on Aug 11, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


65.52.2.10 and 65.52.26.149 would have been refused here as bots, as they don't seem to have proper rDNS. On the other hand, depending on headers, the UAs may or may not have been accepted on those IPs. I'll keep an eye open for them.
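The rDNS test mentioned here is usually done as a forward-confirmed reverse lookup: resolve the IP to a hostname, check the hostname's suffix, then resolve the hostname back and confirm it returns the same IP. A minimal sketch follows, assuming the `.search.msn.com` suffix seen in this thread is the one to trust. The function name and the injectable resolver parameters are illustrative only.

```python
import socket


def _reverse(ip):
    # PTR lookup: returns the primary hostname for the address.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # A lookup: returns the list of IPv4 addresses for the hostname.
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip, suffix=".search.msn.com",
                        reverse=_reverse, forward=_forward):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host = reverse(ip)
    except OSError:  # covers socket.herror / socket.gaierror
        return False
    # The suffix must end the hostname, so "x.search.msn.com.evil.net" fails.
    if not host.lower().endswith(suffix):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The forward step matters because anyone who controls the reverse zone for their own IP can publish a PTR record ending in `.search.msn.com`; only the genuine owner of that domain can make the name resolve back to the IP.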
3:56 am on Aug 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


From another site -- no UA, no ref, no GET, no nothing. FWIW:

65.52.192.56 - - [07/Aug/2010:12:04:10 -0700] "HEAD / HTTP/1.1" 403 0 "-" "-"
65.52.192.70 - - [08/Aug/2010:16:35:53 -0700] "HEAD / HTTP/1.1" 403 0 "-" "-"
9:36 pm on Aug 12, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


I suppose it might be one of their public DSL/Proxy blocks?
11:48 pm on Aug 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


dstiles:
I suppose it might be one of their public DSL/Proxy blocks?

The one I caught wasn't. 65.52.108.165 is one of their bot-IPs (also check the rDNS).
7:40 pm on Aug 14, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


Yes, most of 65.52.108/24 is.

Your report is an odd one, not for a browser UA coming from a bot IP but because of the Crazy Browser UA, which I associate with very aggressive browsing. I agree with you: why would google do that? Are they testing this type of browser? Or just being stupid? I don't think it's a human, although I could be wrong.
11:26 pm on Aug 14, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


dstiles:
why would google do that?

No no no, this is Microsoft!

Your report is an odd one, not for a browser UA coming from a bot IP ...

That's the worst feature of my report IMO. If a Webmaster cannot rely on a `stable' bot-IP to only be utilised by bots, then the trust-factor falls through the floor. The UA employed ramps up that concern, as it means that any non-tech employee of Microsoft (I'm making an optimistic assumption here) is allowed to use that IP.
9:33 pm on Aug 15, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


Of course it's MS! Oops! :)

As I said, I don't think it's a human. If it is human then, as you say, the trust factor is on the skids.
12:09 am on Aug 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


dstiles:
I don't think it's a human

Then the trust factor is still on the skids - what on earth are they doing using `Crazy Browser'? Plus, why are they pulling more than 14 pages in 2 secs?
9:29 pm on Aug 16, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


That's probably what Crazy Browser does, although I haven't confirmed it recently. Lots of browsers have plug-ins etc to hike up scraping speeds and "high speed" bandwidth is helping this along.

Might be worth asking MS what they are doing. :)
3:46 am on Aug 17, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


dstiles:
"high speed" bandwidth is helping this along

The record on my site so far is DHL at >300 hits / sec (no kidding).
11:21 pm on Aug 17, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


Don't think I've had anything quite that fast. Nearest was *&^%$ securitymetrics who bombed one of my sites for 45 minutes. They are totally IP banned now!

Re: google mistake above - just found a dozen or so hits from one of their bot IPs asking for favicon. With no UA at all AND looking in the sites' roots, where it ain't! :)
2:28 am on Aug 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Some more IPs of MSN cloaked bots.
207.46.12.NNN
207.46.199.NNN
207.46.204.NNN
94.245.108.194
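To test a visitor against a list like the one above, Python's standard `ipaddress` module is enough. This sketch reads each `NNN` entry as "any host in that /24", which is an interpretation of the post rather than something it states; the names `REPORTED_RANGES` and `in_reported_ranges` are made up for the example.

```python
import ipaddress

# Ranges as reported above; "NNN" is assumed to mean the whole /24.
REPORTED_RANGES = [
    ipaddress.ip_network(net)
    for net in (
        "207.46.12.0/24",
        "207.46.199.0/24",
        "207.46.204.0/24",
        "94.245.108.194/32",  # single address, listed in full above
    )
]


def in_reported_ranges(ip):
    """True if the address falls inside any of the reported ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in REPORTED_RANGES)
```

A range match only says where the request came from. What to do with it (block, allow for a known bot UA only, log and watch) is a separate policy decision.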
8:59 pm on Aug 18, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 18, 2010
posts:49
votes: 0


Agent: Mediapartners-Google

207.46.204.102
207.46.204.96
207.46.204.51
207.46.204.103
9:28 pm on Aug 18, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


Basically, if it ain't a known bot UA it's dumped, even if it has a bot IP.

Apropos which, I read somewhere that msnbot is being changed to bingbot soon, wrapped up in a basic mozilla UA. So, no more quick and dirty ^msnbot tests. :(
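The practical effect on a `^msnbot` test can be sketched as follows. The bingbot string used in the usage note is an assumed example of a crawler token wrapped in a Mozilla-style UA, not something quoted in this thread, and the function names are hypothetical.

```python
import re

# The "quick and dirty" test: the UA must START with the bot name.
ANCHORED = re.compile(r"^msnbot", re.IGNORECASE)

# An unanchored token test that still works once the bot name is
# embedded inside a Mozilla-style string.
TOKEN = re.compile(r"\b(?:msnbot|bingbot)\b", re.IGNORECASE)


def passes_anchored_test(user_agent):
    """True only when the UA begins with 'msnbot'."""
    return bool(ANCHORED.search(user_agent))


def claims_ms_crawler(user_agent):
    """True when the UA contains an msnbot or bingbot token anywhere."""
    return bool(TOKEN.search(user_agent))
```

With `msnbot/2.0b` both functions return True. With an assumed string such as `Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)` only `claims_ms_crawler` does, and the plain MSIE UAs reported in this thread fail both. Either way, a UA match only shows what the client claims to be, so it is best paired with an IP or rDNS check.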
10:21 pm on Aug 18, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1666
votes: 35


The 207.46.12. range from MSN pulls CSS and JS files with a non-bot UA and skips the images.
5:28 am on Aug 24, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


msnbot-65-54-247-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

18:58:11 /BingSiteAuth.xml
18:58:11 /LiveSearchSiteAuth.xml

robots.txt? NO

A long time ago, the latter file was looked-for by an MSN Webmaster Tools'esque UA for confirmation purposes:

msnbot-webmaster/1.0 (+http://search.msn.com/msnbot.htm)

Apparently, "BingSiteAuth.xml" is the New Thing:

"Cool Tips And Hot Tricks For The New Bing Webmaster Tools, Part 1" [bing.com...]

(Aside: That's news to me. Then again, I rarely check MSN/Bing's tools because they never, EVER honored multiple form-detailed requests to remove files denied in robots.txt but accessible in their results.)

Anyway. The new Bingahoo thing doesn't excuse .search.msn.com looking for Auth.xml files using:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)
2:45 am on Sept 1, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


A mere 145 characters+spaces in this cloaked UA seen this evening. Makes "msnbot/2.0b" look downright pithy.

msnbot-207-46-204-219.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)

robots.txt? NO
2:21 pm on Sept 1, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


I cannot believe the lack of response to this thread.

Microsoft bot IP is compromised and being used by non-bot UAs, possibly even has been hacked, and NO response. Amazing.
7:28 pm on Sept 1, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5815
votes: 64



Microsoft bot IP is compromised and being used by non-bot UAs, possibly even has been hacked

Highly unlikely...

These types of hits have been coming from MS ranges for years.
9:38 pm on Sept 1, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


I agree with keyplyr. Highly unlikely that an MS bot IP is compromised, and from observation over several years MS DO drive non-bot UAs from their bot lines. As do google, yahoo, yandex... on theirs.
5:27 pm on Sept 2, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


dstiles:
from observation ... non-bot UAs from their bot lines. As do google

Cannot disagree more. Of all the bot IPs, G is as clean as a whistle, which is more than can be said of any others (and I am the opposite of a G-Fanboy). Sure, G employees try to hack from G netblocks but, I say again, the *bot* IP is as clean as a whistle and, in my experience, always has been.

keyplyr:
These types of hits have been coming ... for years

That makes it OK?
7:59 pm on Sept 2, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5815
votes: 64


keyplyr - These types of hits have been coming from MS ranges for years.
AlexK - That makes it OK?

Call them up and tell 'em you don't like it.
11:16 pm on Sept 2, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2004
posts:660
votes: 0


keyplyr:
Call them up and tell 'em you don't like it.

Your attitude gives the signal that it's OK. Mine is that it's not. We will need to agree to disagree.
6:09 pm on Sept 3, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5815
votes: 64


@ AlexK

You're missing the point here. The MS range being discussed has for years been engaged in questionable activity, hence we do not think it has recently been "compromised and being used by non-bot UAs, possibly even has been hacked" as you say.

This does not imply that "it's OK." My point is: what are you going to do about it? Ban that entire MS range? Good luck. Please come back after a couple of months and post the resulting impact. I know I would be interested.

As for my "attitude", well, that's another discussion and one my GF would gladly participate in.