
Forum Moderators: Ocean10000 & incrediBILL


MSN's many cloaked bots. Again.

     
11:44 pm on Aug 5, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Previously... [webmasterworld.com]

Currently, straight out of my logs...

65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] "GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"

No UA, no robots.txt, no REF, no nothing. Not once. Not twice. Not even three times. Try eleven.

65.52.33.73
-
08/05 15:45:09/dir/filename.html
08/05 15:45:20/dir/filename.html
08/05 15:45:31/dir/filename.html
08/05 15:45:42/dir/filename.html
08/05 15:45:53/dir/filename.html
08/05 15:46:03/dir/filename.html
08/05 15:46:14/dir/filename.html
08/05 15:46:25/dir/filename.html
08/05 15:46:35/dir/filename.html
08/05 15:46:46/dir/filename.html
08/05 15:46:57/dir/filename.html

Same poor file. All hits 403'd because no UA; also because bare MSN IP and not a bona fide MSN bot.
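
For anyone wanting to pull hits like these out of their own logs, here is a minimal Python sketch. It assumes the Apache "combined" log layout shown in the line quoted above; the names `LOG_RE`, `parse_line` and `is_blank_ua` are made up for illustration and are not from any tool mentioned in this thread.

```python
import re

# Apache "combined" log layout, matching the sample line quoted in this post.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def is_blank_ua(entry):
    """True when the user-agent field is empty or logged as a bare dash."""
    return entry["agent"].strip() in ("", "-")


line = ('65.52.33.73 - - [05/Aug/2010:15:45:09 -0700] '
        '"GET /dir/filename.html HTTP/1.1" 403 1468 "-" "-"')
entry = parse_line(line)
print(entry["ip"], entry["status"], is_blank_ua(entry))
```

The same `is_blank_ua` check could be applied to the referer field to catch the "no REF" case as well.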
2:48 pm on Aug 6, 2010 (gmt 0)

5+ Year Member



I've got msn from a slightly different 65.52 range
it has an msn UA

and all it does is occasionally hit robots.txt
rarely takes anything else even tho not banned
3:23 pm on Aug 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(reported by myself [webmasterworld.com] on Aug 2 in Bing):
What appears to be a human scraper on an MSN-Bot IP; took more than 14 pages in 7 seconds, and thus tripped the site fast scraper block [webmasterworld.com]. This is from the subsequent log:
    IP: 65.52.108.165
    Host lookup: msnbot-65-52-108-165.search.msn.com
    Timing: 2010-08-02 02:09:33 +0100 (2 pages)
    UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)
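
The actual fast scraper block is described in the linked thread, not here. As a rough idea of how such a trip-wire can work, here is a hypothetical sliding-window counter in Python. The "more than 14 pages in 7 seconds" threshold comes from the report above; the class name, method names and everything else are assumptions for illustration only.

```python
from collections import defaultdict, deque


class FastScraperBlock:
    """Flag an IP that requests more than max_pages pages within window seconds."""

    def __init__(self, max_pages=14, window=7.0):
        self.max_pages = max_pages
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent page hits

    def hit(self, ip, now):
        """Record a page hit at time `now` (in seconds); return True if tripped."""
        q = self.hits[ip]
        q.append(now)
        # Discard hits that have fallen out of the time window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_pages


block = FastScraperBlock()
# Fifteen page requests, 0.4 seconds apart, all from the same address.
results = [block.hit("65.52.108.165", i * 0.4) for i in range(15)]
print(results[-1])
```

A production version would also need to expire idle IPs and count only page requests, not images or CSS.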
2:50 pm on Aug 11, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



65.52 appears to be a low-key but consistent cloaked source:

65.52.6.206
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URI: Single dynamic file posted a mere 12 hours prior. No ref. File suffix and dir verboten to all bots generally, and to majors' bots and IPs specifically. Did not follow real-people link in 403.

Mid-March, 2010, from my notes.

65.52.26.149
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URI: Single html file that would've been okay if UA was msnbot. Link was in a tweet; hit came in a post-Twitter swarm. Did not follow real-people link in 403.
3:01 pm on Aug 11, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Just noted the following 65.52 hit x2 in a post-Twitter swarm yesterday:

65.52.2.10
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
robots.txt? NO

URIs: Again both of these would've been okay for msnbot, but not this UA:

08/10 17:01:23 /
08/10 17:24:57 /dir/filename.html
5:38 pm on Aug 11, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



For several years 65.52 has scraped image files using no UA. I've blocked it without any negative effect.

However, I started blocking the Yahoo equivalent (image scraper, no UA) and my image listings dropped from Yahoo's image search.

I just got tired of micro-managing these endless bots so I take the loss.
6:49 pm on Aug 11, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



65.52.2.10 and 65.52.26.149 would have been refused here as bots, as they don't seem to have proper rDNS. On the other hand, depending on headers, the UAs may or may not have been accepted on those IPs. I'll keep an eye open for them.
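
One common way to do this kind of rDNS test is a forward-confirmed reverse lookup: resolve the IP to a host name, check the name's suffix, then resolve the name back and confirm it returns the same IP. The Python sketch below is only an illustration of that idea, not the check actually used on any site in this thread. The `.search.msn.com` suffix is taken from the host names quoted here; the function name and the injectable `reverse`/`forward` parameters (there so it can be tried without live DNS) are assumptions.

```python
import socket


def verify_bot_rdns(ip, suffix=".search.msn.com",
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS.

    The PTR name must end in `suffix`, and that name must resolve
    back to the original IP address.
    """
    try:
        host = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:
        return False                   # no usable PTR record
    if not host.lower().endswith(suffix):
        return False
    try:
        addrs = forward(host)[2]       # addresses the name resolves to
    except OSError:
        return False
    return ip in addrs
```

Called as `verify_bot_rdns("65.52.108.165")` it performs two live DNS lookups, so results should be cached rather than computed on every request.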
3:56 am on Aug 12, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



From another site -- no UA, no ref, no GET, no nothing. FWIW:

65.52.192.56 - - [07/Aug/2010:12:04:10 -0700] "HEAD / HTTP/1.1" 403 0 "-" "-"
65.52.192.70 - - [08/Aug/2010:16:35:53 -0700] "HEAD / HTTP/1.1" 403 0 "-" "-"
9:36 pm on Aug 12, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I suppose it might be one of their public DSL/Proxy blocks?
11:48 pm on Aug 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles:
I suppose it might be one of their public DSL/Proxy blocks?

The one I caught wasn't. 65.52.108.165 is one of their bot-IPs (also check the rDNS).
7:40 pm on Aug 14, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Yes, most of 65.52.108/24 is.

Your report is an odd one, not for a browser UA coming from a bot IP but because of the Crazy Browser UA, which I associate with very aggressive browsing. I agree with you: why would google do that? Are they testing this type of browser? Or just being stupid? I don't think it's a human, although I could be wrong.
11:26 pm on Aug 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles:
why would google do that?

No no no, this is Microsoft!

Your report is an odd one, not for a browser UA coming from a bot IP ...

That's the worst feature of my report IMO. If a Webmaster cannot rely on a `stable' bot-IP to only be utilised by bots, then the trust-factor falls through the floor. The UA employed ramps up that concern, as it means that any non-tech employee of Microsoft (I'm making an optimistic assumption here) is allowed to use that IP.
9:33 pm on Aug 15, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Of course it's MS! Oops! :)

As I said, I don't think it's a human. If it is human then, as you say, the trust factor is on the skids.
12:09 am on Aug 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles:
I don't think it's a human

Then the trust factor is still on the skids - what on earth are they doing using `Crazy Browser'? Plus, why are they pulling more than 14 pages in 2 secs?
9:29 pm on Aug 16, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



That's probably what Crazy Browser does, although I haven't confirmed it recently. Lots of browsers have plug-ins etc to hike up scraping speeds and "high speed" bandwidth is helping this along.

Might be worth asking MS what they are doing. :)
3:46 am on Aug 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles:
"high speed" bandwidth is helping this along

The record on my site so far is DHL at >300 hits / sec (no kidding).
11:21 pm on Aug 17, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Don't think I've had anything quite that fast. Nearest was *&^%$ securitymetrics who bombed one of my sites for 45 minutes. They are totally IP banned now!

Re: google mistake above - just found a dozen or so hits from one of their bot IPs asking for favicon. With no UA at all AND looking in the sites' roots, where it ain't! :)
2:28 am on Aug 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some more IPs of MSN cloaked bots.
207.46.12.NNN
207.46.199.NNN
207.46.204.NNN
94.245.108.194
8:59 pm on Aug 18, 2010 (gmt 0)



Agent: Mediapartners-Google

207.46.204.102
207.46.204.96
207.46.204.51
207.46.204.103
9:28 pm on Aug 18, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Basically, if it ain't a known bot UA it's dumped, even if it has a bot IP.

Apropos which, I read somewhere that msnbot is being changed to bingbot soon, wrapped up in a basic mozilla UA. So, no more quick and dirty ^msnbot tests. :(
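
To illustrate both points, here is a hedged Python sketch, not anyone's production rule set. The two /24 ranges are simply ones mentioned in this thread and are not a complete or verified list of bot ranges. The bingbot UA string is an assumed example of a bot token wrapped in a Mozilla-style UA. `ANCHORED`, `ANYWHERE`, `on_bot_ip` and `allow` are made-up names.

```python
import ipaddress
import re

# Illustrative ranges taken from this thread; not a complete or verified list.
BOT_NETS = [ipaddress.ip_network(n)
            for n in ("65.52.108.0/24", "207.46.204.0/24")]

ANCHORED = re.compile(r"^msnbot", re.IGNORECASE)                 # quick and dirty
ANYWHERE = re.compile(r"\b(?:msnbot|bingbot)\b", re.IGNORECASE)  # token anywhere


def on_bot_ip(ip):
    """True if the address falls inside one of the listed bot ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BOT_NETS)


def allow(ip, ua):
    """On a bot IP, anything without a known bot UA is dumped."""
    if on_bot_ip(ip):
        return bool(ANYWHERE.search(ua))
    return True  # other addresses are left to rules not sketched here


old_ua = "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
# Assumed example of a Mozilla-wrapped bot UA.
new_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
crazy = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)"

# The anchored test misses the wrapped UA; the unanchored one still finds it.
print(bool(ANCHORED.match(new_ua)), bool(ANYWHERE.search(new_ua)))
print(allow("65.52.108.165", crazy), allow("65.52.108.165", old_ua))
```

An unanchored token match is easier to spoof than an anchored one, so it makes most sense combined with an IP or rDNS check rather than used alone.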
10:21 pm on Aug 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The 207.46.12. range from MSN pulls CSS and JS files with a non-bot UA and skips the images.
5:28 am on Aug 24, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



msnbot-65-54-247-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

18:58:11 /BingSiteAuth.xml
18:58:11 /LiveSearchSiteAuth.xml

robots.txt? NO

A long time ago, the latter file was looked-for by an MSN Webmaster Tools'esque UA for confirmation purposes:

msnbot-webmaster/1.0 (+http://search.msn.com/msnbot.htm)

Apparently, "BingSiteAuth.xml" is the New Thing:

"Cool Tips And Hot Tricks For The New Bing Webmaster Tools, Part 1" [bing.com...]

(Aside: That's news to me. Then again, I rarely check MSN/Bing's tools because they never, EVER honored multiple form-detailed requests to remove files denied in robots.txt but accessible in their results.)

Anyway. The new Bingahoo thing doesn't excuse .search.msn.com looking for Auth.xml files using:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)
2:45 am on Sep 1, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



A mere 145 characters+spaces in this cloaked UA seen this evening. Makes "msnbot/2.0b" look downright pithy.

msnbot-207-46-204-219.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)

robots.txt? NO
2:21 pm on Sep 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I cannot believe the lack of response to this thread.

Microsoft bot IP is compromised and being used by non-bot UAs, possibly even has been hacked, and NO response. Amazing.
7:28 pm on Sep 1, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




Microsoft bot IP is compromised and being used by non-bot UAs, possibly even has been hacked

Highly unlikely...

These types of hits have been coming from MS ranges for years.
9:38 pm on Sep 1, 2010 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I agree with keyplyr. Highly unlikely that an MS bot IP is compromised, and from observation over several years MS DO drive non-bot UAs from their bot lines. As do google, yahoo, yandex... on theirs.
5:27 pm on Sep 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles:
from observation ... non-bot UAs from their bot lines. As do google

Cannot disagree more. Of all the bot IPs, G is as clean as a whistle, which is more than can be said of any others (and I am the opposite of a G-Fanboy). Sure, G employees try to hack from G netblocks but, I say again, the *bot* IP is as clean as a whistle and, in my experience, always has been.

keyplyr:
These types of hits have been coming ... for years

That makes it OK?
7:59 pm on Sep 2, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



keyplyr - These types of hits have been coming from MS ranges for years.
AlexK - That makes it OK?

Call them up and tell 'em you don't like it.
11:16 pm on Sep 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



keyplyr:
Call them up and tell 'em you don't like it.

Your attitude gives the signal that it's OK. Mine is that it's not. We will need to agree to disagree.
6:09 pm on Sep 3, 2010 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



@ AlexK

You're missing the point here. The MS range being discussed has for years been engaged with questionable activity, hence we do not think it has recently been "compromised and being used by non-bot UAs, possibly even has been hacked" as you say.

This does not imply that "it's OK." My point is: what are you going to do about it? Ban that entire MS range? Good luck. Please come back after a couple of months and post the resulting impact. I know I would be interested.

As for my "attitude": well, that's another discussion and one my GF would gladly participate in.
 
