Was this a fake "msnbot"? (Non-MS IP; no robots.txt; triggered traps)

Does Microsoft sell/license their bot to others?

         

Pfui

4:45 pm on Feb 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We were hit hard earlier today (Pacific) by an apparent msnbot run from a single, non-MS IP address out of a HUGE allocation (XX.XXX.0.0 - XX.XXX.255.255). From my logs:

XX.XXX.69.189 - [R_HST] - [H_REF] - msnbot/1.0 (+http://search.msn.com/msnbot.htm)

That IP/UA ignored all robots.txt instructions applicable to all bots, as well as msnbot-specific instructions and "MSN Search Web Crawler and Site Indexing" Site Owner specs [search.msn.com].

Because my robots.txt is usually meticulously adhered to when IPs/hosts are Microsoft-related, I'm concerned that the IP/host spoofed the UA and/or intentionally overrode the bot's rules.

I blocked the closest IP via mod_rewrite (XX.XXX.69.) and sent a Cease-and-Desist to abuse@, etc., but now I'm very wary of allowing the UA "msnbot" at all unless I also restrict it to msn.com or its IPs. (An overreaction, perhaps, but they got everything that wasn't nailed down or already UA-blocked by mod_rewrite.)

Thoughts? Ever see a spoofed msnbot?
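
One way to "restrict it to msn.com" without maintaining IP lists is a forward-confirmed reverse DNS check. The following is a minimal sketch, not a confirmed Microsoft procedure: it assumes genuine msnbot hosts reverse-resolve to a name under msn.com, and the function names `hostname_in_domains` and `is_verified_bot` are hypothetical. The resolvers are passed in as parameters so the logic can be tried without live DNS.

```python
import socket


def hostname_in_domains(hostname, allowed_suffixes):
    """Return True if hostname equals, or is a subdomain of, an allowed suffix."""
    host = hostname.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in allowed_suffixes)


def is_verified_bot(ip, allowed_suffixes=("msn.com",),
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    Assumption: real msnbot hosts resolve under msn.com (not confirmed here).
    """
    try:
        # Step 1: reverse lookup, IP -> hostname.
        hostname = reverse(ip)[0]
    except OSError:
        return False
    # Step 2: the hostname must belong to an allowed domain.
    if not hostname_in_domains(hostname, allowed_suffixes):
        return False
    try:
        # Step 3: forward lookup, hostname -> IPs, must include the original IP.
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

A request whose UA claims "msnbot" but whose IP fails this check could then be treated as spoofed. The forward lookup matters because anyone who controls reverse DNS for their own address block can make it return an msn.com name.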

volatilegx

3:26 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see spoofed msnbots occasionally. Yours is probably spoofed, too, but I can't know for sure because you obscured the IP address. You can show it if you like.

Pfui

6:49 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks. I wasn't sure when we should include that kind of specific info and when we shouldn't. The bot's originating, apparently non-zombie IP was:

63.223.69.189

That's part of Beyond The Network America, Inc. (63.216.0.0 - 63.223.255.255; btnaccess.com), wholly-owned subsidiary of PCCW Limited / PCCW Global (pccw.com). From what I understand, blocking any of those IPs is a bit like taking out a chunk of the world at the knees.

I don't expect a rapid response, if any. (My Cc'd C&D resulted in autoresponders from supportamerica at pccwbtn.com and abuseresponse at btnaccess.com.)

FWIW, the size of the non-Microsoft mother ship prompted me to wonder if msnbot's name had been spoofed or if what hit me was a legit, if rudely re-engineered, msnbot running under a corporate license.

Not that it matters, really, that I'm newly wary of the name, because an abusive bot by any name is a block-worthy pain.

wilderness

7:21 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From what I understand, blocking any of those IPs is a bit like taking out a chunk of the world at the knees.

I'm not so sure that I agree with your analysis.

Go to ARIN and type in the following: "> 63.216." (minus the quotes). Be sure to leave the blank space between the > and 63.

You'll see that some of their ranges are provided to colocators.

bull (Jan) also provided the following on November 7 of last year:

205.177.72.206 - - [07/Nov/2005:02:16:23 +0100] "GET /foobar.htm HTTP/1.1" 403 - "-" "discovery/0.5libwww-perl/5.803"

Best Don

Pfui

8:47 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don, thanks for the details, although, alas, I don't really understand what you're explaining to me.

I did the ARIN lookup and didn't see any 63.223. IP ranges. Also, I don't know what "bull (Jan)" is or refers to, sorry. (As you might imagine, permutations of "bull" via Google and wikipedia generate all kinds of sites and meanings:)

At the risk of veering waaay off-topic...

I was just told that blocking, say, 63.223., would add up to a heckuva lot of addresses because it's pretty much a math thing --

63.223.69.X: X = (0-255) => 256 addresses
63.223.XX.X: XX = (0-255) x X = (0-255) => 256 x 256 => 65,536 addresses (max.)

(Right?)

-- and in a global company, 65,536 addresses could add up to countries.
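
The arithmetic above can be checked with Python's standard `ipaddress` module. This is only a sketch: the helper name `addresses_blocked` is hypothetical, and the prefixes are the CIDR equivalents of blocking "63.223.69." and "63.223.".

```python
import ipaddress


def addresses_blocked(prefix: str) -> int:
    """Count how many addresses a CIDR prefix covers."""
    return ipaddress.ip_network(prefix).num_addresses


# Blocking "63.223.69." leaves one free octet (a /24).
one_octet = addresses_blocked("63.223.69.0/24")
# Blocking "63.223." leaves two free octets (a /16).
two_octets = addresses_blocked("63.223.0.0/16")

print(one_octet, two_octets)
```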

For example, an upstream SysAdmin recently mis-typed an Asian block and accidentally firewalled chunks of Asia, Australia and New Zealand.

So getting back to bot IDs...

Presuming the bot that hit us was fake-named, I've got some .htaccess files to update. Shoot.

wilderness

10:43 pm on Feb 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did the ARIN lookup and didn't see any 63.223. IP ranges.

What I previously advised you to check returns a "subnet delegation" listing from ARIN for that provider's ranges, which begin at 63.216 and proceed until the subnet delegations reach the end of the provider's range.

Perhaps the remainder of their customers use dynamic IPs instead of fixed IP ranges?
In any event, 63.216.xxx.xxx to 63.223.xxx.xxx is a fairly large range not to have any customers shown in a network that obviously sells.
Especially today, when major internet providers in North America are placing great urgency on dividing formerly large IP ranges into smaller localized ranges.

Also, I don't know what "bull (Jan)" is or refers to, sorry.

Jan is a participant in this forum. His screen name is "bull".

Nobody in this forum is capable of advising you what is best for your website(s).
ONLY you have the capability of deciding what is beneficial and what is detrimental.

All I was saying is that what you perceived as a "large range"
63.216.xxx.xxx to 63.223.xxx.xxx
is really not that large.
It all depends on the market capability you may possibly derive from the range.

thetrasher

5:53 pm on Feb 11, 2006 (gmt 0)

10+ Year Member



That's part of Beyond The Network America, Inc. (63.216.0.0 - 63.223.255.255; btnaccess.com)

Maybe you should visit their website and check out their products & services! They're offering hosting, colocation and dedicated internet access for enterprises (DSL,T1,T3). I wouldn't expect anything but a bot.

deny from 63.216.0.0/13
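
As a sanity check on that prefix, here is a short sketch using Python's standard `ipaddress` module. It converts the allocation quoted earlier in the thread (63.216.0.0 - 63.223.255.255) to CIDR notation and tests whether the reported bot IP falls inside it. The variable names are illustrative only.

```python
import ipaddress

# Allocation boundaries as quoted in the thread.
first = ipaddress.ip_address("63.216.0.0")
last = ipaddress.ip_address("63.223.255.255")

# Collapse the start/end range into the smallest list of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))

# The IP reported for the suspect "msnbot".
offender = ipaddress.ip_address("63.223.69.189")
covered = any(offender in block for block in blocks)

print(blocks, blocks[0].num_addresses, covered)
```

The range collapses to the single block 63.216.0.0/13, so the one `deny` line covers the whole allocation, including the offending address.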

But wait ... maybe ... is it a cloaking hunt?

Pfui

8:29 pm on Feb 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Prior to posting I visited a number of the bot host's related sites hoping to answer my own questions. (As I stated in msg #3, I sent C&Ds to a number of different addresses -- all sites I visited.)

The purpose of my post was simply to report/describe a bot apparently spoofing "msnbot" and inquire if Microsoft ever licensed same.

I guess that last part is still unknown.

thetrasher

12:27 pm on Feb 12, 2006 (gmt 0)

10+ Year Member



The purpose of my post was simply to report/describe a bot apparently spoofing "msnbot" and inquire if Microsoft ever licensed same.

I guess that last part is still unknown.


Report this "msnbot" to Microsoft. They should know if it is a legit/licensed msnbot or not. "MSN" is a trademark. Ask the trademark holder, don't ask the supposed abusers if they are abusers. Let MS lawyers do their work.

MSN Search Siteowner Support: [support.msn.com]

wilderness

1:42 pm on Feb 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I guess that last part is still unknown.

Good luck with any confirmation from MS!

In early 2003, when somebody from MS began crawling sites anonymously ( [webmasterworld.com...] ),
the participants in this forum never did verify their identity.

There were a couple of other similar threads regarding MS around that time as well.

Jordo needs a drink

4:54 pm on Feb 15, 2006 (gmt 0)

10+ Year Member



It hit my site a few days ago. It's not an MSN bot, more of a poorly coded bot. It did a lousy job with my relative links and got 404s left and right.

The actual MSN bot has no problems with my relative links.