Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
MSN's Stealth Missions
Pfui




msg:4372256
 3:40 pm on Oct 8, 2011 (gmt 0)

Reports of stealth and abuse by MSN/Bing --

MSN's many cloaked bots. [webmasterworld.com...]
MSN's many cloaked bots. Again. [webmasterworld.com...]

-- keep getting a little long in the page count so here we go, again, after a brief recap of top problems...

1.) Cloaked / bare (no rDNS) IPs from (partial listing):

65.52.
65.54.
65.55.
157.55.
207.46.

2.) Atypical hit patterns like this now-common 'no UA, no robots.txt, no referrer, 11-hits-to-same-file' visit from 65.52.33.73:

15:45:09/dir/filename.html
15:45:20/dir/filename.html
15:45:31/dir/filename.html
15:45:42/dir/filename.html
15:45:53/dir/filename.html
15:46:03/dir/filename.html
15:46:14/dir/filename.html
15:46:25/dir/filename.html
15:46:35/dir/filename.html
15:46:46/dir/filename.html
15:46:57/dir/filename.html

3.) Atypical, 'unofficial' UAs from .search.msn.com domains akin to this morning's visit from:

msnbot-207-46-204-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)

robots.txt? NO

Cloaked UAs include:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)

robots.txt? NO

And last but not least, the ongoing oddity:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)._
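For anyone who wants to flag the item 2 pattern automatically, here is a minimal Python sketch. It assumes the log has already been parsed into (IP, HH:MM:SS, path) tuples. The function name and both thresholds (`min_hits`, `max_gap`) are illustrative guesses, not anything MSN documents, and it ignores visits that cross midnight.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Timestamps copied from the 65.52.33.73 visit listed above.
SAMPLE_TIMES = [
    "15:45:09", "15:45:20", "15:45:31", "15:45:42", "15:45:53", "15:46:03",
    "15:46:14", "15:46:25", "15:46:35", "15:46:46", "15:46:57",
]
SAMPLE_HITS = [("65.52.33.73", t, "/dir/filename.html") for t in SAMPLE_TIMES]


def find_repeat_fetches(hits, min_hits=10, max_gap=timedelta(seconds=15)):
    """Return (ip, path, run_length) for every IP that requested the same
    path at least min_hits times in a row with no gap longer than max_gap.

    hits is an iterable of (ip, "HH:MM:SS", path) tuples from a single day.
    """
    by_key = defaultdict(list)
    for ip, hhmmss, path in hits:
        by_key[(ip, path)].append(datetime.strptime(hhmmss, "%H:%M:%S"))

    flagged = []
    for (ip, path), times in by_key.items():
        times.sort()
        run = best = 1
        for prev, cur in zip(times, times[1:]):
            # Extend the current run if the gap is small, otherwise restart it.
            run = run + 1 if cur - prev <= max_gap else 1
            best = max(best, run)
        if best >= min_hits:
            flagged.append((ip, path, best))
    return flagged


if __name__ == "__main__":
    print(find_repeat_fetches(SAMPLE_HITS))
```

The gaps in the visit above are all 10 or 11 seconds, so a 15-second ceiling catches it while leaving ordinary human reloads alone; tune both numbers against your own logs.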

 

dstiles




msg:4372298
 7:36 pm on Oct 8, 2011 (gmt 0)

A while ago bingdude visited the bing forum hereabouts and I mentioned this. He promised to get it looked into. No recent activity from him so we can only assume bing has begun taking on the google policy of popping in once and then departing for ever. :(

g1smd




msg:4372299
 7:46 pm on Oct 8, 2011 (gmt 0)

I was hoping this lot was going to be resolved several months ago.

I am very close to blocking the whole lot for good. When I have some spare time the trigger will be pulled.

Mokita




msg:4372363
 6:58 am on Oct 9, 2011 (gmt 0)

I banned the ones ending in )._ ages ago.

And I am already gradually banning the ones with 'unofficial' UAs, starting with my high-volume sites.

Haven't decided about the "no rDNS" IPs yet.
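A rough sketch of how that sorting might look in Python, using only UA strings quoted in this thread. The bucket names and the `classify_msn_ua` function are made up for illustration, and the "declared bot" test is just a guess at what counts as an official token; what you do with each bucket (ban, log, ignore) is a separate decision.

```python
import re

# Assumed "declared" tokens: msnbot, msnbot-<something>, or bingbot followed
# by a version number. This list is an illustration, not an official one.
DECLARED_BOT = re.compile(r"\b(?:msnbot(?:-[A-Za-z]+)?|bingbot)/\d", re.IGNORECASE)


def classify_msn_ua(user_agent):
    """Put a User-Agent string into one of four illustrative buckets."""
    ua = (user_agent or "").strip()
    if not ua or ua == "-":
        return "no-ua"
    if ua.endswith(")._"):
        # The odd variant with the trailing "._" reported in this thread.
        return "malformed-msnbot"
    if DECLARED_BOT.search(ua):
        return "declared-bot"
    # Browser-style strings, Wget, truncated UAs and so on.
    return "undeclared"


if __name__ == "__main__":
    for ua in (
        "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._",
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
        "",
    ):
        print(classify_msn_ua(ua), "<-", repr(ua))
```

The trailing-suffix check runs before the declared-bot check on purpose, since the malformed string also contains a valid-looking `msnbot/2` token.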

Pfui




msg:4375491
 3:51 pm on Oct 17, 2011 (gmt 0)

During last night's wee hours, a cloaked IP and basically a scraper UA:

207.46.92.16
Wget/1.10.2

robots.txt? NO

Absolutely not okay.

dstiles




msg:4376198
 8:58 pm on Oct 18, 2011 (gmt 0)

Further to my posting 8th Oct above:

On that day I stickied bingdude asking him to reappear. Seems like my genie-invocation spell failed: no reply from him nor has he been seen in the Bing forum for a long time. Rapped knuckles for being helpful?

Gone the way of all the google visitors. Sad, I was beginning to have hope! :(

Pfui




msg:4376251
 11:05 pm on Oct 18, 2011 (gmt 0)

Thank you for trying. But you know, in the major-SE scheme of things, I reckon we're but fleas on the rumps of elephants: insignificant, annoying, and dependent on the ride.

Pfui




msg:4378278
 10:45 am on Oct 23, 2011 (gmt 0)

157.55.196.249
Firefox 7.0

robots.txt? NO

Pfui




msg:4378803
 6:45 pm on Oct 24, 2011 (gmt 0)

Ten minutes ago, again w/ Wget from a kin IP of the last Wget (207.46.92.16):

207.46.92.17
Wget/1.10.2

robots.txt? NO

Anyone else seeing any repeatedly/intentionally rogue UAs?

keyplyr




msg:4378870
 9:07 pm on Oct 24, 2011 (gmt 0)

Yes, I see variations of those all the time and have for years. I've dismissed them long ago and they're filtered from what I actually spend time following-up on; maybe naively but nonetheless.

The flags I used to research would take 2-3 hours every morning. I now filter the usual suspects (defined or not) and only spend time on the actual threats. Got it down to about an hour now :)

Pfui




msg:4384503
 7:30 pm on Nov 7, 2011 (gmt 0)

In a twitter swarm:

65.52.21.72
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

robots.txt? NO

grandma genie




msg:4385147
 4:30 am on Nov 9, 2011 (gmt 0)

I'm getting the stealth visits too from microsoft:

207.46.204.162
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

robots.txt? NO

and

157.55.112.207
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

robots.txt? NO

Not acting anything like a bot.

Pfui




msg:4386368
 4:26 am on Nov 13, 2011 (gmt 0)

This just in. Note the (in)famous UA. Am amazed they're still using it:

msnbot-157-55-39-84.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

robots.txt? Yes
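For separating hosts like this one from the bare IPs, a hedged Python sketch follows. The hostname pattern is inferred only from the `msnbot-a-b-c-d.search.msn.com` names quoted in this thread, and both function names are hypothetical. `forward_confirmed` needs live DNS, so it is defined but not called here; it should come back `None` for the cloaked, no-rDNS addresses.

```python
import ipaddress
import re
import socket

# Pattern inferred from hostnames seen in this thread; treat it as an assumption.
HOST_RE = re.compile(
    r"^msnbot-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.search\.msn\.com$"
)


def hostname_matches_ip(hostname, ip):
    """Offline sanity check: does the IP embedded in the hostname equal ip?"""
    match = HOST_RE.match(hostname.lower().rstrip("."))
    if not match:
        return False
    try:
        embedded = ipaddress.ip_address(".".join(match.groups()))
    except ValueError:  # e.g. an octet above 255
        return False
    return embedded == ipaddress.ip_address(ip)


def forward_confirmed(ip):
    """Live check: reverse-resolve ip, then confirm the name resolves back.

    Returns the hostname on success, or None if either lookup fails or the
    forward lookup does not include the original address.
    """
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname if ip in forward_ips else None


if __name__ == "__main__":
    print(hostname_matches_ip("msnbot-157-55-39-84.search.msn.com", "157.55.39.84"))
```

Anchoring the pattern at both ends matters: a name such as `msnbot-157-55-39-84.search.msn.com.example.net` would otherwise slip through a simple substring test.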

dstiles




msg:4386529
 8:28 pm on Nov 13, 2011 (gmt 0)

I still have a block on that UA. I thought they would have fixed it along with the DNS update but hadn't yet checked it. Ah, well.

incrediBILL




msg:4386922
 10:26 pm on Nov 14, 2011 (gmt 0)

They're hammering my server right now using the following UA:
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)"

More details here: [webmasterworld.com...]

Pfui




msg:4387150
 2:18 pm on Nov 15, 2011 (gmt 0)

Exact same IP and UA as reported on 11-07 above, but not post-tweet this time. I give.

65.52.21.72
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

robots.txt? NO

incrediBILL




msg:4387292
 7:00 pm on Nov 15, 2011 (gmt 0)

Has anyone noticed the irony of AppleWebKit coming from MS's search.msn.com IPs?

Just thought I'd point it out in case someone wasn't paying attention ;)

keyplyr




msg:4387325
 8:03 pm on Nov 15, 2011 (gmt 0)


"Has anyone noticed the irony of AppleWebKit coming from MS's search.msn.com IPs?"

It's joined the ranks of Mozilla. AppleWebKit is even used in the UA string of Android (their arch-rival). Guess it comes down to this: whoever was there first gets to name the mountain.

Staffa




msg:4387717
 7:38 pm on Nov 16, 2011 (gmt 0)

Sneaky little thing MSN, from a visit today

1n:24:33 /robots.txt - 157.55.17.192 - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
1n:25:05 /dir1/page1.asp - 157.55.17.192 - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
1n:25:07 /dir1/page1.asp - 207.46.204.164 - Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534~~(KHTML, like Gecko)

comes as bingbot and gets the page, two seconds later comes as "regular UA" and gets nothing.

Instead of using their resources on frivolities they should crawl as a proper bot and get people's web sites indexed.

(ps : ~~ = double space)
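One way to spot that bingbot-then-browser pairing in bulk, sketched in Python. It assumes log entries are already parsed into (datetime, path, IP, UA) tuples. The function name and the 5-second window are arbitrary choices for illustration, and the sample uses a placeholder hour because the real one is masked in the excerpt above.

```python
from datetime import datetime, timedelta


def find_shadow_fetches(entries, window=timedelta(seconds=5)):
    """Find pages fetched by a bingbot UA and then re-fetched within window
    by a different IP whose UA does not mention bingbot.

    entries: iterable of (datetime, path, ip, user_agent) in any order.
    Returns a list of (path, bot_ip, follower_ip, seconds_between).
    """
    ordered = sorted(entries, key=lambda e: e[0])
    pairs = []
    for i, (t1, path1, ip1, ua1) in enumerate(ordered):
        if "bingbot" not in ua1.lower():
            continue
        for t2, path2, ip2, ua2 in ordered[i + 1:]:
            if t2 - t1 > window:
                break  # entries are sorted, so nothing later can qualify
            if path2 == path1 and ip2 != ip1 and "bingbot" not in ua2.lower():
                pairs.append((path1, ip1, ip2, (t2 - t1).total_seconds()))
    return pairs


if __name__ == "__main__":
    BOT = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    WEBKIT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)"
    day = datetime(2011, 11, 16)  # hour 12 below is a placeholder
    sample = [
        (day.replace(hour=12, minute=24, second=33), "/robots.txt", "157.55.17.192", BOT),
        (day.replace(hour=12, minute=25, second=5), "/dir1/page1.asp", "157.55.17.192", BOT),
        (day.replace(hour=12, minute=25, second=7), "/dir1/page1.asp", "207.46.204.164", WEBKIT),
    ]
    print(find_shadow_fetches(sample))
```

On a busy site a popular page will occasionally be hit by a real visitor seconds after the bot, so treat matches as leads to check against the follower's IP range rather than as proof.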

dstiles




msg:4387761
 8:54 pm on Nov 16, 2011 (gmt 0)

I suspect the webkit UAs are doing something like google's web preview OR checking for viruses OR... Who knows? It's probably bot-ish but not pure bot.

Pity bingdude won't visit here. He's back in the Bing forum at present but for how long, who knows?

Pfui




msg:4402518
 2:24 am on Dec 31, 2011 (gmt 0)

Beats heck outta me what MSN was doing today with this laughable UA:

Mozilla/4.0 (compatible

msnbot-157-55-17-117.search.msn.com [projecthoneypot.org...]

1n:28:33 /dir/filename.html [302'd to...]
1n:28:34 /botbait/ [403]

robots.txt? NO

keyplyr




msg:4402524
 3:13 am on Dec 31, 2011 (gmt 0)

@ Pfui

Yup, I reported that a couple days ago:

[webmasterworld.com...]

Pfui




msg:4405216
 2:47 pm on Jan 9, 2012 (gmt 0)

'No UA, no robots.txt, no referrer, 11-hits-to-same-file' visits from Microsoft's Dynamic Hosting IPs now:

NetName: MICROSOFT-DYNAMIC-HOSTING
NetRange: 70.37.0.0 - 70.37.191.255
CIDR: 70.37.0.0/17, 70.37.128.0/18

During the first week of Jan., two days apart:

70.37.161.240
-
04:19:33 /dir/filename20.html
04:19:45 /dir/filename20.html
04:19:56 /dir/filename20.html
04:20:07 /dir/filename20.html
04:20:19 /dir/filename20.html
04:20:30 /dir/filename20.html
04:20:42 /dir/filename20.html
04:20:53 /dir/filename20.html
04:21:04 /dir/filename20.html
04:21:16 /dir/filename20.html
04:21:27 /dir/filename20.html


70.37.162.57
-
21:35:43 /dir/filename41.html
21:35:54 /dir/filename41.html
21:36:06 /dir/filename41.html
21:36:17 /dir/filename41.html
21:36:29 /dir/filename41.html
21:36:40 /dir/filename41.html
21:36:51 /dir/filename41.html
21:37:03 /dir/filename41.html
21:37:14 /dir/filename41.html
21:37:25 /dir/filename41.html
21:37:37 /dir/filename41.html
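If you'd rather test membership in that NetRange programmatically than eyeball it, Python's standard `ipaddress` module can turn the first/last addresses into CIDR blocks and check individual IPs. The helper names below are my own; the addresses are the ones from the whois data and visits above.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive first-last address range into CIDR networks."""
    return list(
        ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)
        )
    )


def in_any(ip, networks):
    """True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# NetRange from the whois record quoted above.
MS_DYNAMIC_HOSTING = range_to_cidrs("70.37.0.0", "70.37.191.255")

if __name__ == "__main__":
    print([str(net) for net in MS_DYNAMIC_HOSTING])
    for ip in ("70.37.161.240", "70.37.162.57"):
        print(ip, in_any(ip, MS_DYNAMIC_HOSTING))
```

The computed blocks should agree with the CIDR line in the whois output, which is a handy cross-check before pasting a range into a deny list.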

dstiles




msg:4405326
 8:01 pm on Jan 9, 2012 (gmt 0)

The MS equivalent of AWS? I've had the range 70.37.0.0 - 70.37.191.255 blocked for two years now.

Pfui




msg:4405417
 12:22 am on Jan 10, 2012 (gmt 0)

If only blocks stopped them from coming...

Pfui




msg:4406033
 10:45 pm on Jan 11, 2012 (gmt 0)

Oh, come on:

65.55.67.169
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MSN 9.0;MSN 9.1;MSN 9.6;MSN 10.0;MSN 10.2; MSNbMSNI; MSNmen-us; MSNcIA)

robots.txt? NO
Referrer? YES

The referrer was legit. But the hit from a bare-IP MSN IP? Beats heck outta me. Employee, maybe -- 403'd because MSN plays fast and loose with its hordes, and hoards, of cloaked bots.

dstiles




msg:4410402
 8:01 pm on Jan 24, 2012 (gmt 0)

Came across a new (to me) MS IP range today...

204.231.192.0 - 204.231.223.255

One of the IPs was used as a proxy for an unspecified forwarding IP so the range could include proxies or it could be a "broadband" range (with local proxy/firewall). I could get no DNS information about the range, only the whois data.

lucy24




msg:4419463
 4:48 am on Feb 20, 2012 (gmt 0)

Continuing the theme of wtf-ness:

I'd assumed that in the course of January [webmasterworld.com] I got to know all the major robot players. Today during a routine check of Bing/MS IPs, which normally results in dead silence*, I ran smack into a pile of msnbots.

Nothing new about msnbot/2.0b-- and that's just the point. Its owners [onlinehelp.microsoft.com] say it's been put out to pasture, replaced by the bingbot.** The specialized msnbot-media is still on the job, but I had to go all the way back to May of 2011 for the last vanilla msnbot. What's up? Does the msnbot know something about the Social Security system that it's not telling? Was the MSN retirement package not all that it expected?

In the middle of the msnbots-- did it think it could hide?-- was a whole slew of msnbot-NewsBlogs (their plural). They too have been around for years; they're mentioned in assorted WebmasterWorld threads. I have never met one before. (Never = since April 2011 when I started saving raw logs.)

They made a total of 16 successful requests. Half were for robots.txt, always taken in pairs. The other half were for...

Let me backtrack here. For a long time I had one unusually fat file that was inordinately popular with the wrong kind of robots. It also got the occasional search-engine hit, most of them from humans who were clearly looking for something else. Wasted time and bandwidth on all sides. A couple weeks back I cut off the first 5% of the file and saved it under the name of the original fat version. The old one got tucked away behind a new name, a nofollow link and a noindex meta tag. If humans want to read the whole thing they're welcome. Robots can jolly well go on a diet.

The newly arrived blogbot read this slimmed-down file eight separate times.

The newly pulled-from-retirement msnbot puttered around here and there-- including a single serving of robots.txt-- presumably hoping I wouldn't notice when it, too, read the slim file twice... followed by the fat file.

Well, hey. It's not google. It doesn't have to pay attention to the "nofollow" directive. And the file's already indexed, so it's not like it's seeing anything it hasn't seen dozens of times before.


* Figure of speech. It's really the computer's "Bzzt!" sound meaning "Nope, nothing here." The bingbot and the msnbot-media have already been filtered out; the plainclothes bot is blocked.
** They also say, quote, "Bing does not share IP addresses for our crawlers." I'll trade you a 65\.5[2-5]\. for a 157\.(5[4-9]|60). Anyone got a spare 207\.46\.?
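For the curious, the three prefix patterns in that footnote can be folded into one anchored expression and run against IPs reported earlier in the thread. This is only a sketch: the function name is invented, and I have added a trailing `\.` after the 157 alternative and a leading `^` so the match cannot start mid-address, which the footnote's shorthand leaves implicit.

```python
import re

# Combined from the footnote's patterns; the anchoring and the extra "\."
# after the 157 alternative are my additions.
MS_PREFIX = re.compile(r"^(?:65\.5[2-5]\.|157\.(?:5[4-9]|60)\.|207\.46\.)")


def matches_ms_prefix(ip):
    """True if the dotted-quad string starts with one of the listed prefixes."""
    return MS_PREFIX.match(ip) is not None


if __name__ == "__main__":
    for ip in ("65.52.33.73", "157.55.196.249", "207.46.92.16", "70.37.161.240"):
        print(ip, matches_ms_prefix(ip))
```

Note that the 70.37 hosting range mentioned upthread is deliberately not covered by these prefixes, so it needs its own rule if you want it filtered too.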


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved