
Search Engine Spider and User Agent Identification Forum

    
MSN's Stealth Missions
Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 3:40 pm on Oct 8, 2011 (gmt 0)

Reports of stealth and abuse by MSN/Bing --

MSN's many cloaked bots. [webmasterworld.com...]
MSN's many cloaked bots. Again. [webmasterworld.com...]

-- keep getting a little long in the page count so here we go, again, after a brief recap of top problems...

1.) Cloaked / bare (no rDNS) IPs from (partial listing):

65.52.
65.54.
65.55.
157.55.
207.46.

2.) Atypical hit patterns like this now-common 'no UA, no robots.txt, no referrer, 11-hits-to-same-file' visit from 65.52.33.73:

15:45:09/dir/filename.html
15:45:20/dir/filename.html
15:45:31/dir/filename.html
15:45:42/dir/filename.html
15:45:53/dir/filename.html
15:46:03/dir/filename.html
15:46:14/dir/filename.html
15:46:25/dir/filename.html
15:46:35/dir/filename.html
15:46:46/dir/filename.html
15:46:57/dir/filename.html

3.) Atypical, 'unofficial' UAs from .search.msn.com domains akin to this morning's visit from:

msnbot-207-46-204-157.search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)

robots.txt? NO

Cloaked UAs include:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; WOW64; Trident/5.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)

robots.txt? NO

And last but not least, the ongoing oddity:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

 

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4372254 posted 7:36 pm on Oct 8, 2011 (gmt 0)

A while ago bingdude visited the bing forum hereabouts and I mentioned this. He promised to get it looked into. No recent activity from him so we can only assume bing has begun taking on the google policy of popping in once and then departing for ever. :(

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4372254 posted 7:46 pm on Oct 8, 2011 (gmt 0)

I was hoping this lot was going to be resolved several months ago.

I am very close to blocking the whole lot for good. When I have some spare time the trigger will be pulled.

Mokita

5+ Year Member



 
Msg#: 4372254 posted 6:58 am on Oct 9, 2011 (gmt 0)

I banned the ones ending in )._ ages ago.

And I am already gradually banning the ones with 'unofficial' UAs, starting with my high-volume sites.

Haven't decided about the "no rDNS" IPs yet.
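
For anyone weighing the "no rDNS" question, one common approach is a forward-confirmed reverse DNS check: look up the PTR record, require the `.search.msn.com` suffix seen in the hostnames reported in this thread, then resolve that hostname forward and confirm it maps back to the same IP. The following is only a sketch under those assumptions; the function name and the injectable `reverse_lookup` / `forward_lookup` parameters are hypothetical, added so the logic can be tried without touching the network.

```python
import socket


def verify_msn_rdns(ip, reverse_lookup=None, forward_lookup=None,
                    suffix=".search.msn.com"):
    """Return the verified hostname for ip, or None if any step fails."""
    # Default to real DNS lookups; the parameters are an illustrative hook
    # so the check can be exercised with canned answers.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return None  # bare IP: no PTR record at all

    # The suffix is an assumption taken from hostnames quoted in the thread.
    if not host.lower().endswith(suffix):
        return None

    try:
        addresses = forward_lookup(host)
    except OSError:
        return None

    # Forward confirmation: the hostname must resolve back to the same IP.
    return host if ip in addresses else None
```

A bare IP fails at the first step, which is exactly the case still undecided above; whether to block on that result is a policy choice the sketch does not make.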

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 3:51 pm on Oct 17, 2011 (gmt 0)

During last night's wee hours, a cloaked IP and basically a scraper UA:

207.46.92.16
Wget/1.10.2

robots.txt? NO

Absolutely not okay.

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4372254 posted 8:58 pm on Oct 18, 2011 (gmt 0)

Further to my posting 8th Oct above:

On that day I stickied bingdude asking him to reappear. Seems like my genie-invocation spell failed: no reply from him nor has he been seen in the Bing forum for a long time. Rapped knuckles for being helpful?

Gone the way of all the google visitors. Sad, I was beginning to have hope! :(

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 11:05 pm on Oct 18, 2011 (gmt 0)

Thank you for trying. But you know, in the major-SE scheme of things, I reckon we're but fleas on the rumps of elephants: insignificant, annoying, and dependent on the ride.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 10:45 am on Oct 23, 2011 (gmt 0)

157.55.196.249
Firefox 7.0

robots.txt? NO

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 6:45 pm on Oct 24, 2011 (gmt 0)

Ten minutes ago, again w/ Wget from a kin IP of the last Wget (207.46.92.16):

207.46.92.17
Wget/1.10.2

robots.txt? NO

Anyone else seeing any repeatedly/intentionally rogue UAs?

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4372254 posted 9:07 pm on Oct 24, 2011 (gmt 0)

Yes, I see variations of those all the time and have for years. I dismissed them long ago and they're filtered from what I actually spend time following up on; maybe naively, but nonetheless.

The flags I used to research would take 2-3 hours every morning. I now filter the usual suspects (defined or not) and only spend time on the actual threats. Got it down to about an hour now :)

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 7:30 pm on Nov 7, 2011 (gmt 0)

In a twitter swarm:

65.52.21.72
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

robots.txt? NO

grandma genie



 
Msg#: 4372254 posted 4:30 am on Nov 9, 2011 (gmt 0)

I'm getting the stealth visits from Microsoft too:

207.46.204.162
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

robots.txt? NO

and

157.55.112.207
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)

robots.txt? NO

Not acting anything like a bot.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 4:26 am on Nov 13, 2011 (gmt 0)

This just in. Note the (in)famous UA. Am amazed they're still using it:

msnbot-157-55-39-84.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)._

robots.txt? Yes

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4372254 posted 8:28 pm on Nov 13, 2011 (gmt 0)

I still have a block on that UA. I thought they would have fixed it along with the DNS update but hadn't yet checked it. Ah, well.
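
A block on that UA can key on the stray `._` after the closing parenthesis, since that suffix is what sets it apart in the reports above. This is a minimal sketch, not anyone's actual rule; the function name is made up for illustration, and it only recognises the one string quoted in this thread.

```python
def is_odd_msnbot_ua(user_agent):
    """True for msnbot UAs that end with the stray ')._' suffix."""
    ua = user_agent.strip()
    # Both conditions are assumptions drawn from the single UA quoted above.
    return ua.startswith("msnbot/") and ua.endswith(")._")


print(is_odd_msnbot_ua("msnbot/2.0b (+http://search.msn.com/msnbot.htm)._"))
# prints: True
```

The same UA without the trailing `._` returns False, so a rule like this leaves a cleanly formed msnbot string alone.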

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4372254 posted 10:26 pm on Nov 14, 2011 (gmt 0)

They're hammering my server right now using the following UA:
"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko)"

More details here: [webmasterworld.com...]

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 2:18 pm on Nov 15, 2011 (gmt 0)

Exact same IP and UA as reported on 11-07 above, but not post-tweet this time. I give.

65.52.21.72
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)

robots.txt? NO

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month



 
Msg#: 4372254 posted 7:00 pm on Nov 15, 2011 (gmt 0)

Has anyone noticed the irony of AppleWebKit coming from MS's search.msn.com IPs?

Just thought I'd point it out in case someone wasn't paying attention ;)

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4372254 posted 8:03 pm on Nov 15, 2011 (gmt 0)


"Has anyone noticed the irony of AppleWebKit coming from MS's search.msn.com IPs?"

It's joined the ranks of Mozilla. AppleWebKit is even used in the UA string of Android (their arch-rival). Guess it comes down to this: whoever was there first gets to name the mountain.

Staffa

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4372254 posted 7:38 pm on Nov 16, 2011 (gmt 0)

Sneaky little thing, MSN. From a visit today:

1n:24:33 /robots.txt - 157.55.17.192 - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
1n:25:05 /dir1/page1.asp - 157.55.17.192 - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
1n:25:07 /dir1/page1.asp - 207.46.204.164 - Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534~~(KHTML, like Gecko)

It comes as bingbot and gets the page; two seconds later it comes as a "regular UA" and gets nothing.

Instead of using their resources on frivolities they should crawl as a proper bot and get people's web sites indexed.

(ps : ~~ = double space)
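
The pattern in that log (a bingbot fetch, then the same path requested seconds later from a different IP with a browser-style UA) can be picked out of parsed log entries. What follows is a rough sketch under stated assumptions: hits are already parsed into `(seconds, path, ip, user_agent)` tuples in time order, "bingbot" in the UA marks the declared crawler, and the five-second window is arbitrary. The function name is hypothetical.

```python
def find_shadow_fetches(hits, window=5):
    """Pair each bingbot fetch with a follow-up from another IP and UA.

    hits: list of (seconds, path, ip, user_agent) tuples sorted by time.
    Returns (path, bot_ip, follower_ip, gap_seconds) tuples.
    """
    pairs = []
    last_bot_hit = {}  # path -> (seconds, ip) of the latest bingbot request
    for seconds, path, ip, ua in hits:
        if "bingbot" in ua.lower():
            last_bot_hit[path] = (seconds, ip)
            continue
        previous = last_bot_hit.get(path)
        if previous is not None:
            bot_seconds, bot_ip = previous
            gap = seconds - bot_seconds
            # Same path, different IP, inside the (assumed) time window.
            if ip != bot_ip and 0 <= gap <= window:
                pairs.append((path, bot_ip, ip, gap))
    return pairs
```

With the three hits above converted to seconds, it returns a single pair for /dir1/page1.asp with a two-second gap. It says nothing about whether the follower IP belongs to the same operator; that still needs a whois or rDNS check.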

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4372254 posted 8:54 pm on Nov 16, 2011 (gmt 0)

I suspect the webkit UAs are doing something like google's web preview OR checking for viruses OR... Who knows? It's probably bot-ish but not pure bot.

Pity bingdude won't visit here. He's back in the Bing forum at present but for how long, who knows?

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 2:24 am on Dec 31, 2011 (gmt 0)

Beats heck outta me what MSN was doing today with this laughable UA:

Mozilla/4.0 (compatible

msnbot-157-55-17-117.search.msn.com [projecthoneypot.org...]

1n:28:33 /dir/filename.html [302'd to...]
1n:28:34 /botbait/ [403]

robots.txt? NO

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4372254 posted 3:13 am on Dec 31, 2011 (gmt 0)

@ Pfui

Yup, I reported that a couple days ago:

[webmasterworld.com...]

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 2:47 pm on Jan 9, 2012 (gmt 0)

'No UA, no robots.txt, no referrer, 11-hits-to-same-file' visits from Microsoft's Dynamic Hosting IPs now:

NetName: MICROSOFT-DYNAMIC-HOSTING
NetRange: 70.37.0.0 - 70.37.191.255
CIDR: 70.37.0.0/17, 70.37.128.0/18
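
For anyone who wants to turn a whois NetRange into CIDR blocks for a firewall or deny list, Python's standard `ipaddress` module can do the conversion. A small sketch; the helper name `range_to_cidrs` is just for illustration.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive IPv4 range into the minimal list of CIDR blocks."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


print(range_to_cidrs("70.37.0.0", "70.37.191.255"))
# prints: ['70.37.0.0/17', '70.37.128.0/18']
```

The result matches the CIDR line in the whois record above.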

During the first week of Jan., two days apart:

70.37.161.240
-
04:19:33 /dir/filename20.html
04:19:45 /dir/filename20.html
04:19:56 /dir/filename20.html
04:20:07 /dir/filename20.html
04:20:19 /dir/filename20.html
04:20:30 /dir/filename20.html
04:20:42 /dir/filename20.html
04:20:53 /dir/filename20.html
04:21:04 /dir/filename20.html
04:21:16 /dir/filename20.html
04:21:27 /dir/filename20.html


70.37.162.57
-
21:35:43 /dir/filename41.html
21:35:54 /dir/filename41.html
21:36:06 /dir/filename41.html
21:36:17 /dir/filename41.html
21:36:29 /dir/filename41.html
21:36:40 /dir/filename41.html
21:36:51 /dir/filename41.html
21:37:03 /dir/filename41.html
21:37:14 /dir/filename41.html
21:37:25 /dir/filename41.html
21:37:37 /dir/filename41.html
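
The '11 hits to the same file with no UA' signature lends itself to a simple log check. The sketch below assumes hits are already parsed into `(ip, user_agent, 'HH:MM:SS', path)` tuples from a single day's log, treats an empty string or `-` as "no UA", and uses an arbitrary 15-second maximum gap; the names and thresholds are illustrative, not taken from any tool mentioned here.

```python
from collections import defaultdict


def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def find_repeat_fetches(hits, min_hits=11, max_gap=15):
    """Flag (ip, path) pairs fetched repeatedly with no UA at short intervals."""
    times = defaultdict(list)
    for ip, user_agent, hms, path in hits:
        if user_agent in ("", "-"):  # only blank-UA requests are considered
            times[(ip, path)].append(to_seconds(hms))

    flagged = []
    for (ip, path), stamps in times.items():
        stamps.sort()
        run = longest = 1
        for earlier, later in zip(stamps, stamps[1:]):
            # Extend the run while hits stay close together, else restart.
            run = run + 1 if later - earlier <= max_gap else 1
            longest = max(longest, run)
        if longest >= min_hits:
            flagged.append((ip, path, longest))
    return flagged
```

Fed the eleven 70.37.161.240 timestamps above, it returns one entry with a count of 11. It ignores runs that cross midnight, which would need real dates rather than bare clock times.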

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4372254 posted 8:01 pm on Jan 9, 2012 (gmt 0)

The MS equivalent of AWS? I've had the range 70.37.0.0 - 70.37.191.255 blocked for two years now.

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 12:22 am on Jan 10, 2012 (gmt 0)

If only blocks stopped them from coming...

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4372254 posted 10:45 pm on Jan 11, 2012 (gmt 0)

Oh, come on:

65.55.67.169
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; MSN 9.0;MSN 9.1;MSN 9.6;MSN 10.0;MSN 10.2; MSNbMSNI; MSNmen-us; MSNcIA)

robots.txt? NO
Referrer? YES

The referrer was legit. But the hit from a bare-IP MSN IP? Beats heck outta me. Employee, maybe -- 403'd because MSN plays fast and loose with its hordes, and hoards, of cloaked bots.

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4372254 posted 8:01 pm on Jan 24, 2012 (gmt 0)

Came across a new (to me) MS IP range today...

204.231.192.0 - 204.231.223.255

One of the IPs was used as a proxy for an unspecified forwarding IP so the range could include proxies or it could be a "broadband" range (with local proxy/firewall). I could get no DNS information about the range, only the whois data.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors Of The Month



 
Msg#: 4372254 posted 4:48 am on Feb 20, 2012 (gmt 0)

Continuing the theme of wtf-ness:

I'd assumed that in the course of January [webmasterworld.com] I got to know all the major robot players. Today during a routine check of Bing/MS IPs, which normally results in dead silence*, I ran smack into a pile of msnbots.

Nothing new about msnbot/2.0b-- and that's just the point. Its owners [onlinehelp.microsoft.com] say it's been put out to pasture, replaced by the bingbot.** The specialized msnbot-media is still on the job, but I had to go all the way back to May of 2011 for the last vanilla msnbot. What's up? Does the msnbot know something about the Social Security system that it's not telling? Was the MSN retirement package not all that it expected?

In the middle of the msnbots-- did it think it could hide?-- was a whole slew of msnbot-NewsBlogs (their plural). They too have been around for years; they're mentioned in assorted WebmasterWorld threads. I have never met one before. (Never = since April 2011 when I started saving raw logs.)

They made a total of 16 successful requests. Half were for robots.txt, always taken in pairs. The other half were for...

Let me backtrack here. For a long time I had one unusually fat file that was inordinately popular with the wrong kind of robots. It also got the occasional search-engine hit, most of them from humans who were clearly looking for something else. Wasted time and bandwidth on all sides. A couple weeks back I cut off the first 5% of the file and saved it under the name of the original fat version. The old one got tucked away behind a new name, a nofollow link and a noindex meta tag. If humans want to read the whole thing they're welcome. Robots can jolly well go on a diet.

The newly arrived blogbot read this slimmed-down file eight separate times.

The newly pulled-from-retirement msnbot puttered around here and there-- including a single serving of robots.txt-- presumably hoping I wouldn't notice when it, too, read the slim file twice... followed by the fat file.

Well, hey. It's not google. It doesn't have to pay attention to the "nofollow" directive. And the file's already indexed, so it's not like it's seeing anything it hasn't seen dozens of times before.


* Figure of speech. It's really the computer's "Bzzt!" sound meaning "Nope, nothing here." The bingbot and the msnbot-media have already been filtered out; the plainclothes bot is blocked.
** They also say, quote, "Bing does not share IP addresses for our crawlers." I'll trade you a 65\.5[2-5]\. for a 157\.(5[4-9]|60). Anyone got a spare 207\.46\.?
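
The three patterns traded in that footnote can be combined into one check for filtering log lines. A sketch only: the patterns are the ones quoted above, anchored to the start of the address, with one small assumption added (a `\.` after the 157 group so the second octet is matched whole). The names `MS_PREFIXES` and `looks_like_ms_range` are made up for illustration.

```python
import re

# Patterns from the footnote above; the trailing \. after the 157 group
# is an added assumption, not part of the original pattern.
MS_PREFIXES = re.compile(r"^(?:65\.5[2-5]\.|157\.(?:5[4-9]|60)\.|207\.46\.)")


def looks_like_ms_range(ip):
    """True if the dotted-quad string starts with one of the listed prefixes."""
    return MS_PREFIXES.match(ip) is not None


for ip in ("65.52.33.73", "157.55.196.249", "207.46.92.16", "70.37.161.240"):
    print(ip, looks_like_ms_range(ip))
# prints:
# 65.52.33.73 True
# 157.55.196.249 True
# 207.46.92.16 True
# 70.37.161.240 False
```

The last address, from the MICROSOFT-DYNAMIC-HOSTING range reported earlier in the thread, falls outside these prefixes, which is a reminder that the list is partial.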
