They're doing it again: I see hundreds of fake visitors from MSN IPs across all of my domains.
Is there any news on what they are trying to accomplish by doing this?
6/24/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
65.55.104.64
msnbot-65-55-104-64.search.msn.com
[search.live.com...]
6/27/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
65.55.104.63
msnbot-65-55-104-63.search.msn.com
[search.live.com...]
dstiles: no, these are ok:
hxxp://by999w.bay999.mail.live.com/mail/InboxLight.aspx?FolderID=00000000-0000-0000-0000-000000000999&InboxSortAscending=False&InboxSortBy=Date&n=999
caribguy: thanks. I need to adapt the regex, though, as I'm using IIS, which is poor on clever stuff. Hence my question: can I block all live.com referers regardless?
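For anyone who filters in a script rather than in the server itself, here is a minimal Python sketch of the "block all live.com referers" idea. It is not an IIS rule, and the function names and the search_only switch are my own invention for illustration. Since the mail.live.com referers are reported above as OK, the switch narrows the match to search.live.com only.

```python
from urllib.parse import urlsplit


def referer_host(referer):
    """Return the lower-cased host part of a Referer value, or '' if absent."""
    if not referer:
        return ""
    try:
        return (urlsplit(referer.strip()).hostname or "").lower()
    except ValueError:
        # Malformed value (e.g. broken brackets in the host part).
        return ""


def is_live_referer(referer, search_only=False):
    """True if the referer points at live.com (hypothetical helper).

    search_only=True matches only search.live.com, which leaves
    webmail referers such as *.mail.live.com alone.
    """
    host = referer_host(referer)
    if search_only:
        return host == "search.live.com"
    return host == "live.com" or host.endswith(".live.com")
```

Matching on the parsed host rather than on the raw string avoids false hits when "live.com" merely appears somewhere in the query string of an unrelated referer.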
Haven't seen any posting from the msn guy. Maybe he fell foul of the MS anti-publicity brigade?
UserAgents=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
I keep promising myself I'll dump the whole parcel of fake referers. It's just finding the time to test the block.
Have you checked your main site(s) on bing.com? I did earlier today and it's as if we've been dropped. All that appears is the bare URL, plus a few links blocked by robots.txt ('natch).
That's it. No title, no info, no doodad to explore the site, no nothing. One. Bare. URL. And that's after 2,372 hits by msnbot last month alone (not including msnbot's variations and MS's other bots), and for a long-authenticated, long-standing site with 15,000-plus incoming links.
And the msnbot hit count is also only a fraction of MS search-admin hits because the majority of the fake ref hits are done by cloaked UAs:
msnbot-65-55-104-67.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
Fake Ref: [search.live.com...]
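A rough way to count these cloaked hits separately from declared msnbot hits, sketched in Python. The assumptions are mine: the log already carries a resolved rDNS host per hit, the PTR pattern is inferred only from the hostnames quoted in this thread, and the function name is hypothetical.

```python
import re

# PTR pattern inferred from hostnames like msnbot-65-55-104-67.search.msn.com
MSN_PTR = re.compile(
    r"^msnbot-\d{1,3}(?:-\d{1,3}){3}\.search\.msn\.com\.?$", re.IGNORECASE
)


def is_cloaked_msn_hit(rdns_host, user_agent):
    """True if the host looks like an msnbot PTR but the UA does not say msnbot."""
    if not rdns_host or not MSN_PTR.match(rdns_host.strip()):
        return False
    return "msnbot" not in (user_agent or "").lower()
```

Run over a month of logs, the True results would be the "MSIE" hits that really came from msnbot-named hosts, which is the share missing from a count keyed on the UA alone.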
In comparison, Googlebot hit the same site 1,704 times, and about 90% of the sitemap URLs appear for multiple keywords, and include titles, descriptions, etc.
Anyone else discover that when it comes to your site(s), Bing's nothing but the pits?
My main money site is in the #1 spot for all its keywords. Full description, pop-up thingy with more info and related links. The royal treatment!
MSN and its related-bots have been blocked from crawling that site for a week now.
ADDED: Did you notice msndude was on the site today? He posted in another thread [webmasterworld.com], but ignored this one. This one was far more important to us than the one he posted in.
[edited by: GaryK at 2:20 am (utc) on July 7, 2009]
Nearly 80% of my traffic comes from Google. Most of the rest comes from links on other sites. Bing, Yahoo! and a few others account for a few percent each.
And yet many days Bing would use more bandwidth than all the other crawlers combined. This despite Google being on the site almost constantly.
I'm inclined to think it's far too egotistical for me to consider I make any kind of difference at all to MSN/Live/Bing.
And yet, after announcing here I had blocked msnbot, every single one of my almost 400 sites got hit by the same kind of traffic as above. Every single day for the entire week. Sometimes multiple times per day.
Same odd-ball UAs. Same fake referrers. The major difference was not one of the IP Addresses had a rDNS. WhoIs revealed every one of them belonged to Microsoft.
I'm not even going to attempt posting them all. Here's a representative sample.
How the heck do I get rid of msnbot now?
Is there any correlation between what's happening in this thread and the new thread Pfui started [webmasterworld.com]?
7/5/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; MS-RTC LM 8)
65.55.109.88
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
[search.live.com...]
---------------------------------------
7/7/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
65.55.109.186
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
[search.live.com...]
---------------------------------------
7/9/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30707; InfoPath.2)
65.55.109.182
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/10/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.110.82
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4325)
65.55.109.208
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.30729; .NET CLR 3.5.30729; MS-RTC LM 8)
65.55.109.242
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727)
65.55.107.202
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.110.66
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
---------------------------------------
7/11/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.40607)
65.55.110.110
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[search.live.com...]
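Since none of these hits carry a PTR, matching on the address itself is one option. A minimal Python sketch follows, with the loud caveat that the /24 blocks listed are only the ones that show up in this thread's samples. They are not Microsoft's full allocation, and the names are hypothetical.

```python
import ipaddress

# Illustrative only: /24 blocks taken from the samples posted in this thread.
OBSERVED_NETS = [
    ipaddress.ip_network(net)
    for net in (
        "65.55.104.0/24",
        "65.55.107.0/24",
        "65.55.109.0/24",
        "65.55.110.0/24",
    )
]


def in_observed_ranges(ip):
    """True if ip falls inside one of the blocks seen in the samples."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IP literal at all.
        return False
    return any(addr in net for net in OBSERVED_NETS)
```

A real deny list would want the ranges confirmed against WhoIs first, because anything keyed on a handful of samples will miss the next block they move to.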
And yet, after announcing here I had blocked msnbot, every single one of my almost 400 sites got hit by the same kind of traffic as above. Every single day for the entire week. Sometimes multiple times per day.
Gary,
Since this thread began (nearly four months ago) I've seen these partial searches arrive at my sites in batches. For weeks there are none and then suddenly they appear.
No rhyme or reason.
Bad News: After blocking the referrals for about a week, this morning (July 15) I discovered the same type of referrals coming from www.bing.com/search all in the IP range of 65.55.104.nnn. Examples:
7/15/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.104.66
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[bing.com...]
-----------------------------------------------------
7/15/2009
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4325; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648)
65.55.104.68
No PTR
-----
OrgName: Microsoft Corp
OrgID: MSFT
Address: One Microsoft Way
City: Redmond
StateProv: WA
-----
[bing.com...]
Is anyone else seeing this?
Thomas
What are they trying to achieve with this? Is it a case of any publicity will do, no matter how bad? Has someone forgotten to inform the techies that ref. spam is strictly for desperate slimeballs?
36 of the 39 new user agents I found are all made-up IE UAs. And all of them contain the same kind of fake search queries Thomas cited above. Except, and this is a big except:
Every single one is coming from a Qwest data center. And for whatever reason, "Qwest" is contained within the UA string.
---
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; Qwest 1.0; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; Qwest 1.0; MSN 9.0;MSN 9.1;MSN 9.6; MSNbQ002; MSN
97.117.86.110
97-117-86-110.slkc.qwest.net
[bing.com...]
---
I'm not sure what to make of this. An attempt to get around blocking, perhaps? 'Cause it worked really well!
ADDED: I forgot to mention. The lack of a closing parenthesis in the UA is not my editing mistake. That's how it was in the log files.
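A truncated UA like that one is easy to flag mechanically. Here is a small Python sketch that checks whether the parentheses in a UA string pair up. The function name is hypothetical, and an unbalanced result only says the string is malformed, not who sent it.

```python
def parens_balanced(ua):
    """True if every '(' in the UA has a matching ')' in the right order."""
    depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before any matching '('.
                return False
    return depth == 0
```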
I suspect the .slkc. is just a relay station (terminology?) similar to what some of the other providers (e.g., RR and some Verizon) use that do NOT provide designated sub-net assignments.
Qwest traffic for my sites has always been a real PITA. At one time, I had every one of their ranges that I could locate denied.
Don
Donning my tinfoil hat, I'd say it might be a long-term plot to overshoot the default LimitRequestFieldSize and LimitRequestLine settings on Apache servers... I figure another seven years, and the MSIE/.NET CLR UA string will be too long for the default Apache input buffer size (8190 bytes) due to all of the accumulated .NET CLR and "MSN Optimized" updates. ;)
The Qwest substring may be another customized MSIE version, provided by Qwest to its subscribers (Qwest is the former Mountain States Telephone Company, plus Northwestern and Pacific Northwest Bell). From the UA token order, it's likely that "Qwest IE" was installed from a Qwest subscriber setup CD, and then "MSN Optimized" was later installed on top of that (after several additional .NET CLR updates).
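To put a number on how far off that day is, here is a minimal Python sketch that measures a header against the 8190-byte default. I am assuming the limit applies to the whole "User-Agent: value" line, name included. The constant and function names are made up for illustration and are not Apache identifiers.

```python
APACHE_DEFAULT_FIELD_LIMIT = 8190  # default buffer size quoted above, in bytes


def header_line_size(name, value):
    """Size in bytes of 'Name: value' as it would appear on the wire."""
    return len(f"{name}: {value}".encode("latin-1", "replace"))


def exceeds_default_limit(user_agent, limit=APACHE_DEFAULT_FIELD_LIMIT):
    """True if the User-Agent header line is longer than the limit."""
    return header_line_size("User-Agent", user_agent) > limit
```

Even a UA of several hundred characters is still more than an order of magnitude short of the limit, so the tinfoil hat can stay on the shelf for now.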
Jim
We do not anticipate any problems related to our increasing emphasis on MSNBot 2, but the unexpected can’t always be avoided, no matter how hard you try! As such, we wanted to preemptively alert folks to the most effective way to report bot and crawling issues to Bing’s support team in case they arise.
Um, I think we've known about this issue and others for several months now.
FWIW, I keep a little stash of the longest UAs I see. Here's the hands-down winner, weighing in at a whopping 418 characters and spaces:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SIMBAR={4FF12884-DA94-11DD-9A55-0030052ADD26}; GTB5; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; Embedded Web Browser from: [bsalsa.com...] (R1 1.6); InfoPath.2; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.0.04506.30; OfficeLiveConnector.1.3; OfficeLivePatch.1.3; MSN OptimizedIE8;DEDE)
That came courtesy of a .de asking for -- wait for it -- /(null [webmasterworld.com])
Signed, Cranky Webmaster
Pfui, my top five longest UAs are: 896, 645, 587, 512, 480. Here's the top three:
896 characters:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB5; .NET CLR 1.1.4322; .NET CLR 2.0.50727; AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCC; BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD;
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CCCCCCCCCCCCCCCCCCCCCC; AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB;
ASDFSDFFFFFFFFFFFFFFF)
645 characters:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB5; SearchSystem8804412899; SearchSystem8926915604; SearchSystem5551447122; SearchSystem9881314576; SearchSystem3047846404; SearchSystem8500999806; SearchSystem2898910852; SearchSystem7960553825; SearchSystem9320878896; InfoPath.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.04506.648; SearchSystem8804412899; SearchSystem8926915604; SearchSystem5551447122; SearchSystem9881314576; SearchSystem3047846404; SearchSystem8500999806; SearchSystem2898910852; SearchSystem7960553825; SearchSystem9320878896)
587 characters:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0(Compatible Mozilla/4.0(Compatible-EmbeddedWB 14.59 [bsalsa.com...] EmbeddedWB- 14.59 from: [bsalsa.com...] ; Mozilla/4.0(Compatible Mozilla/4.0EmbeddedWB- 14.59 from: [bsalsa.com...] ; .NET CLR 1.1.4322; .NET CLR 2.0.50727; aff-kingsoft-ciba; staticlogin:product=cbpro09&act=login&info=ZmlsZW5hbWU9UG93ZXJ3b3JkMjAwOVByby4yNTI2OS4
0MDEwLmV4ZSZtYWM9QkI0OTc0RDJCNTFGNDUzNjkzQUI4MTE
xNkQ0MENDQzgmcGFzc3BvcnQ9JnZlcnNpb249MjAwOS4wNS4y
NS4yLjI4MyZjcmFzaHR5cGU9MQ==&verify=a3510b6c7b3fbc63bdee621ecfce0c5d; MAXTHON 2.0)
All three crashed my analysis program as they overflowed the max column width for user_agent! It's now set at varchar(1000)!
:)
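One way to keep the next monster UA from crashing the import is to clamp it before it reaches the column. A minimal Python sketch, assuming a varchar(1000) user_agent column as described above. The function name is hypothetical, and the flag is there so truncated rows can still be found later.

```python
def fit_user_agent(ua, max_len=1000):
    """Return (value_to_store, was_truncated) for a varchar(max_len) column."""
    ua = ua or ""
    if len(ua) <= max_len:
        return ua, False
    # Keep the leading part; the browser tokens at the front are the useful bit.
    return ua[:max_len], True
```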
As for MS's dozens of unidentified no-rDNS user-agents, I don't want them either. I may "pay for this" later, but at this point I really don't care anymore -- they're simply too "high-maintenance" for anyone who runs a whitelist+rDNS-based access control system, and their 'bots have historically had too many problems for me to have much enthusiasm for them anymore. MS's "tide" agents don't even spoof their own MSIE browsers properly; they either add or omit spaces, or send the wrong HTTP-Accept headers, or put info for one header (or even another header's name!) into another header. These various "development" robots show a stunning lack of quality checking prior to being unleashed on the world. It is quite evident that either they have no internal 'standards' for Web agents or that if they do, there is very little to no standards-compliance testing.
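For readers who have not built one, the rDNS half of a whitelist+rDNS check usually means a reverse lookup followed by a confirming forward lookup. What follows is only a sketch in Python of that general technique, not my actual setup. The function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be exercised without touching DNS.

```python
import socket


def _reverse(ip):
    """PTR lookup; returns the host name or None when there is no rDNS."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward(host):
    """A-record lookup; returns a list of IPv4 addresses (possibly empty)."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def verify_crawler(ip, allowed_suffixes, reverse=_reverse, forward=_forward):
    """True only if ip has a PTR under an allowed suffix that resolves back to ip."""
    host = reverse(ip)
    if not host:
        return False  # no rDNS at all, as with the "No PTR" hits
    host = host.rstrip(".").lower()
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    return ip in forward(host)
```

Called as verify_crawler(ip, [".search.msn.com"]), a no-PTR address fails at the first step, which is exactly why these agents are so much work to accommodate.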
Too bad for them, 'cause MSN/Live/Bing (and its users) will miss the good content that necessitated all my anti-scraper protection in the first place. Bing, bung, plonk... Have a 403.
On the UA-string length, I got one yesterday from an outfit that doesn't deserve any publicity that weighed in at 774 characters. That may not be the longest one I've ever received, but it was still a pretty nice catch -- I'm considering taking it to the taxidermist to have it mounted so I can hang it over my fireplace.
Jim