They're doing it again. I see hundreds of fake visitors from MSN IPs across all of my domains.
Is there any news on what they are trying to accomplish by doing this?
I'm considering taking it to the taxidermist to have it mounted so I can hang it over my fireplace.
whitelist+rDNS-based access control system
The problem I have with using rDNS in real-time is some bots, like the MS ones, simply hit my sites too quickly and too often. And as a result, repeated rDNS lookups would, I think, bog things down worse than just letting them have at it.
You don't have to do an rDNS lookup every time. You can do it once for an IP and store the result temporarily. To get around the problems with the msnbots, I've set up a whitelist for the IP range.
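A minimal sketch of the "look it up once and store it temporarily" idea, in Python. The class name RdnsCache, the one-hour lifetime, and the 65.52.0.0/14 whitelist range are assumptions for illustration only; the range simply mirrors the 65.5[2-5] addresses discussed in this thread, not any list the poster actually uses.

```python
import ipaddress
import socket
import time

# Hypothetical whitelist: 65.52.0.0/14 covers 65.52.* through 65.55.*.
WHITELIST = [ipaddress.ip_network("65.52.0.0/14")]


def is_whitelisted(ip):
    """Return True if the address falls inside any whitelisted range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)


class RdnsCache:
    """Remember reverse-DNS answers per IP for a limited time."""

    def __init__(self, ttl_seconds=3600, resolver=None, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.resolver = resolver or self._system_lookup
        self.clock = clock
        self._entries = {}  # ip -> (hostname or None, expiry time)

    @staticmethod
    def _system_lookup(ip):
        # A failed lookup is cached as None so it is not retried on every hit.
        try:
            return socket.gethostbyaddr(ip)[0]
        except OSError:
            return None

    def hostname(self, ip):
        now = self.clock()
        cached = self._entries.get(ip)
        if cached and cached[1] > now:
            return cached[0]  # still fresh: no DNS traffic
        name = self.resolver(ip)
        self._entries[ip] = (name, now + self.ttl)
        return name
```

With this arrangement a bot that hits the site hundreds of times from one address costs a single lookup per lifetime, and whitelisted ranges can skip the lookup entirely by checking is_whitelisted(ip) first.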
This UA attempted to add another Class C and two more Class D's.
Utter nonsense.
That robots.txt should be violated (even with later 403s) by direct requests for images.
65.55.106.220 - - [30/Jul/2009:15:36:30 +0100] "GET /robots.txt HTTP/1.1" 200 4858 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.105.22 - - [30/Jul/2009:16:25:29 +0100] "GET /robots.txt HTTP/1.1" 200 4858 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
65.55.106.135 - - [30/Jul/2009:16:25:49 +0100] "GET /robots.txt HTTP/1.1" 200 4858 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.144 - - [30/Jul/2009:16:47:49 +0100] "GET /robots.txt HTTP/1.1" 200 4858 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.231.99 - - [30/Jul/2009:16:59:04 +0100] "GET /robots.txt HTTP/1.1" 403 1159 "-" "Mozilla/4.0"
65.55.231.99 - - [30/Jul/2009:16:59:04 +0100] "GET /jib/1136.jpg HTTP/1.1" 403 - "-" "Mozilla/4.0"
65.55.106.139 - - [30/Jul/2009:17:22:13 +0100] "GET /robots.txt HTTP/1.1" 200 4858 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.231.121 - - [30/Jul/2009:17:22:46 +0100] "GET /robots.txt HTTP/1.1" 403 1159 "-" "Mozilla/4.0"
65.55.231.121 - - [30/Jul/2009:17:22:46 +0100] "GET /Dir/SubDir/image01.jpg HTTP/1.1" 403 - "-" "Mozilla/4.0"
65.55.106.159 - - [30/Jul/2009:17:43:01 +0100] "GET /robots.txt HTTP/1.1" 200 4858 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.231.44 - - [30/Jul/2009:18:16:12 +0100] "GET /robots.txt HTTP/1.1" 403 1159 "-" "Mozilla/4.0"
65.55.231.44 - - [30/Jul/2009:18:16:12 +0100] "GET /Dir/SubDir2/Image02.jpg HTTP/1.1" 403 - "-" "Mozilla/4.0"
Don't recall the saying?
Something about "the arms not knowing what the legs are doing" or "the brain not knowing what the head is doing"?
Can't wait till they take on Yahoo ;)
131.107. has been going on since 2003, although not nearly as bad as they once were.
2004:
207.46.98.60 - - [30/Aug/2004:07:28:25 -0700] "GET /MyFolder/MyPage.html HTTP/1.0" 200 10097 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
2005:
207.46.127.166 - - [30/May/2009:23:53:08 +0100] "GET /robots.txt HTTP/1.1" 200 4777 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; livebot-searchsense/0.1; +http://search.msn.com/msnbot.htm)"
If there's anything recent from the 207.46., I don't keep track of it.
BTW, I've been seeing variations on this bot since at least 2000. The one you posted first visited me in October 2007.
MSRBOT
MSRBOT (http://research.microsoft.com/research/sv/msrbot)
MSRBOT (http://research.microsoft.com/research/sv/msrbot/
MSRBOT (http://research.microsoft.com/research/sv/msrbot/)
MSRBOT/0.1
MSRBOT/0.1 (http://research.microsoft.com/research/sv/msrbot/)
Are you suggesting this was intentional?
When we began seeing three or more of these .NET updates, we wondered if the end was ever in sight.
Count seven!
.NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Precisely.
Count seven!
OK, but intentionally targeting you, or webmasters in general? Cause I don't see how they could be targeting anyone here specifically.
Gary,
Once upon a time this forum was quite active and un-moderated.
Participants could add information and many other participants (lurkers; good and bad) were able to take immediate action.
Somewhere, I've a link to a thread saved in which I offered an example of a bot and within minutes, the bot (lurking in this forum) mass crawled (or at least attempted) one of my sites, based on the criteria I had explained.
Thus it's not so far-fetched to assume that MS has the capability to "play Shamus" the same as you or I.
To assume otherwise is, IMO, simply naive. Whether anybody at MS actually has the time for such "shamus" antics is another issue.
(This is NOT meant to "dis Bill" as he's doing a wonderful job of quickly approving the moderated submissions).
This is akin to having Firefox publish its user-agent string like
"Mozilla/5.0 blah-blah-OS-blah-blah Phoenix/0.1 Phoenix/0.2, Phoenix/0.3, Firebird/0.4, Firefox/0.0.1, Firefox/0.0.2, Firefox/0.0.3," etc., etc., etc. -- At some point, you need to do a roll-up, enforce compatibility, and quit publishing an ever-lengthening UA string.
Jim
Targeting? No one's being targeted by anybody. This is just Microsoft updating the .NET runtime library and being very sloppy about the updates: they really should use a backwards-compatibility system that doesn't require every single update to appear in the UA, just in case a .NET server needs to check version compatibility by looking at the UA string.
Jim,
My apologies for the confusion.
Gary and I were discussing the likelihood of the 207.46. Research bot being targeted after I had mentioned the Class B previously.
I added the .NET update UAs as a footnote.
Don
ADDED: Don, thanks for your private note. I see what you're referring to. But I didn't see any reference to your site(s). So I stand by what I've said. :)
Eventually these overlooked submissions would have been edited by myself or the forum moderators (I recall making a request to Brett one time for an edit that was beyond the time frame allowed to a user).
It's not unreasonable that some overlooked submissions (containing either domain names or page names) simply slipped through the cracks and still exist today.
Gathering a participant's identity (domain or otherwise) is simply NOT as impossible as you're attempting to make it appear.
It's certainly possible for someone persistent enough to find your sites and at least one of mine based on publicly available information right here on WebmasterWorld.
Although I'm still not sure how likely it is.
I have to ask myself, why would anyone at MS want to send me, an insignificant webmaster, any kind of message.
Especially when we can't even get the likes of msndude to keep his promises to address the primary topic of this thread that's been ongoing since I think March 20, 2009.
[bing.com...]
In it the guy who posted here as ms_dude assures people "I am working with the crawler group to get this figured out. I'll let you know as soon as I get more information." He posted that less than a week ago, almost four months after saying the same thing here, and makes it seem there like it is some new problem just being brought to their attention.
I just can't figure out whether it is incompetence or some devious scheme to inflate the statistics on usage of Bing. In any case, standard practices would require that msnbot ip addresses not be used by web crawlers that do not identify themselves in the User Agent field as web crawlers, and that they not fake the referer field. It is just like Microsoft to mess with our web stats by breezily violating standard practices for their own convenience.
# block access from buggy MSN bot
RewriteCond %{REMOTE_ADDR} ^65\.5[2-5]\.
RewriteCond %{HTTP_REFERER} ^http://www\.bing\.com/search [NC]
RewriteRule .* - [F]
Notice that the .htaccess lines I posted only block access that looks like someone clicking on a Bing search result for my site when they are browsing from an ip address that is supposed to be the msnbot. It will not block real people who find my site in a Bing search. It will not block the msnbot when it crawls through identifying itself as the msnbot. It only blocks access that should never occur in the first place. If Microsoft chooses to blacklist sites who do this, well I would rather not have them distort my stats and increase my bandwidth costs for no good reason. I'll accept that my site will be found by the vast majority of web users who choose to use Google. If enough webmasters shared my attitude then Microsoft would have to start following accepted standard behavior or else fall even farther behind Google as Bing fails to index more and more sites that will not stand for this.
I just found yet another thread in the Bing community forum
[bing.com...]
where someone asked yet again about these one-word Bing searches from 65.55.* in their access logs and the same ms_dude Brett guy answered yet again that it must be a glitch in the new robot they are testing and please send him details so he can relay it to the crawler team. This same behavior has been going on since at least 2007. Gimme a break!
Whatever percentage of that traffic it is, those are faked referrals from Microsoft. If it is a significant part of the 42%, that is exactly what we are complaining about.
So your site would be a good example and I'm curious as to what your numbers are. Can you easily come up with the percentage for, say, the last month, if you separate out the 65.5?.* IP addresses from the Bing/Live referrals?
Here is the key question that would help in understanding this whole discussion... Of the 42% referrals for Bing/Live, how many are from 65.5[2-5].* ip addresses?
Neither MS nor any other SE provides such stats.
I kinda doubt MS themselves could even trace what bot IP range their end result data (SERPS) is a result of.
if you separate out the 65.5?.* IP addresses from the Bing/Live referrals?
Of the 42% referrals, 1.4% are from the 65.5* range. Small potatoes for this site, (about 4M hits/year) which is one reason why I'm not hot and bothered.
I CAN see that if hit/bandwidth was a mil a month or greater it might make a difference. After all, it is a numbers game. For me, my numbers more than comfortably fit in my host choice.
PS. Quick numbers reply is because I dump my logs for this site into Access each week and run custom reports.
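For anyone without a database import set up, the same breakdown (how many Bing referrals come from 65.5[2-5].* addresses) can be roughed out straight from a combined-format access log. This is only a sketch: the function name bing_referral_share is made up for illustration, it reuses the address and referrer patterns from the .htaccess lines posted earlier, and it counts only www.bing.com/search referrers, not the older Live ones.

```python
import re

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"'
)
MSN_RANGE = re.compile(r'^65\.5[2-5]\.')
BING_REFERER = re.compile(r'^http://www\.bing\.com/search', re.IGNORECASE)


def bing_referral_share(lines):
    """Return (hits from 65.5[2-5].*, all hits) among Bing-referred requests."""
    total = 0
    from_msn = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or not BING_REFERER.match(match.group("referer")):
            continue  # unparsable line or not a Bing search referral
        total += 1
        if MSN_RANGE.match(match.group("ip")):
            from_msn += 1
    return from_msn, total
```

Feeding it an open log file, for example bing_referral_share(open("access.log")), gives the two counts; dividing the first by the second gives the share of Bing referrals that originate from the msnbot address range.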
The commercial sites are managed with whitelisting to keep customers happy, and when badly behaved bots are found, they are NUKED dead dead dead. Keeps customers happy (saves bandwidth) and I don't feel bad charging monthly service fees. :)
edit: I see about the same percentage of 65.5* on those sites as well. Haven't put the brickbat to them yet. end edit.
Looking in detail in both Google Analytics and my hosting provider's Webalizer reports for my site, it appears that web crawler ip addresses are filtered out, so these hits don't show. That makes it even less important as an annoyance. It still bothers me that Microsoft will keep doing this and will continue to in effect lie about it in tech support forums.
http://www.bing.com/search?q=keyword
Note how the cloaked IP+fake UA sounds out the site, then the only official msnbot Host+UA requests a specific directory (x3!), then the 2 cloaked UAs request and fake-ref the exact same directory.
-----
65.55.217.43
Mozilla/4.0
robots.txt? YES
Fake ref? NO
Hits: 1
-----
msnbot-65-55-104-163.search.msn.com
msnbot/1.1 (+http://search.msn.com/msnbot.htm)
robots.txt? NO
Fake ref? NO
Dir req: /keyword
Hits: 3
-----
msnbot-65-55-104-70.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; InfoPath.2)
robots.txt? NO
Fake ref? YES
Dir req: /keyword
Hits: 1
-----
msnbot-65-55-104-60.search.msn.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SLCC1; .NET CLR 1.1.4325; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
robots.txt? NO
Fake ref? YES
Dir req: /keyword
Hits: 1
Spare me.