| 4:56 pm on Jun 29, 2010 (gmt 0)|
I also wanted to add that over the past month, less than 3% of search engine referrals to the site I tested came from Bing. I could live without those referrals. 92.5% came from Google alone.
| 4:10 pm on Jul 10, 2010 (gmt 0)|
Here are the results from blocking Microsoft.
The week before blocking, the average script processing time was 1.954 seconds.
The week after blocking, the number went down to 0.753 seconds.
A 61% decline in average script processing time!
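For anyone who wants to check the arithmetic, the 61% figure follows from the two weekly averages quoted above. A minimal Python sketch (the function name is only illustrative):

```python
def percent_decline(before: float, after: float) -> float:
    """Return the percentage drop going from `before` to `after`."""
    return (before - after) / before * 100.0


# Weekly average script processing times quoted in the post, in seconds.
decline = percent_decline(1.954, 0.753)
print(f"{decline:.1f}% decline")  # a little over 61%
```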
Can Microsoft explain what they are doing to our sites?
| 5:12 pm on Jul 13, 2010 (gmt 0)|
Here's the complete blocking code I used in my .htaccess file:
deny from 207.46.
deny from 188.8.131.52/14
| 7:51 pm on Jul 31, 2010 (gmt 0)|
I run a script on my weblogs. It helps me keep an eye on bots etc. Looking through the records, I first spotted this stealth bot visiting in Jan. During July, it took 20 times the bandwidth it took back then. It certainly eats enough copies of the robots.txt file to be a bot. As you said though, it always shows the UA of a browser rather than a bot.
It's lower profile, but there seems to be another block of theirs with similar activity on it.
| 2:09 am on Aug 2, 2010 (gmt 0)|
Thanks for following up!
Can you tell me the other range Microsoft is using?
| 2:19 am on Aug 2, 2010 (gmt 0)|
- Microsoft employees have free access to MSNBot IPs
- Microsoft have been hacked. Or something.
|it always shows the UA of a browser rather than a bot |
A `MSN-Bot' tripped the fast-scraper block [webmasterworld.com] on my forums (I'm the maintainer of that script). In order to do that, it took more than 14 pages in the space of 7 seconds - definitely a bot. Here are the stats on the (single) record:
Host lookup: msnbot-65-52-108-165.search.msn.com (checks out)
Timing: 2010-08-02 02:09:33 (2 pages)
UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)
...which is NOT a bot UA.
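One common way to make a host lookup "check out" is a forward-confirmed reverse DNS test: resolve the IP to a hostname, confirm the hostname sits under the crawler's domain, then resolve that hostname back and confirm it returns the same IP. The Python sketch below assumes that is the kind of check meant here. The function name, the injectable resolver parameters, and the use of `.search.msn.com` as the only accepted suffix are my assumptions, taken from the hostname in the record above.

```python
import socket


def verify_crawler_host(ip, allowed_suffixes=(".search.msn.com",),
                        reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    The two lookup callables can be swapped out (for example in tests);
    by default they use the system resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        host = reverse_lookup(ip)  # IP -> hostname
    except OSError:
        return False
    if not host.lower().endswith(tuple(allowed_suffixes)):
        return False
    try:
        return ip in forward_lookup(host)  # hostname -> IPs, must include ip
    except OSError:
        return False
```

Passing both checks only says the address belongs to the crawler's network. As the record shows, it says nothing about whether the user agent string is honest.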
| 12:14 am on Aug 4, 2010 (gmt 0)|
14 pages in 7 seconds! It hasn't been that greedy on my site yet, though it has done 5 in 20 seconds. The UA details almost always start with:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2
Occasionally it reports NT 5.1, and a couple of times MSIE 6.0.
Other details vary.
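The request rates mentioned above (more than 14 pages in 7 seconds versus 5 in 20 seconds) can be compared with a simple sliding-window check. This is only a sketch of the general idea in Python, not the actual fast-scraper script; the function name and the default thresholds are assumptions taken from the numbers in this thread.

```python
from collections import deque
from datetime import datetime, timedelta


def exceeds_rate(timestamps, max_pages=14, window_seconds=7):
    """Return True if more than `max_pages` requests fall inside any
    span of `window_seconds` seconds."""
    window = deque()
    limit = timedelta(seconds=window_seconds)
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop requests too old to share a window with the current one.
        while ts - window[0] > limit:
            window.popleft()
        if len(window) > max_pages:
            return True
    return False


# 5 pages spread over 20 seconds stays under the threshold.
start = datetime(2010, 8, 2, 2, 9, 33)
slow = [start + timedelta(seconds=5 * i) for i in range(5)]
print(exceeds_rate(slow))
```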
If these are claiming to be browsers rather than bots, they have weird browsing habits. The log analysis script for my site is connected to the navigation database that controls the arrangement of links across the top of pages and down the left etc. During July, this stealth bot (or whatever it is) took enough pages to grab every page on the site at least twice. Yet it never took two pages in succession that were linked together directly by those navigation links. I haven't looked at figures for previous months.
Another odd thing: despite cache settings in the file headers, this visitor seems to take the stylesheet and JavaScript with almost every page, even when the requests are just a few seconds apart. It never seems to touch the graphics. It never even triggers a 304 to indicate the graphics haven't changed. On the other hand, if it did start taking the graphics as well, this could consume serious bandwidth.
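A quick way to see this pattern in your own logs is to tally one visitor's requests by resource type. The Python sketch below assumes you have already pulled that visitor's requested paths out of the log; the extension-to-category table and the function name are illustrative choices, not part of any particular log tool.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit

# Illustrative mapping of file extensions to resource categories.
CATEGORIES = {
    ".css": "stylesheet",
    ".js": "script",
    ".gif": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".png": "image",
}


def tally_resources(paths):
    """Count requested paths by category; anything unlisted counts as a page."""
    counts = Counter()
    for path in paths:
        # Ignore any query string before looking at the extension.
        ext = posixpath.splitext(urlsplit(path).path)[1].lower()
        counts[CATEGORIES.get(ext, "page")] += 1
    return counts


sample = ["/a.html", "/style.css", "/nav.js", "/b.html?x=1", "/style.css", "/nav.js"]
print(tally_resources(sample))
```

A visitor whose stylesheet and script counts match its page count while its image count stays at zero fits the behaviour described above.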
In the site's robots.txt file, bots are banned from all subdirectories. So if this is a bot taking JavaScript etc., it's defying the robots.txt file.
Looking at the site as a whole, msnbot takes similar bandwidth to Googlebot. Adding in the figures for this visitor almost doubles that.
The other traffic I mentioned is on the block AlexK referred to; in my case, between 65.52.104. and 65.52.108.
In this block it uses much less bandwidth, but it is a mixture of msnbot requests and whatever lies behind these varying UAs. The requests are all but identical to what I described above. When claiming to be a visitor, it again takes the JavaScript and stylesheet with every single request, but never touches the graphics.
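To pick this traffic out of a log, you can test whether each client IP falls inside the range. The Python sketch below assumes the two prefixes above mean every address from 65.52.104.0 through 65.52.108.255; that reading, and the function name, are my assumptions rather than a published Microsoft range.

```python
import ipaddress


def in_prefix_range(ip, first_prefix="65.52.104.", last_prefix="65.52.108."):
    """Return True if `ip` lies between first_prefix + "0" and
    last_prefix + "255", inclusive (assumed reading of the prefixes)."""
    start = ipaddress.ip_address(first_prefix + "0")
    end = ipaddress.ip_address(last_prefix + "255")
    return start <= ipaddress.ip_address(ip) <= end


print(in_prefix_range("65.52.108.165"))  # address implied by AlexK's host lookup
```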
| 3:05 pm on Aug 15, 2010 (gmt 0)|
Bing/MSN bot has been fine on all of my sites.
| 3:24 pm on Aug 15, 2010 (gmt 0)|
ByronM, did you conduct a similar test to see what's going on under the hood of your site? I would imagine that low-traffic sites or sites that don't use heavy scripts wouldn't notice the problem. At any rate, it's a good idea to monitor what these bots are doing to your site, even if you don't notice an overall performance problem.
| 3:23 pm on Nov 18, 2010 (gmt 0)|
They are definitely misbehaving. Ignoring robots.txt is apparently just one of the bad things they're doing.
I added a new block this morning: 184.108.40.206/16
This MS address range has been reported as attempting to access phpmyadmin among other things.
| 8:43 am on Jan 19, 2011 (gmt 0)|
I've got an invisibly linked page that is a bot trap, disallowed in robots.txt. I cleaned up my .htaccess file today, and so far in one day msnbot has visited that disallowed page about 10 times.
So hey, if you're having problems getting some pages indexed by Bing, just disallow them in the robots.txt file and they'll spend the day hammering them. Sigh.
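If you want to count this sort of thing rather than eyeball it, Python's standard `urllib.robotparser` can tell you which requested paths a given bot was not supposed to fetch. This is a minimal sketch: the `/trap/` path, the robots.txt rules, and the function name are made up for illustration, and you would feed it the paths from your own log.

```python
from urllib.robotparser import RobotFileParser


def count_disallowed_hits(robots_lines, user_agent, requested_paths):
    """Count requests that robots.txt disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # rules supplied as a list of lines
    return sum(1 for path in requested_paths
               if not parser.can_fetch(user_agent, path))


# Hypothetical rules and requests, for illustration only.
rules = ["User-agent: *", "Disallow: /trap/"]
hits = ["/index.html", "/trap/page.html", "/trap/page.html"]
print(count_disallowed_hits(rules, "msnbot", hits))
```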
| 11:34 pm on Jan 20, 2011 (gmt 0)|
Yeah, did I say 10 times? Make that about 100 times now.