

Bing Search Engine News Forum

    
Hits from Microsoft are (apparently) stress testing my website/server
Stevietheman




msg:4161446
 4:39 pm on Jun 29, 2010 (gmt 0)

I have noticed lately that my server is being slowed down dramatically during certain parts of the day, so I looked into the possibility of a DoS attack or particularly heavy usage from some quarters.

What I learned was rather astonishing.

On one of my websites on the server, I track script processing time for each page load. Using this value, I could see which IPs were hitting the site most often when processing times were highest.

A great number of the page loads in this review were coming from 207.46.*.* -- Microsoft. And many of these page loads weren't coming from their search spider, but from what appeared to be ordinary browser configurations. This seems consistent with reports of Microsoft testing websites in various ways.

What I then did was compare the average script processing time for page loads coming from 207.46.*.* to page loads coming from anywhere else.
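A comparison like this can be sketched in a few lines of Python. This is only an illustration, assuming the timing log can be read as (IP, seconds) pairs; the function names and the sample rows are made up and are not the poster's actual script or data.

```python
from collections import defaultdict
from statistics import mean


def is_microsoft(ip: str) -> bool:
    # The thread identifies the Microsoft traffic by the 207.46.*.* prefix.
    return ip.startswith("207.46.")


def average_by_group(records):
    """Average processing time per group.

    records: iterable of (ip, processing_seconds) pairs (assumed log shape).
    """
    groups = defaultdict(list)
    for ip, seconds in records:
        groups["MS" if is_microsoft(ip) else "Non-MS"].append(seconds)
    return {name: mean(values) for name, values in groups.items()}


# Made-up sample rows, NOT the poster's data.
sample = [
    ("207.46.13.5", 2.0),
    ("207.46.199.40", 4.0),
    ("198.51.100.7", 1.0),
    ("203.0.113.9", 2.0),
]
averages = average_by_group(sample)
print(averages)
```

Restricting the records to a date range before calling `average_by_group` would give per-period figures of the kind listed next.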

For the last two months, the average proc time was:

MS: 2.299 secs.
Non-MS: 1.143 secs.

For the last month:

MS: 2.907 secs.
Non-MS: 1.46 secs.

For the last week, approximately:

MS: 6.078 secs.
Non-MS: 2.236 secs.

I really couldn't believe my eyes at these results.

Does anyone have ideas for strategies to deal with this beyond simply blocking Microsoft's IP range?

 

Stevietheman




msg:4161448
 4:56 pm on Jun 29, 2010 (gmt 0)

I also wanted to add that over the past month, less than 3% of search engine referrals to the site I tested came from Bing. I could live without those referrals. 92.5% came from Google alone.

Stevietheman




msg:4167740
 4:10 pm on Jul 10, 2010 (gmt 0)

Here are the results from blocking Microsoft.

The week before blocking, the average script processing time was 1.954 seconds.

The week after blocking, the number went down to 0.753 seconds.

A 61% decline in average script processing time!
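The percentage follows directly from the two weekly averages quoted above; a quick check in Python (the function name is just for illustration):

```python
before = 1.954  # seconds, week before blocking (from the post)
after = 0.753   # seconds, week after blocking (from the post)


def percent_decline(old: float, new: float) -> float:
    # Relative drop from old to new, expressed as a percentage.
    return (old - new) / old * 100


decline = percent_decline(before, after)
print(f"{decline:.1f}% decline")
```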

Can Microsoft explain what they are doing to our sites?

Stevietheman




msg:4169330
 5:12 pm on Jul 13, 2010 (gmt 0)

Here's the complete blocking code I used in my .htaccess file:

deny from 207.46.
deny from 65.52.0.0/14
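To check which addresses those two lines cover, here is a small Python sketch using the standard `ipaddress` module. It assumes the partial address `207.46.` is equivalent to the CIDR block 207.46.0.0/16; the `is_blocked` helper is hypothetical and only mirrors the rules for testing, it is not part of Apache.

```python
import ipaddress

# "deny from 207.46." matches on the first two octets, i.e. 207.46.0.0/16.
# 65.52.0.0/14 spans 65.52.0.0 through 65.55.255.255.
BLOCKED = [
    ipaddress.ip_network("207.46.0.0/16"),
    ipaddress.ip_network("65.52.0.0/14"),
]


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED)


print(is_blocked("65.52.108.165"))
print(is_blocked("198.51.100.7"))
```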

Hedgehog_UK




msg:4179883
 7:51 pm on Jul 31, 2010 (gmt 0)

I run a script on my weblogs. It helps me keep an eye on bots etc. Looking through the records, I first spotted this stealth bot visiting in Jan. During July, it took 20 times the bandwidth it took back then. It certainly eats enough copies of the robots.txt file to be a bot. As you said though, it always shows the UA of a browser rather than a bot.

It's lower profile, but there seems to be another block of theirs with similar activity on it.

Stevietheman




msg:4180392
 2:09 am on Aug 2, 2010 (gmt 0)

Thanks for following up!

Can you tell me the other range Microsoft is using?

AlexK




msg:4180396
 2:19 am on Aug 2, 2010 (gmt 0)

Either:
  1. Microsoft employees have free access to MSNBot IPs
    or
  2. Microsoft have been hacked. Or something.

Hedgehog_UK:
it always shows the UA of a browser rather than a bot

A `MSN-Bot' tripped the fast-scraper block [webmasterworld.com] on my forums (I'm the maintainer of that script). In order to do that, it took more than 14 pages in the space of 7 seconds - definitely a bot. Here are the stats on the (single) record:

    IP: 65.52.108.165
    Host lookup: msnbot-65-52-108-165.search.msn.com (checks out)
    Timing: 2010-08-02 02:09:33 (2 pages)
    UA: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)


...which is NOT a bot UA.
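For anyone wanting to repeat the host lookup, here is a rough Python sketch of a reverse-then-forward DNS check. It assumes "checks out" means the reverse name ends in `.search.msn.com` and resolves back to the same IP; the function names are made up, and this is not the fast-scraper script mentioned above.

```python
import socket


def _reverse(ip):
    # PTR lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # A lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def verify_crawler_ip(ip, suffix=".search.msn.com",
                      reverse=_reverse, forward=_forward):
    """Return True if ip reverse-resolves under suffix and resolves back to ip."""
    try:
        host = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().endswith(suffix):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The lookups are passed in as parameters so the logic can be exercised without network access. Note that a pass only says the IP belongs to the crawler's network; as the record above shows, the UA can still be a browser string.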

Hedgehog_UK




msg:4181512
 12:14 am on Aug 4, 2010 (gmt 0)

14 pages in 7 seconds! It hasn't been that greedy on my site yet, though it has done 5 in 20 seconds. The UA details almost always start

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2

Occasionally it shows NT 5.1, and a couple of times MSIE 6.0.
Other details vary.
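The request rates quoted in this thread (more than 14 pages in 7 seconds, or 5 in 20 seconds) can be tested with a simple sliding-window counter. The Python below is a minimal sketch of that idea, not AlexK's script; the class name and defaults are assumptions based only on the numbers in his post.

```python
from collections import defaultdict, deque


class FastScraperDetector:
    """Flag an IP that requests more than max_pages within window_seconds."""

    def __init__(self, max_pages=14, window_seconds=7.0):
        self.max_pages = max_pages
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, timestamp):
        """Record one page request; return True if ip is over the limit."""
        recent = self.hits[ip]
        recent.append(timestamp)
        # Discard requests that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_pages


detector = FastScraperDetector()
# 15 requests spaced 0.4 s apart, all inside one 7-second window.
results = [detector.record("65.52.108.165", i * 0.4) for i in range(15)]
print(results[-1])
```

Timestamps are plain seconds here; parsed log times would need converting first.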

If these are claiming to be browsers rather than bots, they have weird browsing habits. The log analysis script for my site is connected to the navigation database that controls the arrangement of links across the top of pages and down the left etc. During July, this stealth bot (or whatever it is) took enough pages to grab every page on the site at least twice. Yet it never took two pages in succession that were linked together directly by those navigation links. I haven't looked at figures for previous months.

Another odd thing: despite cache settings in the file headers, this visitor seems to take the stylesheet and JavaScript with almost every page, even when the requests are just a few seconds apart. It never seems to touch the graphics. It never even triggers a 304 to indicate the graphics haven't changed. On the other hand, if it did start taking the graphics as well, this could consume serious bandwidth.
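That request pattern (browser UA, fetches robots.txt, takes stylesheet and script files, never takes images) can be turned into a rough log filter. The Python below is only a heuristic sketch; the extension lists and function names are assumptions, not the log analysis script described in this post.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE_EXT = {".gif", ".jpg", ".jpeg", ".png", ".ico"}  # assumed image types
ASSET_EXT = {".css", ".js"}                            # stylesheet and script


def summarize(paths):
    """Count what one visitor requested, given its list of request paths."""
    clean = [urlsplit(p).path for p in paths]
    exts = [posixpath.splitext(p)[1].lower() for p in clean]
    return {
        "robots": any(p == "/robots.txt" for p in clean),
        "assets": sum(e in ASSET_EXT for e in exts),
        "images": sum(e in IMAGE_EXT for e in exts),
    }


def looks_like_stealth_bot(user_agent, paths):
    """True if a browser-style UA shows the bot-like pattern described above."""
    s = summarize(paths)
    claims_browser = (user_agent.startswith("Mozilla/")
                      and "bot" not in user_agent.lower())
    return claims_browser and s["robots"] and s["assets"] > 0 and s["images"] == 0


ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Crazy Browser 1.0.5)"
print(looks_like_stealth_bot(ua, ["/robots.txt", "/page.html", "/style.css"]))
```

A real browser with a warm cache can also skip images, so a match is a reason to look closer rather than proof.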

In the site's robots.txt file, bots are banned from all subdirectories. So if this is a bot taking JavaScript etc., it's defying the robots.txt file.
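Whether a given request defies robots.txt can be checked with Python's standard `urllib.robotparser`. The robots.txt content and directory names below are hypothetical, since the actual file isn't shown in this thread; the `violations` helper is likewise just an illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the site's real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scripts/
Disallow: /styles/
"""


def violations(robots_txt, user_agent, requested_paths):
    """Return the requested paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in requested_paths if not parser.can_fetch(user_agent, p)]


requests_seen = ["/index.html", "/scripts/menu.js", "/styles/site.css"]
print(violations(ROBOTS_TXT, "Mozilla/4.0", requests_seen))
```

Ordinary browsers fetch disallowed stylesheets and scripts all the time, so a hit only counts against a visitor already believed to be a bot.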

Looking at the site as a whole, msnbot takes similar bandwidth to Googlebot. Adding in the figures for this visitor almost doubles that.

Stevietheman:
The other traffic I referred to is on the block AlexK referred to:

65.52.0.0/14

In my case, between 65.52.104.* and 65.52.108.*

In this block it uses much less bandwidth, but is a mixture of msnbot requests and whatever lies behind these varying UAs. The requests are all but identical to what I described above. When claiming to be a visitor, again it takes the JavaScript and stylesheet with every single request, but never touches the graphics.

ByronM




msg:4187315
 3:05 pm on Aug 15, 2010 (gmt 0)

The Bing/MSN bot has been fine on all of my sites.

Stevietheman




msg:4187321
 3:24 pm on Aug 15, 2010 (gmt 0)

ByronM, did you conduct a similar test to see what's going on under the hood of your site? I would imagine that low-traffic sites or sites that don't use heavy scripts wouldn't notice the problem. At any rate, it's a good idea to monitor what these bots are doing to your site, even if you don't notice an overall performance problem.

Sapo




msg:4231966
 3:23 pm on Nov 18, 2010 (gmt 0)

They are definitely misbehaving. Ignoring robots.txt is apparently just one of the bad things they're doing.

I added a new block this morning: 157.55.0.0/16

This MS address range has been reported as attempting to access phpmyadmin among other things.

koan




msg:4255053
 8:43 am on Jan 19, 2011 (gmt 0)

I have an invisibly linked page that is a bot trap, disallowed in robots.txt. I cleaned up my .htaccess file today, and so far in one day, msnbot has visited that disallowed page about 10 times.

So hey, if you're having problems getting some pages indexed by Bing, just disallow them in the robots.txt file; they'll spend the day hammering them. Sigh.

koan




msg:4255969
 11:34 pm on Jan 20, 2011 (gmt 0)

Yeah, did I say 10 times? Make that about 100 times now.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved