Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
msn bot has lost its mind
1000+ parallel connections
amznVibe




msg:4607278
 3:39 pm on Sep 4, 2013 (gmt 0)

I keep getting alerts about msnbot, and it really is msnbot from a real Microsoft IP. It's opening 1000+ parallel connections to the server at the same time.

What in the heck do they think they are doing? This is not acceptable.

Note that it's all from a single IP. I do not mean it is re-requesting the same 1000 files over time; I mean boom, in the same single second, 1000+ connections attempted.

First saw it from

131.253.38.xx (CA/Canada/msnbot-131-253-38-xx.search.msn.com)

and then from a few different US IPs

65.55.213.xx (US/United States/msnbot-65-55-213-xx.search.msn.com)

Never, ever had this problem with Google.

Anyone else notice this kind of activity?
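
For anyone wanting to confirm that a hit "really is msnbot", a common approach is a forward-confirmed reverse DNS check: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname back and see whether it returns the same IP. The sketch below is only an illustration. The `.search.msn.com` suffix is taken from the hostnames quoted above, and the function name and the injectable lookup parameters are hypothetical, added so the logic can be exercised without live DNS.

```python
import socket

# Assumption: the suffix is copied from the hostnames quoted in this thread.
TRUSTED_SUFFIX = ".search.msn.com"


def is_verified_msnbot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for an IP claiming to be msnbot.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    are hypothetical hooks; by default they use the standard socket module.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, or the lookup failed

    # The reverse name must sit under the expected crawler domain.
    if not host.lower().rstrip(".").endswith(TRUSTED_SUFFIX):
        return False

    try:
        # The forward lookup must lead back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_msnbot(remote_ip)` with no extra arguments performs real DNS lookups, so in practice the result would need caching rather than being computed on every request.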

 

incrediBILL




msg:4607409
 11:52 pm on Sep 4, 2013 (gmt 0)

Maybe they're experimenting with rapid indexing to see if some sites can handle it and how fast they can take it.

Perhaps it's a bug nobody at MSN knows about.

I've sent them log files before when they behaved badly, and they fixed the problem, so perhaps you should consider that.

When bot owners ignore my polite requests, I blog and post the data for all to see, tweet about it, and get retweets. After being publicly embarrassed, they often fix the problem.

lucy24




msg:4607430
 1:02 am on Sep 5, 2013 (gmt 0)

Do they honor the Crawl-Delay directive? I know Google doesn't (you have to set it in WMT), but I'm ### if I can find the area in Bing WMT that analyzes your robots.txt.

BillyS




msg:4607434
 1:16 am on Sep 5, 2013 (gmt 0)

I have a server setup that trips if an IP establishes more than 11 simultaneous connections. MSNbot gets blocked all the time...
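
The kind of trip wire described here can be sketched as a per-IP counter of open connections. This is a minimal illustration and not the actual server setup: the class and method names are hypothetical, and the default of 11 is simply the threshold mentioned in the post.

```python
class ConnectionLimiter:
    """Tracks open connections per IP and refuses any beyond a fixed limit."""

    def __init__(self, max_per_ip=11):
        # Assumption: 11 is the threshold quoted in the post above.
        self.max_per_ip = max_per_ip
        self.active = {}  # ip -> number of currently open connections

    def try_open(self, ip):
        """Return True and count the connection, or False if the IP is at its limit."""
        current = self.active.get(ip, 0)
        if current >= self.max_per_ip:
            return False
        self.active[ip] = current + 1
        return True

    def close(self, ip):
        """Release one connection slot for the IP, if any are held."""
        current = self.active.get(ip, 0)
        if current <= 1:
            self.active.pop(ip, None)
        else:
            self.active[ip] = current - 1
```

With this sketch, a burst of 1000 simultaneous attempts from one address would have 11 accepted and the rest refused until earlier connections are closed. Real servers usually do this in the web server or firewall rather than in application code.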

phranque




msg:4607436
 2:05 am on Sep 5, 2013 (gmt 0)

Does BingBot honor the Crawl-delay directive?:
http://www.bing.com/blogs/site_blogs/b/webmaster/archive/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbot-s-question.aspx [bing.com]
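
To see how a Crawl-delay line is read from robots.txt, Python's standard `urllib.robotparser` can parse the directive. The robots.txt content below is a made-up example: the delay of 10 is an arbitrary value chosen for illustration, not a recommendation from the linked post, and the helper name `crawl_delay_for` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: example file contents; the delay value is arbitrary.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay that applies to the given user agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)
```

Here `crawl_delay_for("msnbot")` picks up the value from the msnbot group, while an agent that only matches the `*` group gets `None` because that group sets no delay. How a given crawler interprets the number is up to the crawler.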

lucy24




msg:4607443
 3:53 am on Sep 5, 2013 (gmt 0)

On every forum there is one person who always knows where to find things. On WebmasterWorld, that person is phranque :)

"Because it would cause a lot of unwanted traffic if BingBot tried to fetch your robots.txt file every single time it wanted to crawl a page on your website, it keeps your directives in memory for a few hours."

Someone remind me: Why don't these forums have a "roflmfao" emoticon?

"1000+ parallel connections"

You've got a sturdier server than mine :o I'm on shared hosting and I think the ceiling is 30. I've only ever seen it with malicious robots.

:: shuffling papers ::

Yup. Ghastly robot from {server farm} back in February 2012, slew of 503 responses with log message:
access to {filename} failed for {IP}, reason: Client exceeded concurrent connection limit of 30, referer: {referer}
Even WebReaper doesn't do that. Certainly not what you'd expect of the bingbot.

Unless all 1000+ concurrent requests were for robots.txt. That I'd believe.

BillyS




msg:4607510
 11:27 am on Sep 5, 2013 (gmt 0)

You can also set the crawl rate in Microsoft's webmaster tool. For example, you could tell it to crawl less aggressively during peak periods.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved