Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
msn bot has gone crazy
bwnbwn




msg:4534276
 1:28 pm on Jan 8, 2013 (gmt 0)

131.253.27.123
131.253.27.124
131.253.27.125

These are IPs from the msn bot. Over the weekend these three IPs were acting like a denial-of-service (DoS) attack, requesting thousands of pages over and over. I finally had to block these IPs from the server. Has anybody else had the same problem with these IPs?

 

wilderness




msg:4534378
 5:54 pm on Jan 8, 2013 (gmt 0)

msnbot-media?

a recent thread [webmasterworld.com]

I replied there that I'd had 40+ requests for robots.txt in a ten-hour period.
That number was exceeded a few weeks later, with 262 requests in a single 24-hour period on a single site.

bwnbwn




msg:4534385
 6:02 pm on Jan 8, 2013 (gmt 0)

We had thousands of repeated requests, so many that it was dragging the websites down. For example, one niche site might normally get 100 hits with 400 page views. On Monday this went from 400 to over 4k in 4 hours, so I looked at the weekend. It started on Friday afternoon and never let up until I blocked the IPs. The website has only about 100 pages; the bot was pulling the same content over and over.

wilderness




msg:4534399
 6:43 pm on Jan 8, 2013 (gmt 0)

msnbot-media?

bwnbwn




msg:4534436
 8:34 pm on Jan 8, 2013 (gmt 0)

msnbot-131-253-27-123.search.msn.com

wilderness




msg:4534462
 10:01 pm on Jan 8, 2013 (gmt 0)

Just add them to your robots.txt. Although the requests for robots.txt itself will not stop, they will comply with your request and leave your images alone.

You will of course need to lift the access denial so they can read your robots.txt, unless you have an exception that lets denied visitors read robots.txt.
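
A rough way to sanity-check a rule like that before deploying it is Python's standard urllib.robotparser. This is only a sketch: the msnbot-media token comes from this thread, but the /images/ directory, the example.com URLs and the is_allowed helper are assumptions made up for illustration, and a real crawler may interpret robots.txt differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep msnbot-media out of an assumed /images/
# directory and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: msnbot-media
Disallow: /images/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative checks against the hypothetical rules above.
print(is_allowed("msnbot-media", "http://example.com/images/photo.jpg"))
print(is_allowed("msnbot-media", "http://example.com/index.html"))
print(is_allowed("bingbot", "http://example.com/images/photo.jpg"))
```

Note that this only tests what the rules say. It does nothing about the robots.txt requests themselves, which keep coming either way.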

lucy24




msg:4534475
 10:25 pm on Jan 8, 2013 (gmt 0)

I think he's asking what the UA was. msnbot-media, ordinary bingbot, or the dreaded plainclothes bingbot?

wilderness




msg:4534478
 10:37 pm on Jan 8, 2013 (gmt 0)

I think he's asking what the UA was


"forget about it" ;)

If he'd just provided a few lines of raw logs it would have been much easier.

My html crawls from the 131.253.x.x have been few.

The majority have been msnbot-media for images.

not2easy




msg:4534531
 5:00 am on Jan 9, 2013 (gmt 0)

Some apparent msnbots are being disavowed by Bing's verification tool, which is where you end up if you follow the URL attached to their bots. I am checking a few that seem to be naughty, and Bing has disavowed 4 out of 5. I am adding the full info to the older thread mentioned above because that is where the rest of the info is.
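
For anyone who wants to do a similar check locally, a common approach is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is not Bing's tool. The .search.msn.com suffix is assumed from the hostname quoted earlier in this thread, and the function names are made up for illustration.

```python
import socket
from typing import Callable, List


def default_reverse(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(host: str) -> List[str]:
    """Forward DNS: hostname to its list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def looks_like_msn_crawler(
    ip: str,
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], List[str]] = default_forward,
    suffix: str = ".search.msn.com",  # assumption taken from this thread
) -> bool:
    """Forward-confirmed reverse DNS check for a single IP address."""
    try:
        host = reverse(ip)
    except OSError:
        # No reverse record, or the lookup failed.
        return False
    if not host.lower().endswith(suffix):
        return False
    try:
        # The hostname must resolve back to the IP we started with.
        return ip in forward(host)
    except OSError:
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without touching the network; calling looks_like_msn_crawler("131.253.27.123") with the defaults performs live DNS lookups.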

bwnbwn




msg:4534640
 1:32 pm on Jan 9, 2013 (gmt 0)

Sorry guys, I was not asking what the UA was. I know it was the msn bot. What I am asking is whether anyone has had the bot behave so aggressively that it acted like a denial-of-service (DoS) attack on the server. I had all three IPs hitting at the same time, requesting 40-60 pages a second. So in effect the bots were requesting about 100 pages a second, or just about the entire website, and only on this one website. We have 100 other domains on the same server and none of them were hit.

Forget the robots.txt file; I blocked them at the firewall.
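
Numbers like these are easier to compare when they come straight from the raw logs. Below is a minimal sketch that totals requests, peak requests per second and distinct paths for a set of watched IPs. It assumes Apache-style common/combined log lines with one-second timestamps; the sample lines are invented for illustration and are not from the poster's server, and the summarize helper is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Assumed format: Apache common/combined log, e.g.
# 1.2.3.4 - - [05/Jan/2013:14:02:11 +0000] "GET /page.html HTTP/1.1" 200 512
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)


def summarize(lines, watched_ips):
    """Per-IP totals, peak requests in any one second, and distinct paths."""
    total = Counter()
    per_second = defaultdict(Counter)
    paths = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["ip"] not in watched_ips:
            continue
        ip = m["ip"]
        total[ip] += 1
        per_second[ip][m["ts"]] += 1  # timestamp resolution is one second
        paths[ip].add(m["path"])
    return {
        ip: {
            "requests": total[ip],
            "peak_per_second": max(per_second[ip].values()),
            "distinct_paths": len(paths[ip]),
        }
        for ip in total
    }


# Invented sample lines, for illustration only.
SAMPLE_LINES = [
    '131.253.27.123 - - [05/Jan/2013:14:02:11 +0000] "GET /page1.html HTTP/1.1" 200 5120',
    '131.253.27.123 - - [05/Jan/2013:14:02:11 +0000] "GET /page1.html HTTP/1.1" 200 5120',
    '131.253.27.123 - - [05/Jan/2013:14:02:12 +0000] "GET /page2.html HTTP/1.1" 200 4096',
    '203.0.113.9 - - [05/Jan/2013:14:02:12 +0000] "GET /page1.html HTTP/1.1" 200 5120',
]

print(summarize(SAMPLE_LINES, {"131.253.27.123"}))
```

A request count far above the distinct-path count is the "same content over and over" pattern described in this thread.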

wilderness




msg:4534669
 3:02 pm on Jan 9, 2013 (gmt 0)

What I am asking is whether anyone has had the bot behave so aggressively that it acted like a denial-of-service (DoS) attack on the server.


The Bing/MSN bots have been "acting in an aggressive manner" on one of my sites for months, however NOT from the 131.253.2x range.
FWIW, I'd much rather have bot requests all grouped together in what might be deemed an aggressive manner. They're certainly easier to analyze in that order.

In fact, MSN/Bing is still requesting pages from the same site that haven't been online for three years.

If the requests are taking your server down, possibly other issues exist which are causing the overload.

bwnbwn




msg:4534726
 5:53 pm on Jan 9, 2013 (gmt 0)

Thanks, wilderness, for your info. The sheer number of requests from all three of the IPs was the issue. 3500 requests on a 100-page website is in my eyes an attack.

keyplyr




msg:4534763
 8:47 pm on Jan 9, 2013 (gmt 0)


The Bing/MSN bots have been crawling every single page on my main site daily for over a year, sometimes twice. For some reason they also sometimes inject a non-existent directory into otherwise valid file paths, creating about a hundred daily 404s, day after day after day.

When I sent in logs showing them this, they just said it would eventually stop on its own. It hasn't.

wilderness




msg:4534776
 9:23 pm on Jan 9, 2013 (gmt 0)

they just said it would eventually stop on its own. It hasn't.


keyplyr,
that's commonly referred to as "double talk".
In the old days, comprehension was best if the speaker was uttering the words from the side of their mouth.

lucy24




msg:4534808
 10:52 pm on Jan 9, 2013 (gmt 0)

In fact, MSN/Bing is still requesting pages from the same site that haven't been online for three years.

Based on behavior on my site, Bing-- unlike That Other Search Engine-- doesn't seem to distinguish between 404 and 410. Requests for 410 that are more than a month old are at least 99% Bing. And most of the rest are those casual robots that only stop by every year or so.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved