wilderness

msg:4534378 | 5:54 pm on Jan 8, 2013 (gmt 0)
msnbot-media? In a recent thread [webmasterworld.com] I mentioned that I'd had 40+ requests for robots.txt in a ten-hour period. That number was exceeded a few weeks later, with 262 requests in a single 24-hour period on a single site.

bwnbwn

msg:4534385 | 6:02 pm on Jan 8, 2013 (gmt 0)
We had thousands of repeated requests, so many that they were dragging the websites down. For example, one site in a specific niche might get 100 hits with 400 page views. On Monday that went from 400 to over 4k in 4 hours, so I looked back at the weekend. It started on Friday afternoon and never let up until I blocked the IPs. The website has only about 100 pages; the bot was pulling the same content over and over and over.

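One way to put numbers like "over 4k page views on a 100-page site" into perspective is to count, per client IP, total requests versus distinct URLs in the access log. A minimal Python sketch, assuming Apache/nginx common or combined log format; `summarize_log` and the sample lines are made up for illustration and are not bwnbwn's actual logs.

```python
import re
from collections import Counter, defaultdict

# Matches the start of a common/combined-format access log line:
# client IP, ident, user, [timestamp], then the quoted request line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[A-Z]+ (\S+) [^"]*"')


def summarize_log(lines):
    """Hypothetical helper: map each client IP to (total requests, distinct paths)."""
    totals = Counter()
    paths = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        ip, path = m.groups()
        totals[ip] += 1
        paths[ip].add(path)
    return {ip: (totals[ip], len(paths[ip])) for ip in totals}


if __name__ == "__main__":
    # Made-up sample lines, not taken from anyone's real logs.
    sample = [
        '131.253.27.123 - - [07/Jan/2013:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 512',
        '131.253.27.123 - - [07/Jan/2013:10:00:01 +0000] "GET /a.html HTTP/1.1" 200 512',
        '131.253.27.123 - - [07/Jan/2013:10:00:02 +0000] "GET /b.html HTTP/1.1" 200 512',
        '192.0.2.10 - - [07/Jan/2013:10:05:00 +0000] "GET /a.html HTTP/1.1" 200 512',
    ]
    for ip, (total, distinct) in summarize_log(sample).items():
        print(f"{ip}: {total} requests for {distinct} distinct pages")
```

A large gap between the two numbers for a single IP is the "same content over and over" pattern described above.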
wilderness

msg:4534399 | 6:43 pm on Jan 8, 2013 (gmt 0)
msnbot-media?

bwnbwn

msg:4534436 | 8:34 pm on Jan 8, 2013 (gmt 0)
msnbot-131-253-27-123.search.msn.com

wilderness

msg:4534462 | 10:01 pm on Jan 8, 2013 (gmt 0)
Just add them to your robots.txt. Although the requests for robots.txt will not stop, they will comply with your request and leave your images alone. You will, of course, have to take them off your denied-access list so they can read robots.txt, unless you already have an exception that allows denied visitors to read robots.txt.

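As a rough illustration of the kind of robots.txt record being suggested, here is a hypothetical file that keeps msnbot-media out of an assumed `/images/` directory, checked with Python's standard `urllib.robotparser`. This only shows how a compliant parser interprets the rules; it says nothing about how Bing's own crawler implements them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep msnbot-media out of /images/ (an assumed
# path) and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: msnbot-media
Disallow: /images/

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt content directly, without fetching anything."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent, path in [
        ("msnbot-media/1.1", "/images/logo.png"),
        ("msnbot-media/1.1", "/index.html"),
        ("bingbot/2.0", "/images/logo.png"),
    ]:
        verdict = "allowed" if rp.can_fetch(agent, path) else "blocked"
        print(agent, path, verdict)
```

Only the image crawler is kept out of `/images/`; its HTML requests and other Bing user agents fall through to the unrestricted `*` record.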
lucy24

msg:4534475 | 10:25 pm on Jan 8, 2013 (gmt 0)
I think he's asking what the UA was. msnbot-media, ordinary bingbot, or the dreaded plainclothes bingbot?

wilderness

msg:4534478 | 10:37 pm on Jan 8, 2013 (gmt 0)
> I think he's asking what the UA was
"Forget about it" ;) If he had just provided a few lines of raw logs, it would have been much easier. HTML crawls from the 131.253.x.x range have been few on my sites; the majority have been msnbot-media requests for images.

not2easy

msg:4534531 | 5:00 am on Jan 9, 2013 (gmt 0)
Some apparent msnbots are being disavowed by Bing's verification tool, which is where you end up if you follow the URL in their user-agent string. I have been checking a few that seem to be naughty, and Bing has disavowed 4 out of 5. I am posting the full details in the older thread mentioned above, since that is where the rest of the information is.

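A commonly recommended way to check whether a visitor claiming to be Bing's crawler is genuine is forward-confirmed reverse DNS: look up the IP's host name, check that it ends in search.msn.com (as in the msnbot-131-253-27-123.search.msn.com host quoted above), then resolve that host name and confirm it maps back to the same IP. The sketch below assumes that suffix; `verify_crawler_ip` is a hypothetical helper, and the resolver functions are parameters so the logic can be exercised offline. The default `socket` calls need working DNS.

```python
import socket

# Suffix taken from the host name quoted earlier in the thread; treat it
# as an assumption and check Bing's current documentation before relying on it.
BING_SUFFIX = ".search.msn.com"


def verify_crawler_ip(ip, suffix=BING_SUFFIX,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP.

    1. Reverse-resolve the IP to a host name (PTR lookup).
    2. Require the host name to end with the expected suffix.
    3. Forward-resolve that host name and require the original IP
       to be among the returned addresses.
    """
    try:
        host, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False  # no usable PTR record: cannot be verified
    if not host.lower().rstrip(".").endswith(suffix):
        return False  # PTR points somewhere other than the expected domain
    try:
        _name, _aliases, addresses = forward(host)
    except OSError:
        return False  # host name does not resolve forward
    return ip in addresses
```

Called as `verify_crawler_ip("131.253.27.123")`, it performs live lookups, so the answer depends on DNS at the time of the check; an IP whose PTR record claims search.msn.com but does not resolve back is treated as unverified.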
bwnbwn

msg:4534640 | 1:32 pm on Jan 9, 2013 (gmt 0)
Sorry guys, I was not asking what the UA was. I know it was msnbot. What I want to know is whether anyone has had the bot behave so aggressively that it acted like a DDoS attack on the server. I had all three IPs hitting at the same time, requesting 40-60 pages a second. So in effect the bots were requesting about 100 pages a second, close to the entire website, and only on this one site. We have 100 other domains on the same server and none of them were hit. Forget the robots.txt file; I blocked them at the firewall.

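Blocking at the firewall works, but a per-IP rate check is a softer alternative that throttles a burst like this without banning the crawler outright. A minimal sliding-window sketch; `SlidingWindowLimiter` and the limit of 10 requests per second are hypothetical, and in practice this is often handled by the web server or firewall rather than in application code.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per IP within `window` seconds."""

    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the caller might answer 503 or drop it
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=10, window=1.0)
    # Simulate one IP sending 50 requests within a single second.
    accepted = sum(limiter.allow("131.253.27.123", i / 50) for i in range(50))
    print(f"accepted {accepted} of 50 requests")  # accepted 10 of 50 requests
```

In a live server, `now` would come from `time.monotonic()`. Only accepted requests are recorded, so a client that keeps hammering is let through again as soon as its oldest accepted request ages out of the window.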
wilderness

msg:4534669 | 3:02 pm on Jan 9, 2013 (gmt 0)
> What I want to know is whether anyone has had the bot behave so aggressively that it acted like a DDoS attack on the server.
The Bing/MSN bots have been "acting in an aggressive manner" on one of my sites for months; however, NOT from the 131.253.2x range. FWIW, I'd much rather have bot requests all grouped together in what might be deemed an aggressive manner. They're certainly easier to analyze that way. In fact, MSN/Bing is still requesting pages from the same site that haven't been online for three years. If the requests are taking your server down, other issues may be causing the overload.

bwnbwn

msg:4534726 | 5:53 pm on Jan 9, 2013 (gmt 0)
Thanks for the info, wilderness. The sheer number of requests from all three IPs was the issue. 3,500 requests on a 100-page website is, in my eyes, an attack.

keyplyr

msg:4534763 | 8:47 pm on Jan 9, 2013 (gmt 0)
The Bing/MSN bots have been crawling every single page on my main site daily for over a year, sometimes twice a day. For some reason they also sometimes inject a non-existent directory into otherwise valid file paths, creating about a hundred 404s a day, day after day after day. When I sent them logs showing this, they just said it would eventually stop on its own. It hasn't.

wilderness

msg:4534776 | 9:23 pm on Jan 9, 2013 (gmt 0)
> they just said it would eventually stop on its own. It hasn't.
keyplyr, that's commonly referred to as "double talk". In the old days, comprehension was best if the speaker was uttering the words out of the side of their mouth.

lucy24

msg:4534808 | 10:52 pm on Jan 9, 2013 (gmt 0) |
| In fact, MSN/Bing is still requesting pages from the same site that haven't been online for three years. |
| Based on behavior on my site, Bing-- unlike That Other Search Engine-- doesn't seem to distinguish between 404 and 410. Requests for 410 that are more than a month old are at least 99% Bing. And most of the rest are those casual robots that only stop by every year or so.
|
|
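For anyone who wants deliberately removed pages to answer 410 Gone rather than 404 Not Found, here is a minimal sketch using Python's standard `http.server`. The `REMOVED_PATHS` and `LIVE_PATHS` sets and the `GoneAwareHandler` name are hypothetical; as noted above, Bing may keep requesting such URLs regardless of which status it gets.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical pages that were taken down on purpose.
REMOVED_PATHS = {"/old-page.html", "/archive/2009/"}
# Hypothetical pages that still exist.
LIVE_PATHS = {"/", "/index.html"}


def status_for(path):
    """410 for URLs removed on purpose, 200 for live pages, 404 for anything else."""
    if path in REMOVED_PATHS:
        return 410
    if path in LIVE_PATHS:
        return 200
    return 404


class GoneAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string when deciding the status.
        status = status_for(self.path.split("?", 1)[0])
        body = f"{status}\n".encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    for path in ("/", "/old-page.html", "/missing.html"):
        print(path, status_for(path))
```

To serve it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), GoneAwareHandler)` and call `serve_forever()`. Keeping an explicit list of removed URLs is what lets the server tell "gone on purpose" apart from "never existed", which is the distinction lucy24 describes.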