bingbot new crawl range

         

keyplyr

5:02 am on Oct 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




FYI - bingbot is now coming from: 40.64.0.0 - 40.127.255.255

This is not tagged as "crawl" in the WHOIS record, and I didn't have it allowed in my filters. For several days I assumed it was an impostor, until I looked up the range (hence the 403):

40.77.167.67 - - [22/Oct/2015:19:26:52 -0700] "GET /example.gif HTTP/1.1" 403 982 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

So far just asking for image files.
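For anyone updating an allow list, here is a minimal sketch of turning the start/end range quoted above into CIDR notation and testing an address against it. It uses only Python's standard `ipaddress` module; the helper name `in_reported_range` is made up for illustration, and the range is simply the one reported in this post, not a confirmed crawl range.

```python
import ipaddress

# Range endpoints exactly as quoted in the post above.
first = ipaddress.ip_address("40.64.0.0")
last = ipaddress.ip_address("40.127.255.255")

# Collapse the start/end pair into the smallest list of CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))


def in_reported_range(ip: str) -> bool:
    """Return True if ip falls inside any block of the reported range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in blocks)


print(blocks)                             # [IPv4Network('40.64.0.0/10')]
print(in_reported_range("40.77.167.67"))  # True
```

The whole range collapses to a single /10, which is convenient for firewall or .htaccess rules that accept CIDR blocks.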

dstiles

5:53 pm on Oct 24, 2015 (gmt 0)




A reverse DNS lookup of the complete 40.77.0.0/16 shows msnbot takes up all of, and only, 40.77.167.0/24.
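A rough sketch of how a single address could be checked the same way: reverse-resolve the IP, confirm the hostname sits under search.msn.com (the suffix seen on the hostnames in this thread), then forward-resolve that hostname and confirm it points back to the same IP. The function name `is_verified_bingbot` and the injectable `reverse`/`forward` parameters are assumptions added so the logic can be exercised without network access.

```python
import socket


def is_verified_bingbot(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed bingbot IP."""
    try:
        hostname = reverse(ip)[0]        # (hostname, aliases, addresses)
    except OSError:                      # no PTR record or lookup failure
        return False
    if not hostname.lower().rstrip(".").endswith(".search.msn.com"):
        return False
    try:
        addresses = forward(hostname)[2]  # addresses the hostname maps to
    except OSError:
        return False
    # Only trust the hostname if it resolves back to the original IP.
    return ip in addresses


# Example call (performs live DNS lookups when run):
# is_verified_bingbot("40.77.167.67")
```

The forward step matters because anyone who controls the reverse zone for their own IP can make it claim any hostname; only the round trip ties the name to the address.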

lucy24

9:35 pm on Oct 24, 2015 (gmt 0)




Wasn't there an earlier thread pointing this out from the other side? First there was Merck* selling 54.0.0.0/8 and duPont selling 52.0.0.0/8 piece by piece to Amazon; now it's Eli Lilly selling off 40.0.0.0/8.


* I remember looking this up once and finding that, contrary to expectation, Merck is not just doing fine but even paying dividends. So it's not because they're hungry; they're just discovering that they don't need anywhere near /8 for the foreseeable future.

keyplyr

8:14 am on Oct 25, 2015 (gmt 0)




@ dstiles - Hmmm... I posted a reply yesterday, but the post was either taken down or didn't complete. Anyway, I had other examples of bingbot (not msnbot) coming from various subnets of 40.64.0.0/10. The exact crawl range is yet to be determined. Maybe they will change the registration to be more explicit at some point.

@ Lucy - Kinda like real estate isn't it. Some interest gets in there early, waits, then sells off bits & pieces as the demand/value increases; gives birth to the saying "if I knew then what I know now."

lucy24

7:27 pm on Oct 25, 2015 (gmt 0)




Aaaand .... with today's logs featuring
40.77.167.59 - - [25/Oct/2015:06:48:17 -0700] "GET /robots.txt HTTP/1.1" 200 760 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 
40.77.167.59 - - [25/Oct/2015:06:48:22 -0700] "GET /paintings/paintingstyles.css HTTP/1.1" 200 3213 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
... I guess it's time to edit the Ignore section of log-wrangling code.

(Yes, that was the only file they asked for after robots.txt, on this very first visit from the new IP. Search me.)

Can't remember if they always did this, but last couple of days' logs suggest that they make a fresh robots.txt request for each specific IP. Tiny bit of leeway, so a.b.c.43 and a.b.c.44 might share, but beyond that it's a fresh request each time.
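One way to check the per-IP robots.txt pattern against your own logs, sketched under the assumption that they are in the same combined log format as the lines quoted above. The regular expression and the function names `bingbot_paths_by_ip` and `robots_first` are illustrative, not taken from any existing log-wrangling tool.

```python
import re
from collections import defaultdict

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def bingbot_paths_by_ip(lines):
    """Map each IP to the paths it requested with a bingbot UA, in log order."""
    by_ip = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and "bingbot" in match.group("agent").lower():
            by_ip[match.group("ip")].append(match.group("path"))
    return dict(by_ip)


def robots_first(paths_by_ip):
    """For each IP, report whether its first logged request was /robots.txt."""
    return {ip: paths[0] == "/robots.txt" for ip, paths in paths_by_ip.items()}


sample = [
    '40.77.167.59 - - [25/Oct/2015:06:48:17 -0700] "GET /robots.txt HTTP/1.1" 200 760 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '40.77.167.59 - - [25/Oct/2015:06:48:22 -0700] "GET /paintings/paintingstyles.css HTTP/1.1" 200 3213 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]
print(robots_first(bingbot_paths_by_ip(sample)))
```

Run over several days of logs, the result shows at a glance which IPs opened with their own robots.txt request and which appear to have shared one with a neighbour.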

blend27

4:02 pm on Oct 30, 2015 (gmt 0)




I saw 5 robots.txt requests from 40.77.167.91 (msnbot-40-77-167-91.search.msn.com) yesterday, which it obeyed,

BUT

all 5 requests arrived within the same second, at 2015-10-28 06:07:19.
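A small sketch for spotting this kind of burst, assuming the log has already been reduced to (IP, timestamp) pairs at one-second resolution. The function name `same_second_bursts` and the default threshold of 5 are illustrative choices, not anything Bing documents.

```python
from collections import Counter


def same_second_bursts(requests, threshold=5):
    """Return {(ip, timestamp): count} for IPs with >= threshold hits in one second.

    requests: iterable of (ip, timestamp_string) pairs, one per logged request.
    """
    counts = Counter(requests)
    return {key: n for key, n in counts.items() if n >= threshold}


# Five requests from one IP in the same second, as described above,
# plus one unrelated request that should not be flagged.
sample = [("40.77.167.91", "2015-10-28 06:07:19")] * 5
sample.append(("40.77.167.59", "2015-10-28 06:07:20"))
print(same_second_bursts(sample))
```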

wilderness

5:09 pm on Oct 30, 2015 (gmt 0)




All that IP is requesting from my sites are robots and css.

lucy24

6:50 pm on Oct 30, 2015 (gmt 0)




"robots and css."

Mine's mostly images-- pretty exactly 2/3 of the total to date. But a closer look reveals that all of those requests are images associated with a specific file (which this IP didn't ask for), so the pattern may be something entirely different.

Now this is interesting. I looked for something specific, but didn't know I'd actually find it:
207.46.13.137 - - [28/Oct/2015:03:56:24 -0700] "GET /robots.txt HTTP/1.1" 200 760 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 
207.46.13.137 - - [28/Oct/2015:03:57:27 -0700] "GET /ebooks/wedding/MouseWedding.html HTTP/1.1" 200 5997 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.39 - - [28/Oct/2015:03:57:41 -0700] "GET /robots.txt HTTP/1.1" 200 760 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.60 - - [28/Oct/2015:03:57:42 -0700] "GET /robots.txt HTTP/1.1" 200 760 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.40 - - [28/Oct/2015:03:57:43 -0700] "GET /ebooks/images/caribou-icon.png HTTP/1.1" 200 931 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
{etcetera for many but not all other images associated with firstnamed page}
I wondered if the page for all those images had been fetched by some other IP. It was. (Droll the way they had to fortify themselves with two robots.txt before embarking on such a long list of non-page requests.)

Corollary discovery:
In addition to the common bingbot UA
"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
there's also, far less often, the mobile bingbot
"Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
(I remember noting earlier that the mobile bingbot, unlike the mobile googlebot, also gets non-page files. Or at least stylesheets. Is it funny that, given a choice between mobile UAs, the bingbot plumped for iPhone over Android?)

Edit:
Images, stylesheets, js (just one). No pages at all so far.
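To tally this sort of breakdown automatically, here is a minimal sketch that labels a request by bingbot variant (desktop or mobile UA) and by the kind of file requested. The extension table and the names `bingbot_variant` and `resource_kind` are assumptions for illustration; extend the table to match your own site.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative mapping from file extension to resource kind.
EXTENSION_KINDS = {
    ".css": "stylesheet",
    ".js": "script",
    ".png": "image", ".gif": "image", ".jpg": "image", ".jpeg": "image",
    ".html": "page", ".htm": "page",
}


def bingbot_variant(user_agent):
    """Return 'mobile', 'desktop', or None if the UA does not claim bingbot."""
    ua = user_agent.lower()
    if "bingbot" not in ua:
        return None
    # The mobile UA quoted above identifies itself as an iPhone.
    return "mobile" if "iphone" in ua else "desktop"


def resource_kind(path):
    """Classify a requested path by extension, ignoring any query string."""
    if path == "/robots.txt":
        return "robots"
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    return EXTENSION_KINDS.get(ext, "other")


print(resource_kind("/ebooks/images/caribou-icon.png"))    # image
print(resource_kind("/ebooks/wedding/MouseWedding.html"))  # page
```

Note that a UA match only says what the visitor claims to be; it is not proof that the request really came from Bing.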

blend27

12:41 pm on Nov 15, 2015 (gmt 0)




40.77.167.17 - msnbot-40-77-167-17.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.17 - msnbot-40-77-167-17.search.msn.com - Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; http://www.bing.com/bingbot.htm)
40.77.167.22 - msnbot-40-77-167-22.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.49 - msnbot-40-77-167-49.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.54 - msnbot-40-77-167-54.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.62 - msnbot-40-77-167-62.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.63 - msnbot-40-77-167-63.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.71 - msnbot-40-77-167-71.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.82 - msnbot-40-77-167-82.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
40.77.167.82 - msnbot-40-77-167-82.search.msn.com - Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; http://www.bing.com/bingbot.htm)
40.77.167.88 - msnbot-40-77-167-88.search.msn.com - Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

blend27

1:09 pm on Nov 15, 2015 (gmt 0)




When did this become a non-functional URL about bingbot?
http://www.bing.com/bingbot.htm

keyplyr

1:12 pm on Nov 15, 2015 (gmt 0)




I've never seen that

blend27

3:52 pm on Nov 15, 2015 (gmt 0)




I get the "All Bing Help topics" page, with links to everything except anything about bingbot. The initial request is redirected to [help.bing.microsoft.com...]

lucy24

7:51 pm on Nov 15, 2015 (gmt 0)




Same here. Quite a slow redirect, too (but presumably server-side, or Firefox would have said something).

Huh. Did they truly and simply overlook it? The search box in that All Help Topics page is no use either.

For lack of anything better, I found their Support/WMT area and sent off a "Do you realize...?" query.

Edit: The parallel URL
http://search.msn.com/msnbot.htm
redirects to the same page.

Angonasec

2:39 am on Nov 17, 2015 (gmt 0)



40.77.167.xx Bingbot behaves itself on our sites and crawls everything it is allowed to.