Bingbot UA from 13.64-107

keyplyr

8:08 pm on Jun 28, 2018 (gmt 0)

UA: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Protocol: HTTP/1.1
Robots.txt: No
Host: Microsoft Corporation
13.64.0.0 - 13.107.255.255
13.64.0.0/11, 13.96.0.0/13, 13.104.0.0/14

Multiple requests, spaced far apart, for the same page only. Since this is not a designated crawl range for bingbot, the requests were denied with a 403. I have now allowed this range as a test. This could be verification of a business listing with Bing.
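The three CIDR blocks listed above cover the stated start and end addresses exactly. A minimal Python sketch (standard library `ipaddress` only) that derives the blocks from the range and tests whether a logged IP falls inside them; `in_range` is a hypothetical helper name and the sample address is illustrative, not taken from the logs:

```python
import ipaddress

# Range as given in the post: 13.64.0.0 - 13.107.255.255
RANGE_START = ipaddress.IPv4Address("13.64.0.0")
RANGE_END = ipaddress.IPv4Address("13.107.255.255")

# Derive the minimal list of CIDR blocks that covers the range exactly.
BLOCKS = list(ipaddress.summarize_address_range(RANGE_START, RANGE_END))


def in_range(ip: str) -> bool:
    """True if the address falls inside one of the derived CIDR blocks.

    IPv6 addresses never match, since an IPv4Network only contains IPv4 addresses.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKS)


if __name__ == "__main__":
    print([str(net) for net in BLOCKS])  # ['13.64.0.0/11', '13.96.0.0/13', '13.104.0.0/14']
    print(in_range("13.77.169.1"))       # True (illustrative address)
```

A match here only says the IP belongs to this Microsoft allocation; as the thread goes on to discuss, it says nothing about whether the client is the real bingbot.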

jmccormac

8:54 pm on Jun 28, 2018 (gmt 0)

Cloud range?

Regards...jmcc

keyplyr

9:17 pm on Jun 28, 2018 (gmt 0)

Not designated as a cloud range; I would have noted it. Not Azure either (Microsoft's counterpart to AWS).

The range is a long-established, well-known MS range. I might have seen bingbot coming from there before; I'm not sure. What drew my attention was that I had blocked it and it kept asking for the same personal info page, about 20 times over a couple of hours. This leads me to think it might be one of Bing's tools, possibly related to their Bing for Business utility.

Like I said, I'm testing to see if I can glean any further info.
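Spotting the pattern described above (the same page requested about 20 times over a couple of hours) can be automated from access-log entries. A minimal Python sketch, assuming each hit has already been parsed into a hypothetical `(ip, path, unix_timestamp)` tuple; the window length and minimum count are tuning assumptions, not values from the thread:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (ip, path, unix timestamp); hypothetical log shape


def repeated_requests(
    hits: Iterable[Hit], window_s: float, min_count: int
) -> Dict[Tuple[str, str], int]:
    """Return (ip, path) pairs requested at least min_count times inside some
    sliding window of window_s seconds, mapped to the largest count seen."""
    by_key: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for ip, path, ts in hits:
        by_key[(ip, path)].append(ts)

    result: Dict[Tuple[str, str], int] = {}
    for key, times in by_key.items():
        times.sort()
        best = 0
        start = 0
        # Two-pointer sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window_s:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_count:
            result[key] = best
    return result
```

Keying on the exact IP is the simplest choice; grouping by the CIDR block instead would catch the same behaviour spread across several addresses in the range.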

blend27

12:30 pm on Jun 29, 2018 (gmt 0)

Yep, just hit my sites.

But No RDNS = No Content for now.

In one case, on an e-commerce site, it requested 2 different URLs 3 times each over the span of an hour.

Very old URLs: single-product listings, each holding a high SERP position for the past several years.

No Robots.txt requests from that range though.

403d.

lucy24

6:43 pm on Jun 29, 2018 (gmt 0)

Aside from the UA, did you happen to notice whether its headers are identical to the "real" bingbot?

I've known robots that crawled from a variety of IPs but used only selected ones for robots.txt requests, so this factor in & of itself isn't entirely dispositive.

keyplyr

8:17 pm on Jun 29, 2018 (gmt 0)

I didn't dig in and compare headers with the bingbot from verified crawl ranges, but it did clear my checks, so there must not have been anything abnormal (or "Abby Normal," as Marty Feldman says).

iamlost

12:31 am on Jun 30, 2018 (gmt 0)

As blend27: rDNS failure gets 403. If it's the real deal Bing should know better; if not...

keyplyr

12:35 am on Jun 30, 2018 (gmt 0)

rDNS shows Microsoft, as documented in the OP.

iamlost

3:29 am on Jun 30, 2018 (gmt 0)

OP shows MSFT as host, and that means diddly squat. rDNS mostly doesn't (didn't) resolve, and even if it did, unless it resolved to search.msn.com it ain't what's on the label.
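The check described here is usually done as forward-confirmed reverse DNS: look up the PTR name for the IP, require it to end in search.msn.com (the suffix Bing documents for bingbot), then resolve that name forward and require the original IP to come back. A minimal Python sketch using only the standard `socket` module; `is_verified_bingbot` is a hypothetical name, it handles IPv4 only (`gethostbyname_ex` returns no IPv6 addresses), and the resolvers are parameters so they can be stubbed in tests:

```python
import socket
from typing import Callable, List, Tuple

# Suffix Bing documents for genuine bingbot reverse DNS names.
BINGBOT_SUFFIX = ".search.msn.com"

Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def is_verified_bingbot(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / gaierror / timeout all subclass OSError
        return False  # no PTR record: treat as unverified (a 403 in this thread)
    if not hostname.lower().rstrip(".").endswith(BINGBOT_SUFFIX):
        return False  # e.g. a corporate Microsoft name is not bingbot
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    # The PTR name could be spoofed by whoever controls reverse DNS for the IP,
    # so the forward lookup must point back to the same address.
    return ip in addresses
```

A PTR name alone is not proof, because the owner of an IP block controls its reverse zone; the forward lookup is what ties the name back to Microsoft's DNS.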

keyplyr

3:38 am on Jun 30, 2018 (gmt 0)

From the OP

this is not a designated crawl range for bingbot

The rDNS returns corp.microsoft.

Not sure what point you're trying to make iamlost, but there is no argument here. No one is saying this is bingbot.

blend27

9:06 pm on Jun 30, 2018 (gmt 0)

...did you happen to notice if its headers are identical...

not sure if they are but here we go:

ip: 13.77.169.***
remote host: 13.77.169.***
time: {ts '2018-06-28 03:**:**'}
http_content: 
method: GET
protocol: HTTP/1.1 
Cache-Control: no-cache 
connection: Keep-Alive 
accept: */* 
user-agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) 
Accept-Encoding: gzip, deflate 
pragma: no-cache 
host: www.example.com 
From: bingbot(at)microsoft.com 
content-length: 0

[edited by: keyplyr at 12:25 am (utc) on Jul 1, 2018]
[edit reason] obscured IP addresses [/edit]

lucy24

10:56 pm on Jun 30, 2018 (gmt 0)

Content-Length? Now that's interesting, because everything else matches. (Mine say Connection: close for all requests, ever, without exception. But I only just read somewhere--possibly even an old thread hereabouts?--that this is sometimes changed by the host.)

:: detour for closer checking ::

No, that's what I thought. I've never seen a Content-Length header from the bingbot, where "never" = within the time that I save logged headers, which is obviously not infinite.

Edit: Doesn't Content-Length: 0 normally mean a HEAD request? I've never paid much attention to it.
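Comparing a suspect request's headers against a known-good bingbot request is easy to script once the logged lines are parsed. A minimal Python sketch; `parse_header_dump`, `header_diffs` and the header list are hypothetical, and the baseline in the usage note is only a reconstruction of what is described above (same headers, but `Connection: close` and no Content-Length), not a real logged request:

```python
from typing import Dict, List, Optional

# Request headers worth comparing; log-only fields (ip, time, ...) are skipped.
HEADERS_OF_INTEREST = [
    "accept", "accept-encoding", "cache-control", "connection",
    "content-length", "from", "pragma", "user-agent",
]

# Header lines from the suspect request posted above.
SUSPECT = """\
Cache-Control: no-cache
connection: Keep-Alive
accept: */*
user-agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Accept-Encoding: gzip, deflate
pragma: no-cache
From: bingbot(at)microsoft.com
content-length: 0
"""


def parse_header_dump(text: str) -> Dict[str, str]:
    """Turn 'Name: value' lines into a dict keyed by lower-cased name.

    Splits on the first colon only, so values containing URLs survive intact.
    """
    headers: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def header_diffs(seen: Dict[str, str], baseline: Dict[str, str]) -> List[str]:
    """List headers whose presence or value differs from the baseline."""
    diffs = []
    for name in HEADERS_OF_INTEREST:
        a: Optional[str] = seen.get(name)
        b: Optional[str] = baseline.get(name)
        if a != b:
            diffs.append(f"{name}: seen={a!r}, baseline={b!r}")
    return diffs


if __name__ == "__main__":
    seen = parse_header_dump(SUSPECT)
    baseline = dict(seen)             # hypothetical baseline built from the
    baseline["connection"] = "close"  # description in this post
    del baseline["content-length"]
    for line in header_diffs(seen, baseline):
        print(line)
```

With that baseline the only differences reported are `connection` and `content-length`, matching the observation in this post.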

blend27

7:59 pm on Jul 4, 2018 (gmt 0)

Doesn't Content-Length: 0 normally mean a HEAD request? I've never paid much attention to it.

A Content-Length header is usually sent with a GET request. With POST requests it carries a real value: the length of the string being posted.

That is how we ban them: a POST request comes in with a self referrer, but with either no time elapsed since the previous visit or no previous visit at all.

Classic Guest-Book Spam trap if you will... ;)
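The trap described above can be sketched as a single predicate. This is a minimal Python illustration, not the poster's actual code: it assumes a hypothetical `last_get` map of IP to timestamp of that IP's last page view, and the 5-second minimum is an invented threshold:

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit

MIN_SECONDS_BEFORE_POST = 5.0  # hypothetical threshold; tune per form


@dataclass
class Request:
    ip: str
    method: str
    referer: Optional[str]
    timestamp: float  # seconds since the epoch


def should_ban(req: Request, site_host: str, last_get: Dict[str, float]) -> bool:
    """Flag a POST that claims to come from one of our own pages (self
    referrer) although the same IP never loaded a page first, or posted
    almost immediately after loading it."""
    if req.method.upper() != "POST" or not req.referer:
        return False
    if urlsplit(req.referer).hostname != site_host:
        return False  # not self-referred; other rules apply
    previous = last_get.get(req.ip)
    if previous is None:
        return True  # self referrer, but no previous visit on record
    return req.timestamp - previous < MIN_SECONDS_BEFORE_POST
```

A human has to load the form and spend a moment filling it in; a spam script that forges the Referer header usually skips both steps, which is what the two branches test.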

keyplyr

8:01 pm on Jul 4, 2018 (gmt 0)

That is how we ban them when the POST request comes in with self referrer...

Exactly

blend27

8:04 pm on Jul 4, 2018 (gmt 0)

I just had a quick look at the logs and am seeing this bot request URIs that are clearly disallowed in robots.txt. It never gets around to requesting robots.txt itself, though.
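To confirm which logged requests a bot should not have made, the site's robots.txt can be replayed against them with Python's standard `urllib.robotparser`. A minimal sketch; `disallowed_hits` is a hypothetical helper, and the robots.txt content and paths in the usage note are invented examples:

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt: str, user_agent: str, paths: Iterable[str]) -> List[str]:
    """Return the requested paths that robots.txt disallows for user_agent.

    Pass the product token (e.g. "bingbot"), not the full UA string:
    robotparser only looks at the part before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    robots = "User-agent: bingbot\nDisallow: /cart/\nDisallow: /search\n"
    print(disallowed_hits(robots, "bingbot", ["/", "/cart/checkout", "/search"]))
    # ['/cart/checkout', '/search']
```

A compliant crawler fetches robots.txt before anything else, so a client that never requests it yet keeps hitting disallowed paths fails this check on both counts.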