Weird search requests by Bingbot IPs

         

squarebracket

4:27 pm on Feb 9, 2025 (gmt 0)



For the last couple of weeks, Bingbot IPs have been issuing bizarre search requests on a client's web site. The search syntax is valid, but the search terms have nothing to do with the site. Sometimes they are in Chinese. Sometimes they are gibberish.

I've reached out to the Bingbot support people at Microsoft, but so far they have no answers, and the requests continue.

I've had to block all the Bingbot CIDRs I can discover, and at least now the site isn't spending all its resources doing irrelevant and unnecessary searches.

Why would a search engine crawler even bother to try crawling search results? And why with irrelevant search terms?

I can't help wondering if someone is co-opting Bingbot to perform some kind of DDoS against the site. Before I started blocking the Bingbot CIDRs, the excessive searches were causing site failures.

not2easy

4:46 pm on Feb 9, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is it claiming to be Bingbot? I ask because a lot of the same IPs that Bingbot uses are also sending scraper bots, and I have been blocking some of those for years. What is the UA?

squarebracket

5:43 pm on Feb 9, 2025 (gmt 0)



Here's one from a few minutes ago:
52.167.144.234 - - [09/Feb/2025:09:34:57 -0800] "GET /catalogsearch/result/?q=how+to+delete+netflix+history HTTP/2.0" 403 436 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36"

User Agent is bingbot/2.0.
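
For anyone wanting to tally these requests, a log line like the one above can be pulled apart with a short script. This is a minimal sketch in Python, assuming the server writes the Apache combined log format; `LOG_PATTERN` and `parse_search_hit` are names made up for illustration, and the sample line is the one quoted above.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_search_hit(line):
    """Return the interesting fields of one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    target = urlsplit(match["target"])
    # parse_qs decodes "+" and percent-escapes in the search term
    terms = parse_qs(target.query).get("q", [""])
    return {
        "ip": match["ip"],
        "status": int(match["status"]),
        "path": target.path,
        "query": terms[0],
        # Only says what the UA string claims; it does not verify the sender
        "claims_bingbot": "bingbot" in match["agent"].lower(),
    }


sample = (
    '52.167.144.234 - - [09/Feb/2025:09:34:57 -0800] '
    '"GET /catalogsearch/result/?q=how+to+delete+netflix+history HTTP/2.0" '
    '403 436 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; '
    'compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) '
    'Chrome/116.0.1938.76 Safari/537.36"'
)

hit = parse_search_hit(sample)
print(hit)
```

Run over a whole access log, the same function makes it easy to count search hits per IP or to list the search terms being requested.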

52.167.144.234 is a verified Bingbot IP (https://www.bing.com/webmasters/verifybingbot).

So far I'm blocking these CIDRs in htaccess:
deny from 20.0.0.0/11
deny from 40.76.0.0/14
deny from 52.160.0.0/11
deny from 157.55.0.0/16
deny from 207.46.0.0/19
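
A quick way to check whether a given address is covered by the ranges above is Python's standard `ipaddress` module. The sketch below is illustrative only: `BLOCKED_CIDRS` and `is_blocked` are hypothetical names, and the list is just the five ranges from this post, not a complete list of Bingbot addresses.

```python
import ipaddress

# The five ranges from the htaccess rules above
BLOCKED_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "20.0.0.0/11",
        "40.76.0.0/14",
        "52.160.0.0/11",
        "157.55.0.0/16",
        "207.46.0.0/19",
    )
]


def is_blocked(ip):
    """Return True if the address falls inside any of the blocked ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_CIDRS)


# The address from the log line quoted earlier in this post
print(is_blocked("52.167.144.234"))
```

Note how wide these ranges are: a /11 covers about two million addresses, so rules like these will also catch plenty of Microsoft-hosted traffic that has nothing to do with Bingbot.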

I'm starting to see occasional requests from other Microsoft (but non-Bingbot) hosts, some of which seem relevant. They may be the result of testing by Bing engineers.

Note that before these weird requests started, I had already throttled Bingbot in robots.txt. I shudder to think what would have happened if that throttling had not been in place.
User-agent: bingbot
Crawl-delay: 60

not2easy

6:04 pm on Feb 9, 2025 (gmt 0)




I checked into that first IP and it certainly looks like their IP and UA string. I see I have that IP blocked as well; according to my notes, they ignored robots.txt. It is good you had a crawl delay in place.

squarebracket

7:48 pm on Feb 9, 2025 (gmt 0)



According to Microsoft, Bingbot does follow any relevant directives in robots.txt. See [bing.com...]

not2easy

8:37 pm on Feb 9, 2025 (gmt 0)




Yes, I had read that they are supposed to read and follow robots.txt, but in practice some bingbot UAs ignore part of it. Those get blocked, which also blocks the ones that might be following directives. Some of the ranges mentioned above were a source of their MediaBot, and that one didn't bother reading robots.txt.

squarebracket

1:38 am on Feb 10, 2025 (gmt 0)



This person is seeing the same thing: [answers.microsoft.com...]

squarebracket

11:25 am on Feb 10, 2025 (gmt 0)



In testing, I'm able to reproduce these weird search requests using the Bing Webmaster Tools URL Inspection tool (https://www.bing.com/webmasters/urlinspection).

"GET /catalogsearch/result/?q=<search terms>"

I used the same test URL in the Google Search Console's URL Inspection tool, and it did NOT make a corresponding request to the site, saying "Page cannot be crawled: Blocked by robots.txt". There is definitely a rule in robots.txt that prevents well-behaved crawlers from requesting URLs like that:
User-agent: *
Disallow: /catalogsearch/

This confirms that while Bingbot may honour some robots.txt directives (e.g. Crawl-delay), it does not honour all of them, and in particular it seems to ignore path-based disallows.
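
One thing worth ruling out, though nobody in this thread has confirmed it: under the Robots Exclusion Protocol (RFC 9309), a crawler that finds a group naming it specifically follows only that group, and falls back to the `User-agent: *` group only when no specific group matches. If the `bingbot` Crawl-delay group and the `*` Disallow group quoted in this thread sit in the same file, a compliant parser would not apply the `/catalogsearch/` disallow to Bingbot. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in; it is not Bingbot's actual parser, and `ROBOTS_TXT` is an assumed reconstruction of the file from the two fragments shown here.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: the two fragments from this thread as separate groups
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 60

User-agent: *
Disallow: /catalogsearch/
"""


def build_parser(text):
    """Parse robots.txt content held in a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


url = "/catalogsearch/result/?q=test"
parser = build_parser(ROBOTS_TXT)

# bingbot matches its own group, which has no Disallow lines
bing_allowed = parser.can_fetch("bingbot", url)
# A crawler without its own group falls back to the "*" group
other_allowed = parser.can_fetch("Googlebot", url)

# Possible fix: repeat the Disallow inside the bingbot group
FIXED = ROBOTS_TXT.replace(
    "Crawl-delay: 60\n",
    "Crawl-delay: 60\nDisallow: /catalogsearch/\n",
)
fixed_bing_allowed = build_parser(FIXED).can_fetch("bingbot", url)

print(bing_allowed, other_allowed, fixed_bing_allowed)
```

With this parser the search URL is allowed for bingbot and disallowed for a crawler that only matches `*`, which lines up with the different results from the Bing and Google inspection tools. Whether that is what is really happening on this site depends on the actual robots.txt, so treat it as a hypothesis to test rather than a diagnosis.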

As long as Bingbot and Bing tools can be used to perform attacks like this, I will continue to block Bingbot hosts. Hopefully Microsoft will recognize that this is a problem and do something about it. I've always suspected that Bing Webmaster Tools is just a reverse-engineered Google Webmaster Tools, so if I can show MS that a function they copied behaves worse than Google's original, maybe they'll pay attention.

Meanwhile, I'm left wondering if this is a directed attack on our site, or if it's just some idiotic script trying to pull useful information from multiple sites.