You may recall that I am IP-blocking a large percentage of the IPv4 address space (currently 34.7%) from hitting my web server. That leaves a reduced IPv4 universe, from which perhaps a small and diverse set of rogue hosting players still gets through, such as this:
23.161.169.62 (AS400529 Infraly, LLC)
Today's several dozen hits from that IP consisted of blindly asking for files in various /.env, /.git and /api paths, plus JSON files like config.json, firebase-adminsdk.json, google-credentials.json, secrets.json and service-account.json - none of which I have. It also crawled part of my site (html files only). I never see that sort of combination (probe for vulnerabilities and then crawl the site a little).
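For anyone who wants to flag this kind of probing automatically, here is a minimal sketch in Python. The function name is_probe and the SENSITIVE_FILES set are hypothetical names made up for illustration; the file names are just the ones listed above, and /api is left out on purpose because many sites serve real content there.

```python
from urllib.parse import urlsplit

# File names taken from the probes described in this post (not an exhaustive list).
SENSITIVE_FILES = {
    "config.json",
    "firebase-adminsdk.json",
    "google-credentials.json",
    "secrets.json",
    "service-account.json",
}


def is_probe(request_target: str) -> bool:
    """Return True if the requested path looks like a credential/secret probe."""
    # Drop any query string and compare case-insensitively.
    path = urlsplit(request_target).path.lower()
    segments = [s for s in path.split("/") if s]
    # Any path segment that is .git, or that starts with .env (.env, .env.production, ...).
    if any(s == ".git" or s.startswith(".env") for s in segments):
        return True
    # Otherwise, check only the final segment against the known file names.
    return bool(segments) and segments[-1] in SENSITIVE_FILES
```

On a site that has none of these files, a single True from one IP is probably enough reason to look at the rest of its hits.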
But here's the thing - it alternated these hits using a variety of user-agents. The complete list:
CCBot/2.0 (https://commoncrawl.org/faq/)
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
Mozilla/5.0 (compatible; DeepSeekBot/1.0; +https://www.deepseek.com/bot)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Google-CloudVertexBot; +https://cloud.google.com/vertex-ai-bot)
Mozilla/5.0 (compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot)
Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Mozilla/5.0 (compatible; xAI-SearchBot/1.0; +https://x.ai)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
(I don't think I've seen Google-CloudVertexBot before, a topic for another thread?)
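The rotation itself is easy to spot in a log: one IP claiming to be many different bots. A rough Python sketch follows, assuming the Apache/nginx "combined" log format (adjust LOG_RE if yours differs). The names BOT_TOKENS, claimed_bots and rotating_ips, and the threshold of 3, are arbitrary choices for illustration.

```python
import re
from collections import defaultdict

# Bot tokens taken from the UA list above; extend as needed.
BOT_TOKENS = [
    "CCBot", "Baiduspider", "bingbot", "ClaudeBot", "DeepSeekBot",
    "Googlebot", "Google-CloudVertexBot", "OAI-SearchBot", "PerplexityBot",
    "xAI-SearchBot", "YandexBot", "ChatGPT-User", "GPTBot",
]

# Assumption: "combined" log format, i.e.
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def claimed_bots(ua: str) -> set:
    """Return the set of known bot tokens that appear in a user-agent string."""
    ua_lower = ua.lower()
    return {token for token in BOT_TOKENS if token.lower() in ua_lower}


def rotating_ips(lines, threshold: int = 3) -> dict:
    """Map each IP that claimed at least `threshold` distinct bots to those bots."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            seen[match.group("ip")] |= claimed_bots(match.group("ua"))
    return {ip: bots for ip, bots in seen.items() if len(bots) >= threshold}
```

Usage would be something like rotating_ips(open("access.log")), where the file name is a placeholder. No real crawler shares an IP with a dozen competitors, so any IP this returns is worth a closer look.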
I find the UA list above very useful because of the presence of these two UAs:
Mozilla/5.0 (compatible; DeepSeekBot/1.0; +https://www.deepseek.com/bot)
Mozilla/5.0 (compatible; xAI-SearchBot/1.0; +https://x.ai)
I have only seen them once before - in April this year from a rogue IP (208.92.235.45 - AS399244) - another crackpot entity (now IP-blocked). It asked for a handful of .env and .git files. So I'm counting those as fake DeepSeek and xAI hits.
I have rarely (and I mean rarely) seen a forged hit claiming a main-line search-bot UA. And even then it was not part of a session that systematically worked through a list of search bots.
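For the main-line bots, the usual way to separate legit from forged is forward-confirmed reverse DNS: reverse-resolve the IP, check the hostname suffix, then forward-resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch, IPv4 only. The suffixes for Googlebot and bingbot are the ones Google and Microsoft document; the function name verify_claim and the injectable reverse/forward parameters are hypothetical names for illustration. I know of no published hostname rule for DeepSeekBot or xAI-SearchBot, so the sketch returns None for those rather than guessing.

```python
import socket

# Hostname suffixes documented by Google and Microsoft for their crawlers.
# Other bots are deliberately left out rather than guessed at.
VERIFIED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def _reverse(ip: str) -> str:
    """Reverse DNS lookup: IP -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> set:
    """Forward DNS lookup: hostname -> set of IPv4 addresses."""
    return set(socket.gethostbyname_ex(host)[2])


def verify_claim(ip: str, bot: str, reverse=_reverse, forward=_forward):
    """Return True/False for a verifiable bot claim, or None if no rule is known."""
    suffixes = VERIFIED_SUFFIXES.get(bot)
    if suffixes is None:
        return None  # no hostname rule in the table above for this bot
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False
        # The forward lookup must point back at the IP that made the request.
        return ip in forward(host)
    except OSError:
        # socket.herror / socket.gaierror: a failed lookup counts as unverified.
        return False
```

The resolver functions are passed in as parameters so the logic can be exercised without touching the network; in normal use you would just call verify_claim(ip, "Googlebot").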
But the more important thing here for me is the DeepSeek and xAI searchbot UAs. I have never seen a legit hit from either of those 2 bots, so I can only wonder whether the UAs above are actual working UAs or fabricated speculation about what they might look like.
Has anyone here ever had hits from those 2 bots? Legit hits, not forged?