Forum Moderators: open

the robots ye will always have with you


lucy24

5:18 pm on Nov 7, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Over the last several months--maybe longer, because I’m slow on the uptake--I’ve been plagued by several patterns of robotic behavior.

IP: all over the map, mostly human broadband ranges
UA: fully plausible recent browser
Referer: either blank or something plausible like a search engine, no sign of referer spam
headers: fully humanoid, nothing missing, nothing suspicious

Pattern #1: Requests for the same page--no supporting files--10-20 or so clustered within a few seconds. All different IPs and UAs within each cluster. The frequency isn't high enough to make me suspect a DDoS attack from made-up IPs, especially when they never seem to come through as 429 (“too many requests”).

Pattern #2: Request for some random page, immediately followed by all supporting files from a different IP, most often 34.34.etcetera (blocked, of course).

Pattern #3: Request for one page and its supporting files (css and js) but not images.

Rarely, the IP turns out to be a server farm I hadn't previously known about (why, hello there, OVH! and you too Hetzner, didn’t know you lived there!) which can be happily blocked in perpetuity. But the great majority are human broadband IPs. Currently I deal with the latter by blocking the /24, because what else can I do. On any given day they are all different, but some are still in use when I do a three-month recheck. (In particular, when I look over each day’s clusters, a gratifying number are blocked, meaning that the IP has been here before.)
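In practice the /24 blocking looks something like this in Apache 2.4 .htaccess (the IPs below are documentation-range placeholders, not the actual ranges involved):

```apache
# Sketch only: allow everyone except specific /24 ranges.
# 203.0.113.0/24 and 198.51.100.0/24 are RFC 5737 documentation
# ranges standing in for whatever ranges the logs implicate.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.0/24
    Require not ip 198.51.100.0/24
</RequireAll>
```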

All these human IPs leave only two explanations. One, the IP itself might have nothing to do with the request--but, as noted above, I don't suspect DDoS activity. Two, there are a heck of a lot of people out there clicking on malware links.

Sigh.

not2easy

5:39 pm on Nov 7, 2025 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The 34.4.5.0 - 34.63.255.255 range is Google's Cloud servers and I first met them scraping in Nov. 2021 though they probably pre-date that.

I'm also seeing more bot-like activity via residential ISPs. Mostly one-off hits. :(

SumGuy

2:52 am on Nov 8, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



Here's an example of what I see a lot as I filter requests based on user-agent. This sequence is from last week. Four file requests (all PDF files) from 4 different IPs. Two requests came within 5 seconds of each other, and the other two also within 5 seconds. About 20 minutes between all 4. One thing they had in common - all using this UA:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36

They triggered my "Chrome/136" flag, so they got my "I think you're a bot" page. The details of these 4 requests:

178.51.81.144 DATAIMPULSE_PROXY / NEXUS_PROXY / IPIDEA_PROXY
164.163.75.219 NETNUT_PROXY / KOOKEEY_PROXY / PLAINPROXIES_PROXY
195.138.118.99 PROXYRACK_PROXY
86.179.43.158 KOOKEEY_PROXY / DATAIMPULSE_PROXY / OPEN_ROUTABLE_PROXY

The proxy-network identifications shown above came from Spur, so these were not false-positive bot IDs. They weren't getting the file they were looking for, so they kept trying.

For 2 of the above IPs, I've added their entire AS IP inventory to my IP blocking list. The other 2 (Orange Belgium and BT Central Plus) I have not.

The point is - I have several dozen user-agent rules now, and they're functioning very well at blocking requests from these proxies, many of which come from major US, Canadian, UK, European, and Australian residential ISPs.
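A rule of that sort might look roughly like this in .htaccess with mod_rewrite (the UA pattern and the landing-page name are illustrative, not my actual config):

```apache
# Illustrative sketch: send a flagged user-agent to a "bot check" page.
# The Chrome/136 pattern and /bot-check.html are placeholders.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Chrome/136\.0\.0\.0"
# Negated pattern so requests for the bot page itself don't loop
RewriteRule !^bot-check\.html$ /bot-check.html [R=302,L]
```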

Next time you see a suspicious hit, throw the IP into spur: spur . us / context / 1.2.3.4

tangor

4:34 am on Nov 8, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Weren't proxies designed to provide privacy/anonymity for users?

That said, if the bots ARE using proxies it looks like .htaccess is going to gain a little weight.

SumGuy

2:12 am on Nov 9, 2025 (gmt 0)

5+ Year Member Top Contributors Of The Month



People install these "privacy" apps, which put their internet connection up for use by all sorts of bots and scrapers.

The residential IP owners, when they surf the web, are most likely going to have their normal / natural user-agents come through, but the bots are using their own perverted user-agents, which is how I'm blocking them. I don't IP-block residential IP space (in G7 or G20 countries at least). These are probably the same bots that would normally use Digital Ocean or Hetzner but are increasingly using proxified residential IPs.

lucy24

5:50 pm on Nov 9, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if the bots ARE using proxies it looks like .htaccess is going to gain a little weight
The eternal dilemma: Would your access controls result in blocking yourself?

My most recent batch of robot patterns either have random UAs--quite possibly the actual UA of an infected human computer--or brand-new browsers that can't be blocked because humans are legitimately using them. At any given time I do have 10-20 UAs labeled botnet_agent (some of them truly improbable, like Chrome/84), which yields a few thousand lockouts every month.