
Let's talk about VPN ranges

or: what is it that distinguishes a real visitor using a VPN from bots?


blend27

1:43 am on Jun 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I recently had some of my network rewired to use 'nonnVPN', excited at the prospect of being encrypted.

I tried a few server locations, only to find that I was being tunneled through m247, datapipe and the like, which I have been actively blocking on my sites' end for some time now.

And now the question: from a "Shimmy Shimmy Ya" and 'would ya even touch my skill' perspective, how does one allow users on a VPN without blocking them outright?

Some sort of CAPTCHA? A local script only (no Google, no Cloudflare), and based on what?

Headers work wonders, but what else?


All ears!

jmccormac

2:08 am on Jun 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some VPNs are identifiable, but others have taken to using smaller allocations in data centres. Log tracking will flag the problem IPs (scrapers) quickly.
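A minimal sketch of the kind of log tracking described, assuming a combined-format access log (the log lines below are synthetic examples): count requests per client IP and flag the heaviest hitters.

```python
from collections import Counter

def flag_heavy_ips(log_lines, threshold=100):
    """Count requests per client IP in combined-log-format lines
    (IP is the first space-delimited field) and return the IPs
    whose request count exceeds the threshold."""
    hits = Counter(line.split(" ", 1)[0] for line in log_lines if line.strip())
    return {ip: n for ip, n in hits.items() if n > threshold}

# Synthetic log lines from two IPs (documentation-range addresses).
lines = [
    '203.0.113.9 - - [07/Jun/2021:01:00:00 +0000] "GET /a HTTP/1.1" 200 123',
    '203.0.113.9 - - [07/Jun/2021:01:00:01 +0000] "GET /b HTTP/1.1" 200 123',
    '198.51.100.4 - - [07/Jun/2021:01:00:02 +0000] "GET /a HTTP/1.1" 200 123',
]
print(flag_heavy_ips(lines, threshold=1))  # {'203.0.113.9': 2}
```

In practice the threshold would be tuned per site, and a time window added, but the core idea is just frequency counting over the log.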

Regards...jmcc

iamlost

5:23 am on Jun 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My first thought was: well, how are you currently detecting non-VPN bot traffic?

VPNs have become surprisingly mainstream in the last couple of years, what with TV and SM ads touting privacy and the bypassing of entertainment restrictions. Sufficient (depending on requirements) to be simply considered part of general traffic. And all traffic generally needs to be considered bot-contaminated to some degree.

Bots, like fighter jets, have their generational capabilities. The current fourth-generation bots have advanced human-like behaviours, often having been trained on hijacked (i.e. copied) real human swipes and touches...

Just as continuing investment in bot development has created stealth attackers so too has steady investment in bot defences created real time pattern/behaviour matching by various methodologies.

Unfortunately, except for a few weirdos (I resemble that remark), the R&D time investment and ever-steepening learning curve have pretty much made this field impractical for solo webdevs. Just like fourth/fifth-generation fighter jets not commonly being built in one's garage.

VPN use, like bots, is an escalating development: increasingly hard to detect and increasingly used by humans of good intent. A PITA, as VPN use is no longer, in and of itself, a bot identifier.


Sorry. :(

martinibuster

6:23 am on Jun 7, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There is low-hanging fruit in the user agents of bots.

lucy24

4:19 pm on Jun 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yup. I've noted in the past that, when making access-control rules, you should think twice about any rule that would deny access to yourself. I first noticed this at a time when I was blocking any known proxy IP ... and then found out that one of those proxies was used by a local governmental entity that had more-than-legitimate reason to visit one site.

To this day, a fair number of robots don't even send a User-Agent--ka-ching! instant death, tralala!--and a good many more don't bother with the Accept header.
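A sketch of that check, assuming you have the request headers as a dict (the header names are the standard HTTP ones; the rejection policy itself is the poster's, not a universal rule):

```python
def looks_robotic(headers):
    """Flag a request whose headers lack a User-Agent or an Accept
    header -- two omissions common among simple robots."""
    present = {name.lower() for name in headers}
    return "user-agent" not in present or "accept" not in present

print(looks_robotic({"Host": "example.com"}))          # True (no UA, no Accept)
print(looks_robotic({"User-Agent": "Mozilla/5.0",
                     "Accept": "*/*",
                     "Host": "example.com"}))          # False
```

The same test is often written as a pair of RewriteCond rules in .htaccess; the Python form just makes the logic explicit.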

It is of course easier to identify robots retroactively: at the time they request /page.html, you can't yet know whether they will go on to request all its supporting files. But happily robotic behavior doesn't (yet) change daily, so you can look at those after-the-fact identified robots and find shared features. Only rarely do you have to fall back on the IP (looking at you, {well-known Continental host} and {other well-known Continental host}).
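The retroactive pattern above can be sketched as a simple log pass: group requests by IP, then flag IPs that fetched pages but never any supporting files. The asset suffix list and the (ip, path) input shape are illustrative assumptions.

```python
from collections import defaultdict

ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

def pages_without_assets(requests):
    """Given (ip, path) pairs from a log, return the IPs that requested
    at least one page but no supporting asset -- a robotic pattern."""
    fetched_page = defaultdict(bool)
    fetched_asset = defaultdict(bool)
    for ip, path in requests:
        if path.lower().endswith(ASSET_SUFFIXES):
            fetched_asset[ip] = True
        else:
            fetched_page[ip] = True
    return {ip for ip in fetched_page if not fetched_asset[ip]}

reqs = [("203.0.113.9", "/page.html"),
        ("198.51.100.4", "/page.html"),
        ("198.51.100.4", "/style.css")]
print(pages_without_assets(reqs))  # {'203.0.113.9'}
```

This only works after the fact, as the post says: at the moment /page.html is requested you cannot yet know whether the assets will follow.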

SumGuy

12:40 pm on Jun 10, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



> My first thought was: well, how are you currently detecting non-VPN bot traffic?

They self-identify when they ask for "login.php" or other such garbage. When I see that, I check the IP on bgp.he, call up the list of IP subnets belonging to that AS, and make a judgement that I am unlikely to ever get an organic hit from those subnets; then I grab the list from the screen output and throw them all into my router's blocking list. Sometimes I see these hits from residential/consumer IP space (?), and in that case I will just block the /24 in question.
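The membership test the router is doing can be sketched with the standard library's ipaddress module (the subnets below are hypothetical stand-ins for an AS's allocations):

```python
import ipaddress

def build_blocklist(cidrs):
    """Parse CIDR strings (as grabbed from a bgp.he AS listing)
    into network objects ready for membership tests."""
    return [ipaddress.ip_network(c) for c in cidrs]

def is_blocked(ip, blocklist):
    """True if the address falls inside any blocked subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocklist)

# Hypothetical subnets standing in for an AS's allocations.
blocklist = build_blocklist(["198.51.100.0/24", "203.0.113.0/24"])
print(is_blocked("203.0.113.77", blocklist))  # True
print(is_blocked("192.0.2.1", blocklist))     # False
```

A real router does this with a radix/trie lookup rather than a linear scan, which is why tens of thousands of CIDRs cost it essentially nothing.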

There are about 55k CIDRs in my router's HTTP/HTTPS IP drop list. It doesn't seem to impact the performance of the router at all! (Ubiquiti ER3.)