Server farms in the age of VPNs - Crawler, Spider, and User Agent ID forum at WebmasterWorld - WebmasterWorld

How to consider server farms in the age of consumer VPNs

JamesSC

12:54 pm on Oct 11, 2018 (gmt 0)

Top Contributors Of The Month

Thanks again to everyone who helped me with my first inquiry.

There are many references across WebmasterWorld to traffic coming from server farms, where the implication has been that such traffic is generally undesirable.

However, more and more ordinary people have been turning to routine VPN use, particularly since April 2017, when U.S. broadband privacy rules were overturned, allowing ISPs to collect and market user data, and even earlier, when Consumer Reports, of all sources, began explicitly recommending that its readers practice anonymity online.

As a consequence, available addresses are in ever shorter supply, and server farms are filling that void. It's also not uncommon to find one or more addresses already blocked by Google or a site's CDN provider.

So: since more and more ordinary, legitimate humans are now arriving at sites from addresses that previously might have been written off as hacker/spammer neighborhoods, has your approach changed? If it has, how has your philosophy and, more importantly, your specific treatment of server farm ranges evolved?

Thanks,

James

justpassing

1:10 pm on Oct 11, 2018 (gmt 0)

I created my own CAPTCHA-like form that I show when I detect an IP that is on my "suspect" list, so real humans can pass and continue. Then I keep an eye on what happens so I can eventually refine my list.
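
For readers who want something concrete, here is a minimal sketch of that flow in Python. It is not justpassing's actual code: the names `SUSPECT_NETWORKS`, `is_suspect`, and `decide` are hypothetical, the ranges are reserved documentation networks used as placeholders, and how the challenge form is rendered and how a pass is remembered (cookie, session) is left out.

```python
import ipaddress

# Hypothetical "suspect" list. These are reserved documentation networks
# (RFC 5737), used here only as placeholders for real ranges.
SUSPECT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_suspect(remote_addr: str) -> bool:
    """Return True if the address falls inside any suspect network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Assumption: an unparseable address is treated as suspect.
        return True
    return any(addr in net for net in SUSPECT_NETWORKS)


def decide(remote_addr: str, passed_challenge: bool) -> str:
    """Return "challenge" for unverified suspect visitors, else "serve"."""
    if is_suspect(remote_addr) and not passed_challenge:
        return "challenge"
    return "serve"
```

Logging each "challenge" decision together with whether the visitor later passed would give the data needed to refine the list over time.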

Steven29

2:53 pm on Oct 11, 2018 (gmt 0)

I haven't seen a legitimate user using a cloud or hosting IP address to this day. But every single day there are thousands and thousands of bad requests.

I would say that the day legitimate requests surpass the bad requests, I may look into a way to filter these better.

I think cloud servers are basically today's proxies.

lucy24

5:27 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

"how has your philosophy and, more importantly, your specific treatment of server farm ranges evolved?"

Several years ago I dumped my old, IP-based access controls and changed over to header-based rules. Visitors are assessed based on who they are and what they do, not where they live. Yes, some robots get through--but no more than before overall, and with much less time and trouble on my part.

justpassing

5:50 pm on Oct 11, 2018 (gmt 0)

"header-based rules."

Can you explain? Do you check the headers sent with a request and, depending on which fields are set or missing, let the visitor in or not? Is that it?

lucy24

6:15 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

"Is that it?"

Yup, that's basically it. I tend not to go into detail, on the--admittedly remote--chance that botrunners are reading these forums and taking notes. (Based on the observed intelligence of the average robot, I tend to doubt it.)
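
Since the actual rules are deliberately not disclosed here, the following is only a generic sketch of what a header-based check can look like. The header names are real HTTP request headers, but the choice of which ones to require is an assumption made purely for illustration, and `REQUIRED_HEADERS` and `header_verdict` are hypothetical names.

```python
# Hypothetical rule set: which headers to require is an assumption,
# not anyone's real configuration.
REQUIRED_HEADERS = ("host", "user-agent", "accept", "accept-language")


def header_verdict(headers: dict) -> str:
    """Return "allow" or "deny" based only on the request headers."""
    # HTTP header names are case-insensitive, so normalize before checking.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    for name in REQUIRED_HEADERS:
        if not normalized.get(name):
            return "deny"  # field is missing or empty
    return "allow"
```

A real rule set would likely also look at header values and combinations, which this sketch does not attempt.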

justpassing

7:44 pm on Oct 11, 2018 (gmt 0)

That's fine, thank you.

keyplyr

9:01 pm on Oct 11, 2018 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Hi JamesSC,

Nothing has changed as far as site security goes.

Site security should be a comprehensive methodology of filtering traffic through several approaches, as outlined here: Blocking Methods [webmasterworld.com] and HTTPS Security Headers [webmasterworld.com]

@justpassing - header check methods are linked from the Blocking Methods.

justpassing

8:16 am on Oct 12, 2018 (gmt 0)

"@justpassing - header check methods are linked from the Blocking Methods."

Yes, thank you. It's roughly what I am doing too, but I still need to be strict about server farms because of one single scraper, archive.is/today/onion (they keep changing their name), which sends all the correct headers.

keyplyr

8:28 am on Oct 12, 2018 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

A site with light traffic (<10k daily page loads) can probably get by relatively OK using only header-based security; however, medium- to heavy-traffic sites really need a comprehensive approach using all the blocking methods. There's just too much going on.

As far as visitors using VPNs hosted at server farms, the same approach is needed. I block all known server farm ranges (over 6k), with exceptions for beneficial bots, apps, schools, ISPs, *some* VPNs, and a few other variables.

However, those agents allowed through the IP filters must also validate with the other conditions.

All this takes daily diligent attention. I watch server logs and constantly tweak my rules.
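
The layering described above (block server farm ranges, carve out exceptions, then still require the other checks) can be sketched roughly as follows. This is an illustration under assumptions, not keyplyr's rule set: the ranges are reserved documentation networks standing in for real ones, and `BLOCKED_NETWORKS`, `ALLOWED_NETWORKS`, `passes_ip_filter`, and `admit` are hypothetical names.

```python
import ipaddress

# Placeholder for blocked server farm ranges (RFC 5737 documentation network).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# Hypothetical exception carved out of the blocked range, e.g. a trusted bot.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.64/26")]


def _in_any(addr, networks) -> bool:
    """Return True if addr belongs to at least one of the networks."""
    return any(addr in net for net in networks)


def passes_ip_filter(remote_addr: str) -> bool:
    """Exceptions win over blocks; unlisted addresses pass."""
    addr = ipaddress.ip_address(remote_addr)
    if _in_any(addr, ALLOWED_NETWORKS):
        return True
    return not _in_any(addr, BLOCKED_NETWORKS)


def admit(remote_addr: str, other_checks) -> bool:
    """Admit only if the IP filter and every other check both pass.

    other_checks is a list of zero-argument callables returning bool,
    standing in for header checks and any other conditions.
    """
    if not passes_ip_filter(remote_addr):
        return False
    return all(check() for check in other_checks)
```

Checking the exception list first means an allowed sub-range survives even when its parent range is blocked, while the final `all(...)` reflects the point that agents let through the IP filter must still validate against the other conditions.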