| This 94 message thread spans 4 pages: < < 94 ( 1 2  4 ) > > || |
|Block non-North American Traffic for Dummies Like Me|
Reducing the size of your blocking list.
| 6:48 pm on Apr 17, 2014 (gmt 0)|
First off, this subject has been discussed before but I felt that there's enough current interest in this board and on other boards here at WebmasterWorld alone, to warrant a fresh top-down discussion of the subject. We'll see if our moderators agree.
The list of CIDRs below was compiled from the IANA IPv4 Address Space Registry report [iana.org]. The list is a compact version of all Allocated non-ARIN /8 blocks (from APNIC, RIPE NCC, AFRINIC, and LACNIC). For example, 18.104.22.168/7 actually merges 22.214.171.124/8 and 126.96.36.199/8 into a single CIDR. The largest block in this list is 188.8.131.52/4 which merges the 184.108.40.206 through 220.127.116.11 address range.
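To make the merging rule concrete, here is a minimal Python sketch using the standard `ipaddress` module. It checks whether two equal-size blocks share the same parent block, which is the condition for folding them into one shorter-prefix CIDR. The /8 values (36, 37, 38) and the helper name `can_merge` are picked only for illustration and are not taken from the list.

```python
import ipaddress

def can_merge(a: str, b: str) -> bool:
    """Two distinct, equal-size blocks merge into one CIDR only if they share a parent."""
    net_a = ipaddress.ip_network(a)
    net_b = ipaddress.ip_network(b)
    return (
        net_a != net_b
        and net_a.prefixlen == net_b.prefixlen
        and net_a.supernet() == net_b.supernet()
    )

# 36/8 and 37/8 sit under the same /7, so they merge.
print(can_merge("36.0.0.0/8", "37.0.0.0/8"))
# 37/8 and 38/8 are neighbours but fall under different /7 parents.
print(can_merge("37.0.0.0/8", "38.0.0.0/8"))
# The parent block that a merge would produce.
print(ipaddress.ip_network("36.0.0.0/8").supernet())
```

The second case is why a numerically contiguous run of /8 blocks does not always condense into a single range: the run has to start on the right boundary.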
Some of the CIDRs below merge blocks from different registries, e.g. combining blocks from both RIPE NCC and APNIC. As such, this does not in any way represent an approach surgical enough to differentiate blocks in one RIR from blocks in another (let alone blocks representing specific countries). The goal here is to arrive at a blocking strategy that keeps people and bots from outside North America off your site.
It should also be noted that the list below is only intended as a good first step where blocking is concerned. There are many holes in the Legacy blocks that this step does not address and proxies are another whole topic of ingress. The intention here is to succinctly narrow the scope of the task with as little effort as possible.
One tangible benefit of this approach can be seen in the 18.104.22.168/5 range which blocks 22.214.171.124 to 126.96.36.199. This CIDR contains some AWS and Rackspace ranges (and probably other server farms as well). Blocking this range means you don't have to identify and separately block those server farm ranges.
So, I'm hoping that
1. This list is helpful to those looking for a starting point.
2. If there's a mistake in the list above, the moderators will see fit to correct it once the mistake is identified, so that the first post reflects accurate and up-to-date information.
3. This discussion can move forward with new ranges outside the Allocated blocks to help expand this list even further. Anyone want to block the UK Ministry of Defence (sic)? That /8 block and others are omitted here in this initial list because they are Legacy blocks.
And last for now. It is possible to further reduce the above list to a series of Regular Expressions which would be even more condensed than the list above. For those with access to a rewrite module (Apache or IIS) this list would be valuable, but I'll leave it up to an expert in that arena to post the list if they care to. I hope this helps someone and can save them the time I (and many others) have spent whittling down the world a bit.
Comments and corrections are most welcome!
| 2:01 am on Apr 22, 2014 (gmt 0)|
Edit: Oops, didn't see we were onto a new page. I was responding to:
|Actually, this is related to a site on a dedicated server and involves querying a fairly large relational database for the majority of page loads. The cost of robotic visits alone can be staggering in terms of server resources required to query the size of database involved (even with a variety of caching mechanisms in place). |
I regularly see 50,000 to 75,000 crawler hits a day (or more) from Google, Yahoo and Bing and that number is (or was) dwarfed by the rest of the robotic traffic on the site. It's a dynamic site so, as Lucy intimated, there is a real downside to triggering a query against a table with 7,000,000+ records in it
Whole new issue then. If the act of building a page consumes far more server resources than evaluating undesired visitors, then you can definitely afford to be granular. And then it becomes more of a "how to..." question. You want to divide your unwanted visitors into various categories. At a minimum:
-- the ones who will never be allowed to set foot in your site, ever,
-- the ones who look suspicious but might conceivably be legitimate or even desirable.
Dedicated server. Does that mean you have-- or could set up-- a firewall? The unconditional lockouts can stay outside the firewall where they never have to bother the server. The ones who are allowed inside the door are subject to further filtering. Here there's really no limit to what you can do. Even if you've got the world's kindest and most helpful 403 page, even if you redirect all the maybes to a "I'm really sorry, but..." page, even if dubious requests are subjected to a whole script of their own, it's still less work than building a complete page from a vast database.
You can track requests, for example. Did the same IP ask for two consecutive different pages with no intervening request for supporting files? Have they asked for five pages in three seconds? Do all requests come in with an auto-referer? (These are impossible to deal with in mod_rewrite-- except in a very narrow, targeted way-- but become trivial in php. Or language of your choice.)
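As a rough illustration of the "five pages in three seconds" check, here is a minimal Python sketch of a sliding-window counter per IP. The class name `RequestTracker`, the thresholds, and the assumption that the caller only records page requests (not supporting files) are all mine, not anything from the thread.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3.0  # assumed window, taken from the "three seconds" example
MAX_PAGES = 5         # assumed limit, taken from the "five pages" example

class RequestTracker:
    """Keeps recent page-request timestamps per IP (hypothetical helper)."""

    def __init__(self, window=WINDOW_SECONDS, max_pages=MAX_PAGES):
        self.window = window
        self.max_pages = max_pages
        self._hits = defaultdict(deque)

    def record_page(self, ip, timestamp):
        """Record one page request; return True if the IP has reached the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window:
            hits.popleft()
        return len(hits) >= self.max_pages

tracker = RequestTracker()
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    flagged = tracker.record_page("198.51.100.7", t)
print(flagged)  # the fifth request inside three seconds trips the check
```

The other checks mentioned (no supporting-file requests, auto-referers) would need the same kind of per-IP state, just with more fields recorded per request.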
I realize the original question was simply about identifying North America vs. the rest of the world. But with a massive database at stake, it's worth sifting more carefully.
It works at any level. Sometimes I think about simply de-indexing all my images because, heck, what's the point? Nobody ever ends up on the page, and I don't gain anything from them just looking at the picture. And then I remember the email I got from an Australian who was searching for some wildly improbable combination of furry green shape-shifting widgets, and landed on a page of mine that nobody ever visits. It was exactly what she was looking for.
Everyone gets these, in some form or other. Some of them may even send you money.
| 3:58 am on Apr 22, 2014 (gmt 0)|
|If the act of building a page consumes far more server resources than evaluating undesired visitors, then you can definitely afford to be granular. |
If the visitor turns out to be a bot and I don't serve the page, then I would agree with that. But for each welcomed visitor, I've just added response time to their request, which is, I think, my dilemma, i.e. making an already resource-intensive request more intensive and time-consuming.
As for firewall options, I do have access to a software firewall on the server but not a hardware firewall external to the server. I could have but that's adding a hefty expense in itself. A bot banging on a software firewall can cause a load in itself though not the type of load the requests it's making would cause so it's part of the strategy for sure. Layering the security makes a lot of sense to me and as much as this thread poses some very simple and straightforward approaches, I can contend that, given the time to develop better options, I see my own strategy evolving over time. Right now I do track requests but only use that tracking information after the fact to make blocking decisions (such as identifying and then blocking a server farm IP range). I see myself heading in a more refined direction eventually and you could say that this thread (and a whole lot of other information I'm slowly absorbing in this forum) is leading me inexorably down that path.
I'm certain I have the skills to write the code necessary to do what you described. I guess I'm feeling my way through the process of getting there at this time. Ocean's sticky post on this board about identifying bots has provided a lot of insight on the subject but I still find myself fumbling for the best way to proceed. Perhaps part of my dilemma is just in learning what each layer of the security puzzle offers and how and when to best use it. It's coming together slowly but the whole picture isn't perfectly clear yet either.
Anyway, I think the most important point of this particular post should be that I truly appreciate all the feedback in this thread and don't think there's anything here that doesn't simply make me want to dig deeper.
| 11:40 pm on Apr 22, 2014 (gmt 0)|
Well, I thought I might provide a condensed version of the list at this juncture to summarize the results of this research. I'll just re-emphasize that this list covers a very large portion of the allocated IPv4 address space and using it as a blocking mechanism comes with some more-than-trivial consequences to consider (many of which have been raised above).
Having said that, maybe this is just a filtering mechanism, e.g. maybe this just helps identify traffic you want to look at further before letting it through or blocking it. The list doesn't have to be a blocking mechanism in its own right. It's just a big bucket from which most of your non-North American traffic is probably coming (and please note I didn't say all, because that would be a huge fallacy to base your thinking on).
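One way to picture the "filter, don't block" idea is a triage function that only labels a request for a closer look. This is a minimal Python sketch; the two CIDRs in `REVIEW_NETWORKS` are stand-ins chosen for illustration (not the actual list), and the labels "review" and "pass" are hypothetical.

```python
import ipaddress

# Stand-in ranges for illustration only; substitute your own condensed list.
REVIEW_NETWORKS = [ipaddress.ip_network(c) for c in ("80.0.0.0/4", "36.0.0.0/7")]

def triage(ip: str) -> str:
    """Return 'review' for addresses inside the big bucket, 'pass' otherwise."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in REVIEW_NETWORKS):
        return "review"
    return "pass"

print(triage("85.1.2.3"))  # inside 80.0.0.0/4
print(triage("8.8.8.8"))   # outside both stand-in ranges
```

What happens to a "review" request (extra checks, a lighter page, logging only) is a separate decision from the lookup itself.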
Anyway, here's a condensed version of the list with Allocated and Legacy blocks merged. Note that the list does not allow for any of the exceptions identified above. According to my count, this covers 108 /8 blocks with 52 CIDR ranges (for what it's worth).
| 12:52 am on Apr 23, 2014 (gmt 0)|
deny from 1. 2. 5. 14.
deny from 25. 27. 31. 36. 37. 39.
deny from 41. 42. 43. 46. 49.
deny from 51. 58. 59.
deny from 60. 61. 62.
deny from 77. 78. 79.
deny from 80. 81. 82. 83. 84. 85. 86. 87. 88. 89.
deny from 90. 91. 92. 93. 94. 95.
deny from 101. 102. 103. 105. 106. 109.
deny from 110. 111. 112. 113. 114. 115. 116. 117. 118. 119.
deny from 120. 121. 122. 123. 124. 125. 126.
deny from 133. 141. 145.
deny from 150. 151. 153. 154. 163.
deny from 171. 175. 176. 177. 178. 179.
deny from 180. 181. 182. 183. 185. 186. 187. 188. 189.
deny from 190. 191. 193. 194. 195. 196. 197.
deny from 200. 201. 202. 203.
deny from 210. 211. 212. 213. 217. 218. 219.
deny from 220. 221. 222. 223.
| 1:56 am on Apr 23, 2014 (gmt 0)|
There's an A that I block and Don doesn't? Who knew.
Deny from 8
|deny from 80. 81. 82. 83. 84. 85. 86. 87. 88. 89. |
deny from 90. 91. 92. 93. 94. 95.
Deny from 80.0.0.0/4
You can do the same with 112 if you're willing to include 127 in the lockout.
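For anyone who wants to check this kind of condensing without doing the binary arithmetic by hand, here is a small Python sketch using `ipaddress.collapse_addresses` from the standard library. The helper name `collapse_first_octets` is made up for the example.

```python
import ipaddress

def collapse_first_octets(octets):
    """Turn first-octet values (as in 'deny from 80.') into the fewest CIDRs."""
    nets = [ipaddress.ip_network(f"{o}.0.0.0/8") for o in octets]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# 80 through 95 is a full, aligned run of sixteen /8 blocks.
print(collapse_first_octets(range(80, 96)))
# 112 through 126 stops one block short, so it stays in four pieces.
print(collapse_first_octets(range(112, 127)))
# Adding 127 completes the run and it folds into a single /4.
print(collapse_first_octets(range(112, 128)))
```

The middle case shows why including 127 matters: without it the range needs a /5, a /6, a /7 and a /8 instead of one line.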
| 3:12 am on Apr 23, 2014 (gmt 0)|
FWIW, y'all do this just the opposite of how I do it. I use ALLOW for the US, CA and MX ranges and DENY ALL to block the rest of the world on a few servers. You really need to include CA and MX in the US ranges as they overlap at the borders and doing wide sweeping blocks without a little consideration of the borders can lose parts of NY, CA, etc. where there's fairly dense population and a lot of potential visitors.
BTW, If you use this strategy from your IPTABLES instead of just .htaccess, you'll secure your whole server, not just the website. Blocking at the firewall also stops almost all spam, hack attacks, dictionary attacks, DDOS attempts, etc. from outside the US which is where most originate.
Doesn't stop the vermin from tunneling through US proxies, but it's a helluva good start! Blocking US data centers then cures most of the proxy problem.
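A rough sketch of the allow-then-deny-all idea at the firewall level: the Python below only builds iptables command strings from an allowlist, it does not run them. The two CIDRs are reserved documentation ranges used as placeholders, NOT real US/CA/MX allocations, and `build_allowlist_rules` is a hypothetical helper. A real ruleset would also need to permit loopback and established connections before the final DROP.

```python
import ipaddress

# Placeholder documentation ranges, NOT real US/CA/MX allocations.
ALLOWED_CIDRS = ["192.0.2.0/24", "198.51.100.0/24"]

def build_allowlist_rules(cidrs):
    """Return iptables commands: ACCEPT each allowed range, then DROP the rest."""
    rules = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)  # raises ValueError on a malformed CIDR
        rules.append(f"iptables -A INPUT -s {net} -j ACCEPT")
    # Default deny: anything that did not match an ACCEPT rule above is dropped.
    rules.append("iptables -A INPUT -j DROP")
    return rules

for rule in build_allowlist_rules(ALLOWED_CIDRS):
    print(rule)
```

Order matters here because iptables stops at the first matching rule, so the catch-all DROP has to come last.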
| 4:02 am on Apr 23, 2014 (gmt 0)|
I've wondered about 8. and have dug into some of the discussions on that topic here at WebmasterWorld but haven't arrived at a personal conclusion yet. Also, my thinking is still muddled where the private ranges are concerned (10/8, 192.168/16, 172.16/12) but I know I've seen IPs from the 10 range at least in the X_Forwarded_For header before. Other than an indication that a proxy is in use, I'm not sure how to view that circumstance yet or 127 for that matter. Localhost does have its uses after all. That 80/4 range sure looks tempting at a glance though ;)
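For sorting out what shows up in a forwarded-for header, a minimal Python sketch like the following may help. It labels each entry using the standard `ipaddress` module; note that Python's `is_private` is broader than just the three ranges named above (it also covers other reserved space). The function name `classify_forwarded_for` and the labels are assumptions for the example.

```python
import ipaddress

def classify_forwarded_for(header_value):
    """Label each entry in a comma-separated X-Forwarded-For style value."""
    results = []
    for part in header_value.split(","):
        token = part.strip()
        if not token:
            continue
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            # Proxies sometimes send non-address tokens such as "unknown".
            results.append((token, "invalid"))
            continue
        if addr.is_loopback:
            label = "loopback"
        elif addr.is_private:
            label = "private"
        else:
            label = "public"
        results.append((token, label))
    return results

print(classify_forwarded_for("10.1.2.3, 8.8.8.8, 127.0.0.1, unknown"))
```

A private or loopback entry here says something about the proxy chain, not about the connecting address itself, so it is better treated as a hint than as a blocking criterion.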
| 4:07 am on Apr 23, 2014 (gmt 0)|
|I use ALLOW for the US, CA and MX ranges |
Chuckle, all this has actually had me asking if that wouldn't be a better way to approach this anyway. Now I need to flip this on its head and see what it actually looks like. Thanks for that. Now I'll be up all night again. ;)
| 4:10 am on Apr 23, 2014 (gmt 0)|
|There's an A that I block and Don doesn't? Who knew. |
Deny from 8
You're referring to Level3?
I've had the 8 Class A denied for more than a decade; however, I was just providing an accumulation of webcentric's listings.
I've the 4 Class A denied as well.
| 4:12 am on Apr 23, 2014 (gmt 0)|
|FWIW, y'all do this just the opposite of how I do it. |
That's always been the good thing about this forum?
"There's more than one way to skin a cat".
| 4:30 am on Apr 23, 2014 (gmt 0)|
|Localhost does have its uses after all. |
Sure, but not for browsing ;) Blocking the whole 112-127 segment won't lock anyone out. In fact the same probably applies to the Private Use ranges, because don't they tend to come in through proxies?
|You really need to include CA and MX in the US ranges |
I think through much of this discussion the distinction has been between ARIN and non-ARIN. So Canada will never be an issue, and thankfully there don't seem to be vast numbers of spammers in the islands. (Hm. Whatever happened to online gaming? It seems to have pretty much disappeared from my junk mail-- and that's where the islands really shone.)
Are there really that many people along the border using a Mexican IP? I mean, it's not like tuning your radio to XER or driving into Tijuana to save money on doctor visits. Your internet connection is typically linked to something like phone or TV, and that's going to be based in your own jurisdiction.
| 4:54 am on Apr 23, 2014 (gmt 0)|
|Your internet connection is typically linked to something like phone or TV, and that's going to be based in your own jurisdiction. |
That's a nice theory but people run out of IP blocks and people on the other side of the border are more than happy to accommodate those shortcomings for a modest fee.
I only know this as I used to do ecommerce anti-fraud detection code which displayed the country code next to each order. I flagged it RED as an item needing human inspection if the country code for the IP didn't match the billing or shipping address and/or the email address country of origin didn't match as well. Back in the day I saw a significant enough number of borderline cases in the US and UK to know there would be lost revenue if you just blindly blocked massive IP ranges along the dotted lines and ignored the fuzzy gray areas.
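The flagging rule described above might look something like this minimal Python sketch. It is one reading of the rule, not the original code: an order is flagged when the IP country matches neither the billing nor the shipping country, or when a known email country differs from the IP country. The function name and the use of two-letter country codes are assumptions.

```python
def needs_human_review(ip_country, billing_country, shipping_country, email_country=None):
    """Return True if an order should be flagged RED for manual inspection."""
    ip_cc = ip_country.upper()
    # The IP country should line up with at least one of the order addresses.
    address_match = ip_cc in (billing_country.upper(), shipping_country.upper())
    # An unknown email country is treated as neutral rather than as a mismatch.
    email_match = email_country is None or email_country.upper() == ip_cc
    return not (address_match and email_match)

print(needs_human_review("US", "US", "US"))        # everything lines up
print(needs_human_review("MX", "US", "US"))        # cross-border IP gets a look
print(needs_human_review("US", "US", "US", "GB"))  # email country differs
```

The point of flagging rather than rejecting is exactly the border case: the MX order above goes to a human instead of being silently lost.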
Many of the crossover points tended to be local ISPs, not the major providers, but I can't rule out major providers either as it's been too long to remember all the specifics.
Don't know the current stats on border crossing IPs, but since I've seen it in real life, I just plan for it and leave it at that.
|That's always been the good thing about this forum? |
"There's more than one way to skin a cat".
Yup, and I prefer the shorter and more efficient way of skinning a cat because those claws hurt like hell.
| 8:09 am on Apr 23, 2014 (gmt 0)|
|Would it please you more if the subject line read "Block non-UK Traffic" |
A British webmaster would be unwise to consider this as 10% of British citizens live abroad - it would be like blocking London.
Even e-commerce sites that only deliver within the country may sell to foreign visitors. It may also be difficult to accurately identify where visitors come from.
I live in Sri Lanka. I often use a UK proxy so I look like I am in the UK. I often buy things from sites in the UK for delivery in the UK. I have sometimes bought things from US and Australian sites for delivery in the country. Sometimes these have been presents (it is easier than buying something here and posting it) and sometimes someone has taken delivery in the country of purchase and brought it here for me.
There are businesses that exist just to forward goods delivered only in the US to end buyers in other countries.
I have sometimes had problems with sites that will only take payment from people in the UK. I suspect any consumer e-commerce sites that make no sales to foreign residents at all are doing something like this.
|Many of the crossover points tended to be local ISPs, not the major providers, but I can't rule out major providers either as it's been too long to remember all the specifics. |
AOL used to use the same IP ranges globally once upon a time.
| 8:35 am on Apr 23, 2014 (gmt 0)|
|AOL used to use the same IP ranges globally once upon a time. |
AOL is less than a shadow of its former self.
Here in "the States" and in "the widget category", nearly everybody used AOL because they allowed a connection in most US & Canadian cities.
Since the advent of mobile devices, AOL has lost a major market share.
Given the additional advent of Social Media, AOL has suffered even more.
FWIW, there are many longtime webmasters in this forum that could generate a "proxy thread" that would be comparable in numbers to the server farm thread, at least as related to denials of aforementioned.
| 8:57 am on Apr 23, 2014 (gmt 0)|
|There are businesses that exist just to forward goods delivered only in the US to end buyers in other countries. |
Since online gaming has already been mentioned in this thread, I won't feel as if I'm guilty of swaying off-topic.
A few years back the US Govt., and banks (some; NOT all) joined together in an attempt to thwart online gaming. Some credit and/or debit cards of specific banks may not be used in the US for online gaming wagers. For a while a few people found proxies effective (as well as companies that specialized in presenting themselves as something else in the middle of the transaction), however the banks caught on to these schemes and eventually the companies (off-shore and otherwise) left the US Market entirely.
| 9:01 am on Apr 23, 2014 (gmt 0)|
Can you summarize the benefits of doing so? Why would you block non-US traffic? Less traffic, less backlinks?
| 9:11 am on Apr 23, 2014 (gmt 0)|
Opening of thread. Page 1, 1st paragraph
|First off, this subject has been discussed before but I felt that there's enough current interest in this board and on other boards here at Webmaster World alone, to warrant a fresh top-down discussion of the subject. We'll see if our moderators agree. |
see comments by Samizdata (and replies to same by webcentric) beginning on Page 2 and proceeding through Page 3.
see reply from tangor on Page 3
see reply from diberry on Page 3
see reply from dstiles on Page 3
| 10:26 am on Apr 23, 2014 (gmt 0)|
@wilderness, that is a very different case: gambling offshore is illegal and there was a concerted effort by the US govt and the banks to stop it.
I cannot see what the companies I am talking about are doing that is illegal, or why anyone in the US would want to stop it. In fact given that they have to have a US address, and they advertise their services very widely, and none appears to have been shut down, the service is probably perfectly legal.
| 10:58 am on Apr 23, 2014 (gmt 0)|
I don't currently block by country, because it's more trouble than it's worth to me, but I don't see anything wrong with it. The likelihood that someone in Europe or Africa is going to be interested in where and when there will be a community event in Springfield, Ohio is pretty remote. It could happen, but ... after fifteen years, it almost never has.
| 12:26 pm on Apr 23, 2014 (gmt 0)|
(I tried your solution, Bill, by just whitelisting the US and Canada, but found I couldn't get to my sites from various spots around town, and got emails from regular users that they could no longer access the sites. That was too problematic for me)
| 2:06 pm on Apr 23, 2014 (gmt 0)|
I will say that I tried a whitelisting approach a few months ago using IP2Location. I just identified the country codes I wanted to let through and blocked everything else. I almost immediately noticed that the preview in the Adsense scorecard (the image that shows when you run a Page Speed test) was being blocked and, lo and behold, the test was being run against my error page (very fast, I might add). It was a weird enough effect to cause me to abandon that strategy, wondering what else was getting blocked. Of course, that's one possible result of trusting data you didn't personally compile.
Added: Where whitelisting is concerned, it makes me think that a list of ISP ranges would be valuable per Bill's cross-border observations. In other words, you could let all of Mexico in (and then start blocking Mexican server farms) or you could open up holes for certain ISPs that have ranges in Mexico. My cat is cringing but agrees on the "more than one way" concept. Think he mentioned dogs in his analogy though.
| 2:44 pm on Apr 23, 2014 (gmt 0)|
Blocking "all non-North American traffic" seems drastic - and I run a city-based website. I use Wizcrafts' fabulous lists - [wizcrafts.net...] - to block China, Russia, Nigeria, South America, and other open servers. I then have my own log tracking software that spots the dedicated server hits - I then track that org's CIDR list down, load it up into Excel with Power Query, and voilà - I have my own IPTABLES to add to the blocks (with a little Excel fu).
Seriously check out Wizcrafts - he does amazing work updating his lists. It will save you a lot of time with this, I suspect.
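The log-tracking step described above could start from something as simple as counting hits per network. Here is a minimal Python sketch that groups client IPs into /24 blocks; the helper name `hits_per_network`, the /24 grouping, and the sample addresses (reserved documentation ranges) are assumptions, not the poster's actual tool.

```python
import ipaddress
from collections import Counter

def hits_per_network(ips, prefix=24):
    """Count requests per enclosing network of the given prefix length."""
    counts = Counter()
    for ip in ips:
        # strict=False lets a host address stand in for its enclosing network.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts

sample = ["198.51.100.7", "198.51.100.9", "192.0.2.1"]
for network, hits in hits_per_network(sample).most_common():
    print(network, hits)
```

Networks that float to the top of this count are the candidates to look up and, if they turn out to be server farms, to add to the block list.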
| 4:57 pm on Apr 23, 2014 (gmt 0)|
Thanks for the mention, BigToga. I not only apply my own ever growing blocklists to my .htaccess file, I also use Cloudflare's IP and country blocking feature as an external firewall for my shared hosting account.
While a large .htaccess full of blocklists and rewrite blockades may cause a slight slowdown in page load times, the Cloudflare caching of static pages and embedded scripts nullifies any such losses.
Because I also deal in anti-spam work, I only block certain problematic countries, ISPs and data centers. I have routine dealings with folks in Australia and New Zealand, as well as G.B. and sometimes South Africa.
| 10:47 pm on Apr 23, 2014 (gmt 0)|
|Can you summarize the benefits of doing so? Why would you block non-US traffic? Less traffic, less backlinks? |
Block spam, hackers, scrapers, fraud orders, etc.
If you don't sell abroad or ship abroad, there's really no reason to allow your site to be accessed abroad unless you're into sadomasochistic webmastering.
| 11:00 pm on Apr 23, 2014 (gmt 0)|
|Because I also deal in anti-spam work, I only block certain problematic countries, ISPs and data centers. I have routine dealings with folks in Australia and New Zealand, as well as G.B. and sometimes South Africa. |
Countries generally considered as add-on to North America, or more specifically ENGLISH SPEAKING countries.
NO disrespect to non-English speaking countries, merely an observation that the content is not in their native language, which makes me wonder why the hits upon hits, etc.
| 11:16 pm on Apr 23, 2014 (gmt 0)|
|If you don't sell abroad or ship abroad, there's really no reason to allow your site to be accessed abroad unless you're into sadomasochistic webmastering. |
While I only ship a couple dozen orders outside of US & Canada each year, I do sell advertising and www accessibility means human traffic, lots of it. This alone is a huge selling point to advertising.
| 12:34 am on Apr 24, 2014 (gmt 0)|
|the content is not in their native language |
Keep in mind that outside the US, educated adults can be assumed to read one or more additional languages. If you have a choice between an excellent site in a foreign language, or a not-so-good site in your own language, you're not stuck with the one in your mother tongue.
In a different venue I know German speakers who have intentionally set their computers' interface language to English, because that is the only way to keep websites from throwing horrible auto-translated content at them.
| 7:12 pm on Apr 24, 2014 (gmt 0)|
> Countries generally considered as add-on to North America, or more specifically ENGLISH SPEAKING countries.
Ingratiating politicians aside, I'm fairly sure a lot of countries, including UK, would resent that! :)
I'm with Lucy on this: English-speakers are lazy and by and large do not speak a second language to any degree (school French does not really count!). Non-native-English-speakers, on the other hand, often do speak at least English - some Scandinavian countries better than wot us English natives does.
| 9:37 am on Apr 25, 2014 (gmt 0)|
|While I only ship a couple dozen orders outside of US & Canada each year, I do sell advertising and www accessibility means human traffic, lots of it. This alone is a huge selling point to advertising. |
That would depend on the advertising. If the advertisers also only sell in the same limited areas then the additional traffic is worthless and invalid.
Obviously these issues have to be decided on a site by site basis as not one size fits all.
FYI, others have mentioned language issues which are more easily addressed using the accept-language header sent by the browser.
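For reference, reading that header is straightforward. The following is a minimal Python sketch that splits an Accept-Language value into (language tag, quality) pairs sorted by preference; the function name is made up, and it handles only the common `tag;q=value` form rather than every edge case in the HTTP specification.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs from an Accept-Language value, best match first."""
    langs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a tag with no q parameter has the default quality of 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unreadable q value as "not wanted"
        langs.append((tag.strip().lower(), q))
    # sorted() is stable, so ties keep the order the browser sent.
    return sorted(langs, key=lambda item: item[1], reverse=True)

print(parse_accept_language("de-DE,de;q=0.9,en;q=0.8"))
```

This says what the visitor prefers to read, which is a different question from where the request comes from.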
Remember, the topic is SIMPLIFIED traffic blocking, not languages or advertising, etc.. Let's try to keep it on the topic of blocking areas using /7 and /8 CIDRs.
| 11:01 am on Apr 25, 2014 (gmt 0)|
I'll just quickly throw in that having a high percentage of North American or, more specifically, US traffic is attractive to many advertisers and advertising networks. Some even specify this when defining site quality in their documentation. If I recall correctly, Media.net is one of those networks.
Getting back on topic. This technique is working pretty well for me. As Bill mentioned, dealing with ARIN-based server farms is a significant part of the strategy but tackling even just a few of the major ones has had a dramatic impact. There's certainly a trade-off of a sort but so far, that trade-off has been more than equitable.
Veering OT again: BTW, there are plenty of people right here in the US who can't read or even speak English. I have to use gestures to get anything across to a couple of my neighbors and I can muddle my way through about 3 1/2 languages when necessary. This topic really isn't about language or culture. It's about geography.
| 6:16 am on Apr 26, 2014 (gmt 0)|
Three questions for ecommerce sites who do not deliver abroad:
1) Does your order process allow people to enter a foreign credit card billing address? There is at least one British site I would have bought a few hundred pounds' worth of stuff from if they did.
2) If the answer to 1) is yes, do you know how many people with foreign billing addresses bought from you? If you use a payment processor, do you even have the data?
3) If the answer to 1) is no, why not?
Also, does anyone have any numbers on the rate of false positives?
As far as language is important, this makes interesting reading:
Although I can see some of the numbers are badly off, the overall picture is correct.