Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 94 message thread spans 4 pages.
Block non-North American Traffic for Dummies Like Me
Reducing the size of your blocking list.
webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 6:48 pm on Apr 17, 2014 (gmt 0)

First off, this subject has been discussed before, but I feel there's enough current interest on this board and on other boards here at WebmasterWorld alone to warrant a fresh top-down discussion of the subject. We'll see if our moderators agree.

The list of CIDRs below was compiled from the IANA IPv4 Address Space Registry report [iana.org]. The list is a compact version of all Allocated non-ARIN /8 blocks (from APNIC, RIPE NCC, AFRINIC, and LACNIC). For example, 58.0.0.0/7 merges 58.0.0.0/8 and 59.0.0.0/8 into a single CIDR. The largest block in this list is 80.0.0.0/4, which merges the 80.0.0.0 through 95.255.255.255 address range.
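The merging described above can be reproduced with Python's standard `ipaddress` module. This is only a sketch of the idea, not how the list in this post was actually built; the helper name `merge_slash8s` is made up for illustration.

```python
import ipaddress

def merge_slash8s(first_octets):
    """Collapse /8 blocks (given by first octet) into the fewest covering CIDRs."""
    nets = [ipaddress.ip_network(f"{octet}.0.0.0/8") for octet in first_octets]
    return [str(net) for net in ipaddress.collapse_addresses(nets)]

# 58/8 and 59/8 are adjacent and start on an even octet, so they merge.
print(merge_slash8s([58, 59]))       # ['58.0.0.0/7']

# Sixteen consecutive /8s starting at 80 merge into a single /4.
print(merge_slash8s(range(80, 96)))  # ['80.0.0.0/4']

# 41/8 and 42/8 are adjacent but cannot merge: a /7 must start on an even octet.
print(merge_slash8s([39, 41, 42]))   # ['39.0.0.0/8', '41.0.0.0/8', '42.0.0.0/8']
```

The last call shows why the list keeps 41.0.0.0/8 and 42.0.0.0/8 as separate lines.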

Some of the CIDRs below merge blocks from different registries, e.g. combining blocks from both RIPE NCC and APNIC. As such, this does not in any way represent an approach surgical enough to differentiate blocks in one RIR from blocks in another (let alone blocks representing specific countries). The goal here is to arrive at a blocking strategy that keeps people and bots from outside North America off your site.

It should also be noted that the list below is only intended as a good first step where blocking is concerned. There are many holes in the Legacy blocks that this step does not address and proxies are another whole topic of ingress. The intention here is to succinctly narrow the scope of the task with as little effort as possible.

One tangible benefit of this approach can be seen in the 176.0.0.0/5 range, which blocks 176.0.0.0 to 183.255.255.255. This CIDR contains some AWS and Rackspace ranges (and probably other server farms as well). Blocking this range means you don't have to identify and separately block those server farm ranges.

1.0.0.0/8
2.0.0.0/8
5.0.0.0/8
14.0.0.0/8
27.0.0.0/8
31.0.0.0/8
36.0.0.0/7
39.0.0.0/8
41.0.0.0/8
42.0.0.0/8
46.0.0.0/8
49.0.0.0/8
58.0.0.0/7
60.0.0.0/7
62.0.0.0/8
77.0.0.0/8
78.0.0.0/7
80.0.0.0/4
101.0.0.0/8
102.0.0.0/7
105.0.0.0/8
106.0.0.0/8
109.0.0.0/8
110.0.0.0/7
112.0.0.0/5
120.0.0.0/6
124.0.0.0/7
126.0.0.0/8
175.0.0.0/8
176.0.0.0/5
185.0.0.0/8
186.0.0.0/7
189.0.0.0/8
190.0.0.0/8
193.0.0.0/8
194.0.0.0/8
195.0.0.0/8
197.0.0.0/8
200.0.0.0/7
202.0.0.0/7
210.0.0.0/7
212.0.0.0/7
217.0.0.0/8
218.0.0.0/7
220.0.0.0/7
222.0.0.0/7

So, I'm hoping that

1. This list is helpful to those looking for a starting point.
2. If there's a mistake in the list above, the moderators will see fit to correct it once it's identified, so that the first post reflects accurate and up-to-date information.
3. This discussion can move forward with new ranges outside the Allocated blocks to help expand this list even further. Anyone want to block the UK Ministry of Defence (sic)? That /8 block and others are omitted from this initial list because they are Legacy blocks.

And last for now: it is possible to further reduce the above list to a series of Regular Expressions, which would be even more condensed than the list above. For those with access to a rewrite module (Apache or IIS) this list would be valuable, but I'll leave it up to an expert in that arena to post the list if they care to. I hope this helps someone and can save them the time I (and many others) have spent whittling down the world a bit.

Comments and corrections are most welcome!

 

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 11:44 pm on Apr 17, 2014 (gmt 0)

31.0.0.0/8
36.0.0.0/7
39.0.0.0/8

31. 36. 39.
OR
3[169]

41.0.0.0/8
42.0.0.0/8
46.0.0.0/8
49.0.0.0/8

41. 42. 46. 49.
OR
4[1269]

same methods for remainder of Class A's.

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4663915 posted 1:22 am on Apr 18, 2014 (gmt 0)

The goal here is to arrive at a blocking strategy that keeps people and bots from outside North America off your site.

Webmasters are, of course, entitled to run their sites as they see fit.

But I would not want them to take such drastic action without first reading this:

The American diaspora or Overseas Americans refers to the population of United States citizens who relocate, temporarily or permanently, to foreign countries. There are no reliable figures on how many Americans live abroad, but a State Department estimate suggests that the number may be between 3 million and 6 million.

Source: Wikipedia

...

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 2:20 am on Apr 18, 2014 (gmt 0)

6 million are 1.94 % of the population, per 2010 Census.

All one needs to do is examine the regional world visitor stats that most shared hosts provide for their sites and make a comparison to regions of the world (or even the US) with similar or lower percentiles.

Then determine if a 2% loss from any region (including any geographical regions of the US) will endanger any success of their own widgets.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4663915 posted 2:33 am on Apr 18, 2014 (gmt 0)

You forgot 25 (UK ministry of defence) ;)

And, ouch, did you really forget 37 or is there some arcane reason for not listing it?

There are more, but I stopped looking. (My own lists are color-coded, so it's embarrassingly easy to check.)

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 3:10 am on Apr 18, 2014 (gmt 0)

And, ouch, did you really forget 37 or is there some arcane reason for not listing it?


See, 36.0.0.0/7

You forgot 25 (UK ministry of defence)


Not forgotten, just that the list above represents "Allocated" blocks as defined by IANA. The next logical step here would be to dive into the "Legacy" blocks and see what other low-hanging fruit there is.

As for why someone would block traffic from outside North America, a couple of reasons might be

1. We don't ship our widgets outside the US and all we do is sell widgets.

2. Our content is only pertinent to you if you can come down and join us at the meeting tonight. No sense catching a slow train from China cause you won't make it in time.

Blocking any part of the world involves a trade-off and as the veterans here have said on many occasions, you really need to evaluate actions like this based on your individual website goals and strategies. It can cost a lot of money to host traffic from around the globe when 99% of your income is derived from US traffic. In a situation like that, the decision may be an easy one. In other circumstances, not so much.

ADDED: The list in the original post is 46 lines (if I counted right) but covers 93 individual /8s. Getting the most bang for your buck was the point. As wilderness demonstrates, Regex can condense the condensed list even more dramatically.
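The line count and the /8 count above are easy to double-check. The sketch below copies the list from the original post verbatim and counts how many /8 blocks each CIDR spans; the names `BLOCKLIST` and `slash8_count` are just illustrative.

```python
import ipaddress

# The 46 CIDRs from the original post, in the same order.
BLOCKLIST = """
1.0.0.0/8 2.0.0.0/8 5.0.0.0/8 14.0.0.0/8 27.0.0.0/8 31.0.0.0/8
36.0.0.0/7 39.0.0.0/8 41.0.0.0/8 42.0.0.0/8 46.0.0.0/8 49.0.0.0/8
58.0.0.0/7 60.0.0.0/7 62.0.0.0/8 77.0.0.0/8 78.0.0.0/7 80.0.0.0/4
101.0.0.0/8 102.0.0.0/7 105.0.0.0/8 106.0.0.0/8 109.0.0.0/8 110.0.0.0/7
112.0.0.0/5 120.0.0.0/6 124.0.0.0/7 126.0.0.0/8 175.0.0.0/8 176.0.0.0/5
185.0.0.0/8 186.0.0.0/7 189.0.0.0/8 190.0.0.0/8 193.0.0.0/8 194.0.0.0/8
195.0.0.0/8 197.0.0.0/8 200.0.0.0/7 202.0.0.0/7 210.0.0.0/7 212.0.0.0/7
217.0.0.0/8 218.0.0.0/7 220.0.0.0/7 222.0.0.0/7
""".split()

def slash8_count(cidr):
    """Number of /8 blocks covered by a CIDR with prefix length 8 or shorter."""
    net = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
    return 2 ** (8 - net.prefixlen)

total = sum(slash8_count(cidr) for cidr in BLOCKLIST)
print(len(BLOCKLIST), total)  # 46 93
```

Parsing each entry with the default strict mode also confirms that every CIDR in the list starts on a valid boundary.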

Yes, the list on my machine is color-coded too. ;)

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 3:44 am on Apr 18, 2014 (gmt 0)

@wilderness -- Shouldn't the RegEx for the following

31.0.0.0/8
36.0.0.0/7
39.0.0.0/8

actually be
31. 36. 37. 39.
OR
3[1679]

?

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 3:48 am on Apr 18, 2014 (gmt 0)

FWIW (I'm sure there are glaring discrepancies in here; however, from my own perspective, I'm not interested in exceptions).

Deny from 1. 2. 5.
Deny from 10.
Deny from 121.
Deny from 134.
Deny from 139.
Deny from 145.
Deny from 153.
Deny from 171.
Deny from 185. 186. 187. 188. 189
Deny from 20.

#small exceptions for these three A's
RewriteCond %{REMOTE_ADDR} ^109\.
RewriteCond %{REMOTE_ADDR} ^137\.
RewriteCond %{REMOTE_ADDR} ^115\.

RewriteCond %{REMOTE_ADDR} ^11[0-36-9]\. [OR]

#Small exceptions for Class A
RewriteCond %{REMOTE_ADDR} ^114\.
RewriteCond %{REMOTE_ADDR} ^217\.
RewriteCond %{REMOTE_ADDR} ^31\.
RewriteCond %{REMOTE_ADDR} ^38\.
RewriteCond %{REMOTE_ADDR} ^62\.
RewriteCond %{REMOTE_ADDR} ^80\.
RewriteCond %{REMOTE_ADDR} ^82\.
RewriteCond %{REMOTE_ADDR} ^84\.
RewriteCond %{REMOTE_ADDR} ^93\.



#Full Class A's
RewriteCond %{REMOTE_ADDR} ^12[0-6]\. [OR]
RewriteCond %{REMOTE_ADDR} ^14\. [OR]
RewriteCond %{REMOTE_ADDR} ^141\. [OR]
RewriteCond %{REMOTE_ADDR} ^150\. [OR]
RewriteCond %{REMOTE_ADDR} ^17[5-9]\. [OR]
RewriteCond %{REMOTE_ADDR} ^19[01]\. [OR]
RewriteCond %{REMOTE_ADDR} ^193\. [OR]
RewriteCond %{REMOTE_ADDR} ^194\. [OR]
RewriteCond %{REMOTE_ADDR} ^195\. [OR]
RewriteCond %{REMOTE_ADDR} ^196\. [OR]
RewriteCond %{REMOTE_ADDR} ^20[01]\. [OR]
RewriteCond %{REMOTE_ADDR} ^21[89]\. [OR]
RewriteCond %{REMOTE_ADDR} ^22[0-3]\. [OR]
RewriteCond %{REMOTE_ADDR} ^27\. [OR]
RewriteCond %{REMOTE_ADDR} ^3[679]\. [OR]
RewriteCond %{REMOTE_ADDR} ^4\. [OR]
RewriteCond %{REMOTE_ADDR} ^4[1369]\. [OR]
RewriteCond %{REMOTE_ADDR} ^5[789]\. [OR]
RewriteCond %{REMOTE_ADDR} ^60\. [OR]
RewriteCond %{REMOTE_ADDR} ^8\. [OR]
RewriteCond %{REMOTE_ADDR} ^8[135689]\. [OR]
RewriteCond %{REMOTE_ADDR} ^9[01245]\. [OR]


#Aussie-Kiwi; Basically the following Class A's are denied with a sort of whitelisting (omitted the allowed ranges).

RewriteCond %{REMOTE_ADDR} ^(144|20[23]|21[01]|61)\.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4663915 posted 4:54 am on Apr 18, 2014 (gmt 0)

31 36 37 39
OR
3[1679]

The first form is CIDR ranges. No . needed, because it's implied in the number. Hm, could swear I was explaining this to someone else only the other day. 31 means only 31, not 131 or 231 or -- if it existed -- 311 or, for that matter, 31AF.

The second form is Regular Expressions. Use the form that's appropriate to the environment. "Deny from..." lists use CIDR notation, not regular expressions.

^3[67]\.
=
36.0.0.0/7

I'm not 100% sure you need the trailing zeros (similarly things like 123.34.0.0/15 vs. 123.34/15) but it definitely can do no harm; at worst it would be superfluous bytes.
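The equivalence between ^3[67]\. and 36.0.0.0/7 can be checked by brute force over every possible first octet. This is a sketch using Python's `re` and `ipaddress` modules rather than Apache itself, so it only shows that the two notations describe the same set of addresses; the helper name `agree` is made up.

```python
import ipaddress
import re

pattern = re.compile(r"^3[67]\.")              # the regular-expression form
network = ipaddress.ip_network("36.0.0.0/7")   # the CIDR form

def agree(first_octet):
    """True if the regex and the CIDR give the same verdict for this octet."""
    addr = f"{first_octet}.1.2.3"
    by_regex = bool(pattern.search(addr))
    by_cidr = ipaddress.ip_address(addr) in network
    return by_regex == by_cidr

mismatches = [octet for octet in range(256) if not agree(octet)]
print(mismatches)  # []

# The ^ anchor and the escaped dot are what keep 136.x and 236.x out.
print(bool(pattern.search("136.1.2.3")))  # False
```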

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 5:38 am on Apr 18, 2014 (gmt 0)

@lucy24 -- Thanks for clarifying the two forms. My question to wilderness was really addressing the fact that the 37 block was missing from his initial example, which is easy to miss if you're dyslectic like me or if you're thinking /8 and not /7. For anyone who doesn't understand this, the following example should clarify.

36/8 = 36.0.0.0 - 36.255.255.255 (a single /8 block)

36/7 = 36.0.0.0 - 37.255.255.255 (two consecutive /8 blocks)

A bit of trivia: Notice that the /7 blocks in the original post all begin on even numbers e.g. 36 but, just to confuse things if you like, 36/7 can be written as 37/7. They both cover the same range. It's just easier in my mind to start on the even number and know that that /7 includes the /8 I started on and the next one as well.
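For anyone who wants to see the /7 range spelled out by a machine, here is a small sketch with Python's standard `ipaddress` module (variable names are illustrative only):

```python
import ipaddress

net = ipaddress.ip_network("36.0.0.0/7")

# First and last address covered by the /7.
print(net[0], net[-1])  # 36.0.0.0 37.255.255.255

# The two consecutive /8 blocks it contains.
halves = [str(sub) for sub in net.subnets(new_prefix=8)]
print(halves)  # ['36.0.0.0/8', '37.0.0.0/8']

# 37.x is inside the range, 38.x is not.
print(ipaddress.ip_address("37.200.1.1") in net)  # True
print(ipaddress.ip_address("38.0.0.1") in net)    # False
```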

And I see wilderness has jumped in with some legacy blocks and conditions but, keeping in mind that the title of this thread says "...for dummies like me," I'm going to try and demonstrate the thought process I've taken to arrive at that destination and why each piece of the solution is included in the final product. So pardon some redundancy as this progresses.

I'll just also quickly state that while a great many webmasters run websites on Apache, not all do and the way these solutions can be expressed will vary from server to server and from hosting account to hosting account. Some people may need to enter this information into an IIS module (such as the IP and Domain Name Restrictions module) or into a web.config file and may need to use net masks rather than CIDR notation or Regular Expressions. Some will have access to rewrite modules and others won't.
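For servers that want a network address plus a net mask instead of CIDR notation, the conversion is mechanical. A minimal sketch using Python's `ipaddress` module follows; the function name `cidr_to_netmask` is hypothetical, and how the resulting pair is entered into IIS or a web.config file is not shown here.

```python
import ipaddress

def cidr_to_netmask(cidr):
    """Return (network address, dotted net mask) for a CIDR string."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.netmask)

for cidr in ("36.0.0.0/7", "176.0.0.0/5", "80.0.0.0/4"):
    print(cidr, "->", cidr_to_netmask(cidr))
# 36.0.0.0/7 -> ('36.0.0.0', '254.0.0.0')
# 176.0.0.0/5 -> ('176.0.0.0', '248.0.0.0')
# 80.0.0.0/4 -> ('80.0.0.0', '240.0.0.0')
```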

[edited by: webcentric at 5:56 am (utc) on Apr 18, 2014]

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 5:40 am on Apr 18, 2014 (gmt 0)

I'm not 100% sure you need the trailing zero

Depends how the server is set up. I've seen both set-ups at hosting companies. At my current home, if the A.B.C.D are not all there you'll get a 500 Server Error.

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4663915 posted 7:55 am on Apr 18, 2014 (gmt 0)

36/7 can be written as 37/7

But don't get in the habit of doing this, because every instinct will want to interpret it as 37-38 when of course it's really 36-37. Same goes for any number:

1.2.192.0/18 = 1.2.192.0-1.2.255.255

so you could technically plug in the exact IP of your most recent offending visitor and say

1.2.234.67/18

meaning exactly the same thing... but that way lies madness ;)
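Python's `ipaddress` module happens to illustrate both sides of this: asked nicely it will mask off the host bits and give back the real base of the range, but by default it refuses such input. This is a sketch of one strict parser's behaviour, not a statement about what Apache or any other server does.

```python
import ipaddress

# strict=False masks off the host bits and returns the true base of the range.
loose = ipaddress.ip_network("1.2.234.67/18", strict=False)
print(loose)                # 1.2.192.0/18
print(loose[0], loose[-1])  # 1.2.192.0 1.2.255.255

# The same trick shows that 37/7 is really 36/7.
print(ipaddress.ip_network("37.0.0.0/7", strict=False))  # 36.0.0.0/7

# The default (strict) mode rejects a CIDR that does not start on its base.
try:
    ipaddress.ip_network("1.2.234.67/18")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```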

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 8:10 am on Apr 18, 2014 (gmt 0)

So, why block the initial list? The listed, ALLOCATED blocks are by definition distributed within each listed registry's region. I can't tell you if that's 100% true but it's the assumption I'm going with. Here's IANA's definition of "ALLOCATED"

ALLOCATED: delegated entirely to specific RIR as indicated.


Moving on to "LEGACY." Again a definition from IANA.

LEGACY: allocated by the central Internet Registry (IR) prior to the Regional Internet Registries (RIRs). This address space is now administered by individual RIRs as noted, including maintenance of WHOIS Directory and reverse DNS records. Assignments from these blocks are distributed globally on a regional basis.


OK, clear as mud, right?

So, in looking at the block list wilderness posted, I'm seeing the following Legacy ranges listed. There are a few other Legacy ranges that aren't in his list, so I'll look for consensus first on just the ones that are.

43/8 APNIC Legacy
141/8 RIPE Legacy
145/8 RIPE Legacy
150/8 APNIC Legacy
153/8 APNIC Legacy
171/8 APNIC Legacy
188/8 RIPE Legacy
191/8 LACNIC Legacy
196/8 AFRINIC Legacy

My next question as a dummy who's trying to get to the bottom of this is...does anyone really know how pure these ranges are? In other words, is there any consensus where blocking these ranges is concerned when the goal is to block non-North-American traffic? Anyone have any words of warning or know of any holes in these ranges?

bhukkel



 
Msg#: 4663915 posted 9:42 am on Apr 18, 2014 (gmt 0)

does anyone really know how pure these ranges are?


It is very detailed but you can download the RIR database of every registry and see which ranges are allocated to each country. For example the 150 range:

number of ranges;number of ips;country
7;458745;AU
2;131070;BE
5;327675;BR
2;131070;CH
7;524281;CN
4;262140;ES
1;65535;FR
2;131070;GB
2;65534;GR
4;262140;IT
82;6553518;JP
3;196605;KR
1;65535;NO
1;65535;NZ
1;65535;PL
3;196605;SE
1;131071;TW
98;6553502;US
3;327677;VE

For the US I see many university blocks.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 11:08 am on Apr 18, 2014 (gmt 0)

And I see wilderness has jumped in with some legacy blocks


It's generally accepted that I'm more of a geek than I actually am (lucy could easily verify this, and has), however I didn't have a clue what a "legacy block" was and had to do a google.

Legacy=ARIN

For the US I see many university blocks.


Over the years I've had many problems on my sites with Universities, and this has been discussed many times (although not recently).
Many universities work under the umbrellas of 3rd party grants, which are generally used by staff or the student body not to expand their education but to work on focused commercial projects.
All-in-all the specifics I'm sure are related to my widget references not being available at any other online sources.

jmccormac

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 11:14 am on Apr 18, 2014 (gmt 0)

The problem with such an approach is that it is crude. It does not have the granularity to deal with reallocated ranges and subnets. The US has a lot of non-US clients in data centres and on the mega hosters who have subnets allocated to clients in countries outside of the US. The major cloud hosters also use a lot of US IP ranges.

One of the surveys that I run periodically is an IP mapping survey and it does discriminate between hosters/datacentres and ISPs. The web, at an IP level, is a lot more complicated than people realise.

Regards...jmcc

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 11:36 am on Apr 18, 2014 (gmt 0)

The problem with such an approach is that it is crude. It does not have the granularity to deal with reallocated ranges and subnets.


That is certainly one of the perils in using CIDR; however, with mod_rewrite, no such subnet exceptions exist.

EX:
RewriteCond %{REMOTE_ADDR} ^84\.([0-9]|[1-9][0-9]|1[0-9][0-9]|20[0-9]|21[02-9]|2[2-5][0-9])\. [OR]
RewriteCond %{REMOTE_ADDR} ^84\.211\.([0-9]|[1-35-9][0-9]|4[0-8]|1[0124-9][0-9]|13[0-8]|2[0-5][0-9])\. [OR]
RewriteCond %{REMOTE_ADDR} ^84\.211\.139\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[01345][0-9]|22[013])$ [OR]
RewriteCond %{REMOTE_ADDR} ^84\.211\.49\.([0-9]|[13-9][0-9]|2[0-35-9]|1[0-9][0-9]|2[0-5][0-9])$ [OR]
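To see what those four conditions actually leave unblocked, the patterns can be run outside Apache. The sketch below copies the regular expressions verbatim and treats the [OR] flags as "any pattern matches"; it assumes the omitted RewriteRule blocks the request, and the names `PATTERNS` and `is_blocked` are made up for illustration.

```python
import re

# The four patterns from the example, copied unchanged.
PATTERNS = [re.compile(p) for p in (
    r"^84\.([0-9]|[1-9][0-9]|1[0-9][0-9]|20[0-9]|21[02-9]|2[2-5][0-9])\.",
    r"^84\.211\.([0-9]|[1-35-9][0-9]|4[0-8]|1[0124-9][0-9]|13[0-8]|2[0-5][0-9])\.",
    r"^84\.211\.139\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[01345][0-9]|22[013])$",
    r"^84\.211\.49\.([0-9]|[13-9][0-9]|2[0-35-9]|1[0-9][0-9]|2[0-5][0-9])$",
)]

def is_blocked(addr):
    """Emulate the [OR] chain: blocked if any one pattern matches."""
    return any(p.search(addr) for p in PATTERNS)

# Scan the two /24s that the second pattern deliberately skips.
holes = [
    f"84.211.{third}.{fourth}"
    for third in (49, 139)
    for fourth in range(256)
    if not is_blocked(f"84.211.{third}.{fourth}")
]
print(holes)
# ['84.211.49.24', '84.211.139.222', '84.211.139.224', '84.211.139.225',
#  '84.211.139.226', '84.211.139.227', '84.211.139.228', '84.211.139.229']
```

So the whole 84 block is covered except for eight individual addresses, which is finer than any single CIDR line could express.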

[edited by: wilderness at 11:58 am (utc) on Apr 18, 2014]

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 11:44 am on Apr 18, 2014 (gmt 0)

BTW and for the record!
I've gone through a few complete Class A's and their entire subnets to survey and/or separate valid ranges.
Generally speaking the time spent was NOT worth the percentile of "saved innocents" and/or their traffic.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 11:57 am on Apr 18, 2014 (gmt 0)

The first form is CIDR ranges. No . needed, because it's implied in the number.


lucy,
considering that on most days, I'm lucky to recall my own name!

I'm not about to change methods that have been in place and remain functional for some 15 years (thereby creating havoc and inconsistency within my degrading recollection), simply for the sake of a useless period and/or a ka-zillionth less time saved on my host's CPU.

I do however thank you for the clarification ;)

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 2:14 pm on Apr 18, 2014 (gmt 0)

It is very detailed but you can download the RIR database of every registry and see which ranges are allocated to each country.


Just spent 25 minutes looking for the LACNIC or AFRINIC versions of this data with no luck. Any clues on where to download these? I love extra data in my Alphabet soup. ;)

Legacy=ARIN


I think you'll find that Legacy means that the block contains assignments that pre-date ARIN (think back to the days when Network Solutions was the sole government contractor responsible for IP allocation). Many of the Legacy blocks have since been transferred to other Registries but still contain residual US ranges (as bhukkel demonstrated above) and that is the reason they are not on the initial list.

And yes, even the "ALLOCATED" ARIN blocks contain things like proxies that will allow bad people from outside the country in under your security blanket.

One could stop with the initial list if granularity is a concern or continue to expand on it. Perhaps after blocking the above, the strategy can turn to blocking server farms and bad proxies. Some will choose to block the non-ARIN legacy blocks wholesale while others may find this to be too brutal. I'm personally looking to see if being granular with the Legacy /8s is worth the effort.

bhukkel



 
Msg#: 4663915 posted 3:08 pm on Apr 18, 2014 (gmt 0)

Just spent 25 minutes looking for the LACNIC or AFRINIC versions of this data with no luck. Any clues on where to download these? I love extra data in my Alphabet soup. ;)


I download them from the RIPE ftp server. ftp://ftp.ripe.net/pub/stats/

Look in every subdir for the file delegated-xxxxx-latest
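Those delegated-* files are pipe-delimited text, one record per line in the form registry|cc|type|start|value|date|status, where for ipv4 records the value field is the number of addresses. A rough sketch of a per-country tally for one /8 follows. The sample records are invented for illustration and are not real registry data, and `tally_by_country` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up records in the delegated-file layout (NOT real allocations).
SAMPLE = """\
2|example|20140418|4|19830101|20140418|+0000
example|*|ipv4|*|4|summary
example|JP|ipv4|150.1.0.0|65536|19910101|allocated
example|US|ipv4|150.2.0.0|131072|19910101|allocated
example|JP|ipv4|150.4.0.0|65536|19910101|allocated
example|ZA|ipv4|196.1.0.0|65536|19950101|allocated
"""

def tally_by_country(lines, first_octet):
    """Return {country: [number of ranges, number of IPs]} for one /8."""
    totals = defaultdict(lambda: [0, 0])
    prefix = f"{first_octet}."
    for line in lines:
        fields = line.strip().split("|")
        if line.startswith("#") or len(fields) < 7:
            continue  # comments, blank lines and summary lines
        cc, rtype, start, value = fields[1:5]
        if rtype != "ipv4" or cc == "*" or not start.startswith(prefix):
            continue  # header line, other resource types, other /8s
        totals[cc][0] += 1
        totals[cc][1] += int(value)
    return dict(totals)

result = tally_by_country(SAMPLE.splitlines(), 150)
for cc in sorted(result):
    ranges, ips = result[cc]
    print(f"{ranges};{ips};{cc}")
# 2;131072;JP
# 1;131072;US
```

Checking whether "US" or "CA" appears among the keys of the result is then a quick test of how pure a block is.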

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 4:19 pm on Apr 18, 2014 (gmt 0)

OK, per bhukkel's information and after downloading the file for AFRINIC (from April 14 of this year) and running a distinct query on the country fields in the table, I've determined the following to my own satisfaction. There are not any US or Canadian (my personal definition of "North American") allocations in the following two Legacy blocks which I feel comfortable adding to the original list.

154/8 -- AFRINIC LEGACY
196/8 -- AFRINIC LEGACY

Should have some results for LACNIC shortly.

ADDED: This is going to allow for merging 196/8 and 197/8 in the master list, but I'll hold off on that until we see what the rest of the Legacy blocks are up to.

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 5:10 pm on Apr 18, 2014 (gmt 0)

They're updating files on RIPE as we speak so the LACNIC results are from today. With some interesting twists.

First, this block doesn't have any US or Canadian country codes in it...

191/8 -- LACNIC LEGACY -- so I'm comfortable adding it to my block list.

Now for the twist. Remember those ALLOCATED blocks? Well the following three LACNIC ALLOCATED blocks have some US-registered blocks in them. They are small ranges and I'm with wilderness on this one. I see them as acceptable collateral damage but you could poke holes for them if you really want to.

200.49.248.0/21 - Telmex USA -- Florida
190.103.184.0/22 - LAUREN -- New Orleans
179.60.192.0/22 - Edge Network Services Ltd -- California
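If you do want to poke a hole, the blocked CIDR has to be split into the pieces that surround the exception. Here is a sketch using `address_exclude` from Python's `ipaddress` module, with the Telmex range and the 200.0.0.0/7 entry from the original list as the example; on a real server an allow rule ahead of the deny might be simpler, so treat this as one option only.

```python
import ipaddress

blocked = ipaddress.ip_network("200.0.0.0/7")
hole = ipaddress.ip_network("200.49.248.0/21")

# Everything in the /7 except the /21, as the smallest set of CIDRs.
remaining = sorted(blocked.address_exclude(hole))

print(len(remaining))  # 14
print(any(net.overlaps(hole) for net in remaining))  # False
print(sum(net.num_addresses for net in remaining)
      == blocked.num_addresses - hole.num_addresses)  # True
```

One hole turns a single line into fourteen (one for each bit of prefix length between /7 and /21), which is a fair measure of the cost of being granular.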

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 6:20 pm on Apr 18, 2014 (gmt 0)

OK, here's some APNIC Legacy results...the following blocks appear to be free of North American allocations.

43/8 -- APNIC LEGACY
133/8 -- APNIC LEGACY
150/8 -- APNIC LEGACY
153/8 -- APNIC LEGACY
171/8 -- APNIC LEGACY

...and this one has a single exception

163/8 -- APNIC LEGACY -- Exception starting at 163.60.0.0 -- Amada, Corp.

Now, back to the APNIC "Allocated" blocks for a moment. The following exceptions cropped up which include some ranges for Verisign and Microsoft amongst others. These are starting IP addresses and you can look them up in the APNIC Whois if you're interested. I'm frankly fine with blocking these but you may not be.

60.254.128.0
103.246.248.0
113.29.0.0
167.220.224.0
192.103.43.0
202.72.96.0
203.144.48.0
203.187.128.0

bhukkel



 
Msg#: 4663915 posted 6:26 pm on Apr 18, 2014 (gmt 0)

I have a total of 53 US IP ranges not allocated by ARIN, most of them from RIPE.

webcentric

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4663915 posted 6:38 pm on Apr 18, 2014 (gmt 0)

I have a total of 53 US IP ranges not allocated by ARIN, most of them from RIPE.


Sounds like I have about 41 more to go. I purposely left RIPE for (next to) last because I've seen many of those exceptions in the past and figured it for a mess waiting to be exposed. Can't wait to get to ARIN and see what a piece of Swiss cheese that is. ;)

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4663915 posted 7:34 pm on Apr 18, 2014 (gmt 0)

Observations on the above:

Regex can often take longer to parse grouped IP ranges than simply adding the extra IP CIDR ranges to a blocklist and not using regex.

36/7 can be written as 37/7 - depends. Some apps (eg postfix) complain if you do that. You should really work from the base of the /7 (or /15 or whatever).

Leaving off trailing zeros is a dodgy way of working. Again, some apps require them.

I'm pretty sure all IPv4 ranges have been allocated to regions now. The last change took place (if I remember correctly) some time early last year or late the year before. This does not mean that regions have allocated all of their allotments yet.

Most regional ranges include at least a few sub-ranges, sometimes as high as /16 or more, used by US companies/services. And vice versa. It may be dangerous, depending on your web site catchment area, to block /8 or larger indiscriminately.

And, of course, if you have world-wide expectations, as some of my own customers do, then all the above is void.

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4663915 posted 7:54 pm on Apr 18, 2014 (gmt 0)

if you have world-wide expectations, as some of my own customers do, then all the above is void

The great firewall of Smallsville dispenses with the first two letters of "www".

If I were one of the millions of Americans living abroad (or one of the many more who might be travelling at any given time) I would not be pleased to be banned from visiting a website, whether it be for shipping to a relative back home, getting support for something I had purchased, or just browsing while stuck in a hotel somewhere.

I would probably look for a better supplier.

...

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4663915 posted 8:39 pm on Apr 18, 2014 (gmt 0)

:: detour to pore over color-coded lists ::

Most exceptions -- like a US registry in a RIPE range -- are things like /29 slivers of server farms, which can be happily ignored anyway. I'm frankly surprised at the willpower of LACNIC and AfriNIC in sitting on huge unassigned ranges while RIPE is reduced to doling out /22s (see current allocations in 185). Wouldn't you think you could sell them off for wildly inflated prices?

:: pause to wonder if something is going on in 104 ::

Oh, right. It's the 128-172 area that's the big mishmosh. (Very, very colorful in my records!) When I meet a robot from a nominally academic IP, I immediately assume -- rightly or wrongly -- that it's an assignment for a computer science class. But in my case they do tend to be plausible humans.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4663915 posted 8:53 pm on Apr 18, 2014 (gmt 0)

103.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved