Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 94 message thread spans 4 pages: < < 94 ( 1 2 3 [4]     
Block non-North American Traffic for Dummies Like Me
Reducing the size of your blocking list.
webcentric




msg:4663917
 6:48 pm on Apr 17, 2014 (gmt 0)

First off, this subject has been discussed before, but I felt there's enough current interest on this board and on other boards here at WebmasterWorld alone to warrant a fresh top-down discussion. We'll see if our moderators agree.

The list of CIDRs below was compiled from the IANA IPv4 Address Space Registry report [iana.org]. It is a compacted version of all Allocated non-ARIN /8 blocks (those assigned to APNIC, RIPE NCC, AFRINIC, and LACNIC). For example, 58.0.0.0/7 merges 58.0.0.0/8 and 59.0.0.0/8 into a single CIDR. The largest block in the list is 80.0.0.0/4, which covers the address range 80.0.0.0 through 95.255.255.255.
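For anyone who wants to double-check the merges described above, here is a minimal Python sketch using the standard-library `ipaddress` module. It is not part of the original list, and the helper name `cidr_range` is made up for illustration.

```python
import ipaddress

def cidr_range(cidr):
    """Return the (first, last) address covered by a CIDR block, as strings."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address)

# The two examples mentioned in the post.
for cidr in ("58.0.0.0/7", "80.0.0.0/4"):
    first, last = cidr_range(cidr)
    print(f"{cidr}: {first} - {last}")
```

The /7 spans two adjacent /8 blocks (58 and 59), and the /4 spans sixteen (80 through 95).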

Some of the CIDRs below merge blocks from different registries, e.g. combining blocks from both RIPE NCC and APNIC. As such, this approach is in no way surgical enough to differentiate blocks in one RIR from blocks in another (let alone blocks representing specific countries). The goal here is to arrive at a blocking strategy that keeps people and bots from outside North America off your site.

It should also be noted that the list below is only intended as a good first step where blocking is concerned. There are many holes in the Legacy blocks that this step does not address and proxies are another whole topic of ingress. The intention here is to succinctly narrow the scope of the task with as little effort as possible.

One tangible benefit of this approach can be seen in the 176.0.0.0/5 range, which blocks 176.0.0.0 to 183.255.255.255. This CIDR contains some AWS and Rackspace ranges (and probably other server farms as well). Blocking this range means you don't have to identify and separately block those server farm ranges.
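To see what a single wide CIDR catches, a membership test can be sketched in Python as follows. The function name `is_blocked` and the sample addresses are illustrative assumptions only. How you actually enforce the block (firewall, server config) is a separate matter.

```python
import ipaddress

# The /5 discussed above; add more networks to the tuple as needed.
BLOCKED = (ipaddress.ip_network("176.0.0.0/5"),)

def is_blocked(ip, networks=BLOCKED):
    """Return True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# Arbitrary sample addresses just inside and just outside the range.
for ip in ("175.255.255.255", "176.0.0.1", "183.255.255.255", "184.0.0.0"):
    print(ip, is_blocked(ip))
```

Any narrower range that sits inside 176.0.0.0 to 183.255.255.255 is covered by the one entry, which is the saving described above.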

1.0.0.0/8
2.0.0.0/8
5.0.0.0/8
14.0.0.0/8
27.0.0.0/8
31.0.0.0/8
36.0.0.0/7
39.0.0.0/8
41.0.0.0/8
42.0.0.0/8
46.0.0.0/8
49.0.0.0/8
58.0.0.0/7
60.0.0.0/7
62.0.0.0/8
77.0.0.0/8
78.0.0.0/7
80.0.0.0/4
101.0.0.0/8
102.0.0.0/7
105.0.0.0/8
106.0.0.0/8
109.0.0.0/8
110.0.0.0/7
112.0.0.0/5
120.0.0.0/6
124.0.0.0/7
126.0.0.0/8
175.0.0.0/8
176.0.0.0/5
185.0.0.0/8
186.0.0.0/7
189.0.0.0/8
190.0.0.0/8
193.0.0.0/8
194.0.0.0/8
195.0.0.0/8
197.0.0.0/8
200.0.0.0/7
202.0.0.0/7
210.0.0.0/7
212.0.0.0/7
217.0.0.0/8
218.0.0.0/7
220.0.0.0/7
222.0.0.0/7
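As a sanity check on the compaction, the list can be run through `ipaddress.collapse_addresses` from the Python standard library, which merges adjacent networks wherever alignment allows. This is only a sketch; the names `POSTED` and `collapse` are made up for illustration, and the list is transcribed from the post above.

```python
import ipaddress

# The 46 CIDRs exactly as posted above.
POSTED = """
1.0.0.0/8 2.0.0.0/8 5.0.0.0/8 14.0.0.0/8 27.0.0.0/8 31.0.0.0/8
36.0.0.0/7 39.0.0.0/8 41.0.0.0/8 42.0.0.0/8 46.0.0.0/8 49.0.0.0/8
58.0.0.0/7 60.0.0.0/7 62.0.0.0/8 77.0.0.0/8 78.0.0.0/7 80.0.0.0/4
101.0.0.0/8 102.0.0.0/7 105.0.0.0/8 106.0.0.0/8 109.0.0.0/8 110.0.0.0/7
112.0.0.0/5 120.0.0.0/6 124.0.0.0/7 126.0.0.0/8 175.0.0.0/8 176.0.0.0/5
185.0.0.0/8 186.0.0.0/7 189.0.0.0/8 190.0.0.0/8 193.0.0.0/8 194.0.0.0/8
195.0.0.0/8 197.0.0.0/8 200.0.0.0/7 202.0.0.0/7 210.0.0.0/7 212.0.0.0/7
217.0.0.0/8 218.0.0.0/7 220.0.0.0/7 222.0.0.0/7
""".split()

def collapse(cidrs):
    """Merge adjacent or overlapping networks into the fewest CIDRs."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

merged = collapse(POSTED)
print(len(POSTED), "->", len(merged))
# Entries that only exist after merging:
print(sorted(set(merged) - set(POSTED)))
```

Run against the list as posted, this reports three further merges that cover exactly the same addresses: 194.0.0.0/7, 200.0.0.0/6 and 220.0.0.0/6.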

So, I'm hoping that:

1. This list is helpful to those looking for a starting point.
2. If there's a mistake in the list above, the moderators will see fit to correct it once the mistake is identified, so that the first post reflects accurate and up-to-date information.
3. This discussion can move forward with new ranges outside the Allocated blocks to help expand this list even further. Anyone want to block the UK Ministry of Defence (sic)? That /8 block and others are omitted from this initial list because they are Legacy blocks.

And last for now: it is possible to further reduce the above list to a series of Regular Expressions, which would be even more condensed than the list above. For those with access to a rewrite module (Apache or IIS) such a list would be valuable, but I'll leave it up to an expert in that arena to post it if they care to. I hope this helps someone and can save them the time I (and many others) have spent whittling down the world a bit.
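As a rough illustration of the regular-expression idea, the Python sketch below turns CIDRs of /8 or shorter into a pattern on the first octet. It produces a plain alternation, not the hand-condensed form an expert would write, and the function names and the two-entry sample are assumptions for illustration. The resulting pattern is the kind of thing that might be matched against the remote address in a rewrite condition.

```python
import ipaddress
import re

def first_octets(cidrs):
    """Expand CIDRs of /8 or shorter into the sorted first octets they cover."""
    octets = set()
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        if net.prefixlen > 8:
            raise ValueError(f"{cidr} is longer than /8")
        start = int(net.network_address) >> 24   # first octet of the block
        count = 1 << (8 - net.prefixlen)         # number of /8s it spans
        octets.update(range(start, start + count))
    return sorted(octets)

def octet_regex(cidrs):
    """Build an anchored regex matching any address in the given blocks."""
    alternation = "|".join(str(o) for o in first_octets(cidrs))
    return rf"^({alternation})\."

# Small sample taken from the list above; extend with the full list as needed.
SAMPLE = ["58.0.0.0/7", "176.0.0.0/5"]
pattern = octet_regex(SAMPLE)
print(pattern)
print(bool(re.match(pattern, "59.1.2.3")), bool(re.match(pattern, "5.1.2.3")))
```

The trailing `\.` matters: without it a pattern for 58 would also match addresses beginning with 5 followed by other digits in a loosely anchored test.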

Comments and corrections are most welcome!

 

graeme_p




msg:4666022
 6:16 am on Apr 26, 2014 (gmt 0)

A few questions for ecommerce sites that do not deliver abroad:

1) Does your order process allow people to enter a foreign credit card billing address? There is at least one British site I would have bought a few hundred pounds' worth of stuff from if they did.

2) If the answer to 1) is yes, do you know how many people with foreign billing addresses bought from you? If you use a payment processor, do you even have the data?

3) If the answer to 1) is no, why not?

Also, does anyone have any numbers on the rate of false positives?

As far as language matters, this makes interesting reading:

[en.wikipedia.org ]

Although I can see some of the numbers are badly off, the overall picture is correct.

jswap




msg:4683022
 2:32 pm on Jun 26, 2014 (gmt 0)

webcentric, thank you very much for this list. It has been extremely helpful to me in reducing the amount of time I have had to spend battling spammers and scrapers.

I found another range that I think should probably be included:
144.0.0.0/8

[edited by: incrediBILL at 6:21 pm (utc) on Jun 26, 2014]
[edit reason] thread cleanup, see TOS #4 [/edit]

wilderness




msg:4683082
 6:25 pm on Jun 26, 2014 (gmt 0)

FWIW, the 144 Class A includes some Aussie-Kiwi ranges.
EX:
^144\.(13[013-9]|160)\. [OR]
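To see which second octets the pattern above actually selects, it can be tested in Python. This is a sketch only: the trailing `[OR]` is a rewrite-condition flag and not part of the regex, and the helper name `second_octets_matched` is made up for illustration.

```python
import re

# The pattern from the post, without the trailing [OR] flag.
PATTERN = re.compile(r"^144\.(13[013-9]|160)\.")

def second_octets_matched(pattern, first=144):
    """List every second octet (0-255) for which the pattern matches."""
    return [n for n in range(256) if pattern.match(f"{first}.{n}.0.1")]

print(second_octets_matched(PATTERN))
```

The character class `[013-9]` skips 2, so 144.132.x.x is not matched while 144.130, 144.131, 144.133 through 144.139 and 144.160 are.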

jswap




msg:4683085
 6:38 pm on Jun 26, 2014 (gmt 0)

Per the OP, these rules are for blocking non-North American traffic.
