Forum Moderators: open
208.96.54.zz
208.96.0.0/18 ServePath
Yet Another Spider disco/Nutch-1.0-dev
[webmasterworld.com...]
RewriteCond %{REMOTE_ADDR} ^208\.96\.([0-9]|[1-5][0-9]|6[0-3]|8[0-9]|9[0-6])\. [OR]
RewriteCond %{REMOTE_ADDR} ^216\.93\.1([6-8][0-9]|9[01])\. [OR]
RewriteCond %{REMOTE_ADDR} ^64\.151\.(6[4-9]|[7-9][0-9]|1[01][0-9]|12[0-7])\. [OR]
RewriteCond %{REMOTE_ADDR} ^69\.59\.(12[8-9]|1[3-8][0-9]|19[01])\. [OR]
(in mortal human language)
216.93.160.0 - 216.93.191.255 >> 216.93.160.0/19
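If you'd rather not do that translation by eye, Python's standard ipaddress module can collapse a dotted range into CIDR notation for you (a throwaway sketch, nothing here is specific to this thread):

```python
import ipaddress

def range_to_cidrs(first, last):
    """Collapse an inclusive IP range into the fewest CIDR blocks."""
    return [str(net) for net in ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))]

print(range_to_cidrs("216.93.160.0", "216.93.191.255"))  # ['216.93.160.0/19']
```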
So just for the record here are all ranges put together:
64.151.64.0/17
69.59.128.0/18
208.96.0.0/18
216.93.160.0/19
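A quick way to sanity-check a list like that is to let Python's ipaddress module expand each CIDR back into a first/last address. Note that, with the default strict mode, it refuses the /17 entry, since 64.151.64.0 doesn't sit on a /17 boundary (a quick sketch, nothing ServePath-specific):

```python
import ipaddress

# The four ranges quoted above; strict=True (the default) rejects a
# CIDR whose base address isn't on that block size's boundary.
for cidr in ["64.151.64.0/17", "69.59.128.0/18",
             "208.96.0.0/18", "216.93.160.0/19"]:
    try:
        net = ipaddress.IPv4Network(cidr)
    except ValueError:
        print(f"{cidr}: base address is not on that block's boundary")
        continue
    print(f"{cidr}: {net[0]} - {net[-1]} ({net.num_addresses} addresses)")
```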
how do you guys find different IP ranges for the same hosting company? Arin.net shows only one of them.
To me, telling my firewall apf -d 208.96.0.0/18 is clean and simple. There are fewer things to go wrong than with all your [ and ] (and God forbid you forget a \.), and it's much easier to read later on. But as I said, I'm a mere mortal.
as an example:
Enter the following, minus the quotes:
"> 209.206.128"
then scroll the page and view the results.
The problem with many of these similar searches is a limit of 256-something.
I've had some results fill multiple pages and take many minutes, while others will cut short at a pre-determined limit.
As an aside, I've never determined or understood the method of doing these searches at RIPE; however, the help does say these inquiries are an option.
edited by wilderness!
Hope you're not on dialup ;)
"> 67.128."
the following is real "pig Latin":
64.151.64.0/17
69.59.128.0/18
208.96.0.0/18
216.93.160.0/19
Correction: the ServePath range is 64.151.64.0/18. You don't want to block /17, or you might accidentally whack some legit browsers, and a big bunch of them.
OrgName: ServePath, LLC
NetRange: 64.151.64.0 - 64.151.127.255
CIDR: 64.151.64.0/18
NetName: SERVEPATH-BLK4
Besides, it's not igpay atinlay, it's a binary bitmask for a CIDR (Classless Inter-Domain Routing) [en.wikipedia.org] and quite easy to understand!
Since many people have difficulty with these, here's a little primer on CIDRs that hopefully simplifies the concept.
Each part of the IP address is represented by 8 bits (byte) which is 0-255.
Think of the CIDR in terms of the binary bitmask like this: bits 1-8.9-16.17-24.25-32. Therefore you know something ending in /18 uses the first 2 bits of the C block as the start of all the addresses assigned to that range.
Some examples:
1.0.0.0/8 means the first 8 bits are constant, so you're referring to a specific A block.
Therefore, 1.1.0.0/16 and 1.1.1.0/24 would refer to a specific B or C block respectively, and 1.1.1.1/32 means the entire IP address is used, not just a portion.
1.0.0.0/8 represents the range 1.0.0.0-1.255.255.255, and 1.1.0.0/16 represents 1.1.0.0-1.1.255.255, etc.
So 64.151.64.0/18 means that the range is 64.151.64.0-64.151.127.255 or easier to see in binary as:
01000000.10010111.01000000.00000000
so /18 is
01000000.10010111.01nnnnnn.nnnnnnnn
Note that the first 18 bits of the CIDR are fixed and everything after that point is variable. The C block value has a base of 64, meaning you can only add values of 0-63 to the base part of the C block address, making the maximum 127. The D block in the example can be any value from 0-255.
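If you'd rather not work the bit pattern out by hand, here's a throwaway sketch that prints an address's octets in binary next to the /18 netmask (255.255.192.0, i.e. the first 18 bits set):

```python
def to_binary(ip):
    """Dotted-quad IPv4 address -> dotted binary octets."""
    return ".".join(f"{int(octet):08b}" for octet in ip.split("."))

print(to_binary("64.151.64.0"))    # 01000000.10010111.01000000.00000000
print(to_binary("255.255.192.0"))  # the /18 netmask: first 18 bits set
```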
Hope that takes some of the igpay out of the atinlay for those that find CIDRs hard to deal with. The scientific mode of the calculator that comes with Windows provides binary-to-decimal conversions to make it all easier to see, and there are some CIDR calculators online that make it even simpler.
IMO, cutting and pasting the exact CIDR from the ARIN record into a firewall is a lot safer than the mistakes that can easily be made using rewrite rules. I'm a big stickler for spot-on accuracy, so I use it "as-is" from ARIN or the other respective internet number registries just to avoid potential disasters.
From my own perspective, it's much easier to understand the rewrite lines.
As far as Hobbs' mention of syntax errors?
These are going to occur with any method (i.e., Bill's /17 vs. /18 correction).
All this merely shows us there's more than one way to skin an htaccess cat ;)
Don
/32 1/256 C, 1 D
/31 1/128 C, 2 D
/30 1/64 C, 4 D
/29 1/32 C, 8 D
/28 1/16 C, 16 D
/27 1/8 C, 32 D
/26 1/4 C, 64 D
/25 1/2 C, 128 D
/24 1 C, 256 D
/23 2 C
/22 4 C
/21 8 C
/20 16 C
/19 32 C
/18 64 C
/17 128 C
/16 256 C, 1 B
/15 512 C, 2 B
/14 1024 C, 4 B
/13 2048 C, 8 B
/12 4096 C, 16 B
/11 8192 C, 32 B
/10 16384 C, 64 B
/9 32768 C, 128 B
/8 65536 C, 256 B, 1 A
/7 131072 C, 512 B, 2 A
/6 262144 C, 1024 B, 4 A
/5 524288 C, 2048 B, 8 A
/4 1048576 C, 4096 B, 16 A
/3 2097152 C, 8192 B, 32 A
/2 4194304 C, 16384 B, 64 A
/1 8388608 C, 32768 B, 128 A
/0 16777216 C, 65536 B, 256 A
So if I'm blocking 64 class C's, all I need to do is look it up, and it's /18.
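The whole table above boils down to one formula: a /n prefix leaves 32-n variable bits, so it spans 2^(32-n) addresses, or 2^(24-n) class C's when n is 24 or less. A tiny sketch:

```python
def class_c_blocks(prefix):
    """How many /24 ('class C') blocks a /prefix CIDR spans."""
    return 2 ** (24 - prefix) if prefix <= 24 else 1 / 2 ** (prefix - 24)

print(class_c_blocks(18))  # 64, the /18 row of the table
```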
True, mistakes do happen (thankfully, we've got Bill to save the day), but I'm holding on to my remaining gray cells for much more trivial matters (the futile effort to remain married and sane).
Different installations of the Nutch software may specify different agent names, but all should respond to the agent name "Nutch". Thus, to ban all Nutch-based crawlers from your site, place the following in your robots.txt file:

User-agent: Nutch
Disallow: /
Assuming they still use Nutch and didn't alter its behavior, that should work.
User-agent: Nutch
Disallow: /
I tried that at first a couple years ago. Despite what it says at the Nutch web site, the only one that will follow that disallow directive is Nutch itself.
This is one of the things that irritate me about all these start-up bots. We must disallow each one by name, and half of them do not obey it even then. Been there, done that.
So when I say "just ban 'Nutch'" I am referring to alternate methods, i.e. mod_rewrite, mod_setenvif, etc.
So when I say "just ban 'Nutch'" I am referring to alternate methods, i.e. mod_rewrite, mod_setenvif, etc.
Ah, so you can't "just ban Nutch" then, you have to ban the individual names, what a joke.
See, I wouldn't know about this problem since I whitelist and they're all banned by default so this is good to know for giving advice to the blacklisters.
I vaguely remember some of those old threads, but the Nutch code base gets updated all the time and evolves, so it would be nice to think that they fix some of these things.
That's why I went to their site to see what they had to say about blocking Nutch in general since things change.
However, with that said, some people still run old Nutch versions, so even if the new versions honored the generic Nutch label in robots.txt, that wouldn't change the behavior of old Nutch implementations already in use, which could cause conflicting views about whether Nutch does or doesn't do certain things.
It's just a mess no matter what.
Ah, so you can't "just ban Nutch" then, you have to ban the individual names - incrediBILL
You can't just "disallow in robots.txt" them all with "nutch."
Since robots.txt does not "ban" per se, banning refers to alternative methods. 99% of these clones keep "nutch" in the UA string, so using a firewall, http config, htaccess, etc. is effective for "nutch".
Sorry to say, these methods also stop mother Nutch unless you add some other allow filters.
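As a sketch of what such an allow filter could look like, here's the logic in Python pseudocode form (a hypothetical helper, not from any particular firewall or Apache config; which exact UA string the official crawler sends is something you'd have to verify in your own logs):

```python
def blocked(user_agent, allow_substrings=()):
    """Block any UA containing 'nutch' unless an allow substring matches.

    The allow list is yours to fill in -- check your own logs for the
    string the 'mother' Nutch crawler actually sends before allowing it.
    """
    ua = user_agent.lower()
    if any(allow.lower() in ua for allow in allow_substrings):
        return False
    return "nutch" in ua

print(blocked("Yet Another Spider disco/Nutch-1.0-dev"))  # True: clone blocked
```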