Rewrite valid IPv6 address

     
4:06 pm on Jun 22, 2015 (gmt 0)

Full Member from US 

10+ Year Member

joined:May 16, 2006
posts: 295
votes: 3


I have the current IPv4 regex rewrite in use:

RewriteRule ^/example/\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b$ /example.php?ip=$1.$2.$3.$4 [L]

I'm looking to build the same end result line for IPv6 (both compressed and expanded) addresses.

/example/{valid ipv6 address} -> /example.php?ip={valid ipv6 address}

The following appears to be correct for validating an IPv6 address

# IPv6 RegEx
(
([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}| # 1:2:3:4:5:6:7:8
([0-9a-fA-F]{1,4}:){1,7}:| # 1:: 1:2:3:4:5:6:7::
([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}| # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8
([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}| # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8
([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}| # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8
([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}| # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8
([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}| # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8
[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})| # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8
:((:[0-9a-fA-F]{1,4}){1,7}|:)| # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 ::

fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index)
::(ffff(:0{1,4}){0,1}:){0,1}
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
([0-9a-fA-F]{1,4}:){1,4}:
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
)


I would really only need to support the first 9 lines.
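
For anyone who wants to try those nine branches outside Apache first, here is a minimal sketch in Python (chosen only as a quick test harness; it is not from the thread). It joins the nine alternatives and anchors them at both ends. The names HEX, BRANCHES, IPV6 and is_ipv6 are made up for the example, and the letter H in each branch is just a placeholder for one hex group. The quantifier syntax is the subset that PCRE and Python's re module share, but this has not been checked against Apache itself.

```python
import re

HEX = r"[0-9a-fA-F]{1,4}"   # one group, leading zeros optional

# The nine branches from the post, with H standing in for one hex group.
BRANCHES = [
    "(H:){7}H",            # 1:2:3:4:5:6:7:8
    "(H:){1,7}:",          # 1::   1:2:3:4:5:6:7::
    "(H:){1,6}:H",         # 1::8  1:2:3:4:5:6::8
    "(H:){1,5}(:H){1,2}",  # 1::7:8
    "(H:){1,4}(:H){1,3}",  # 1::6:7:8
    "(H:){1,3}(:H){1,4}",  # 1::5:6:7:8
    "(H:){1,2}(:H){1,5}",  # 1::4:5:6:7:8
    "H:(:H){1,6}",         # 1::3:4:5:6:7:8
    ":((:H){1,7}|:)",      # ::8   ::
]

# Join the alternatives and anchor both ends so partial matches are rejected.
IPV6 = re.compile("^(?:" + "|".join(b.replace("H", HEX) for b in BRANCHES) + ")$")

def is_ipv6(text):
    """True if text matches one of the nine branches in full."""
    return IPV6.match(text) is not None
```

Something like is_ipv6("2001:db8::1") should come back True and is_ipv6(":::") False, which is a cheap way to sanity-check the pattern before pasting it into a rule.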
8:55 pm on June 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15804
votes: 845


It would be awfully nice if Apache recognized the \h locution but so far it doesn't seem to :( This is about matching type-ins, right? So you have to allow for forms like "257" that wouldn't occur in a REMOTE_ADDR or similar condition?

<tangent>
/\b25[0-5]

That's redundant. Since / is a non-word character, /\w = /\b\w by definition.
</tangent>

([0-9a-fA-F]{1,4}:)

Would something like "ABC" by itself be legitimate? Or does the php page supply leading zeros as necessary, so you don't need to deal with them at this stage?

{3,3}

Double-check that Apache requires this form. Most RegEx engines will accept {3} alone and I had the impression Apache did too. Matter of fact I'm pretty sure I'm currently using it somewhere.

Anyway: What, exactly, is the issue? You can have at least nine numbered captures, so even if you need extras (the {7,7} bit), a lightly modified form of the original ?ip=$1.$2.$3.$4 seems like it would work.

Also, do you need to pull out the exact pattern :: for special handling? If you said {0,6} instead of {1,6} (keep the explicit 0; the short form {,6} is not safe to rely on in Apache) you'd also cover the case of a null element, which gives you :: (also ::: and :::: but that's a different question).
9:35 pm on June 22, 2015 (gmt 0)

Full Member from US 

10+ Year Member

joined:May 16, 2006
posts: 295
votes: 3


Lucy24,

The current IPv4 rewrite works just fine and I'm not looking to make any changes to it. I included it as an example of what I am already doing. Validating and breaking into 4 numbered captures (or 8 in the case of IPv6) is not the way it needs to be handled. It's just how it was previously done.

This is about URL type ins only. Submitted forms can easily be handled in PHP without RegEx. Leading zeroes do not need to be included.

Unfortunately since I am almost entirely unfamiliar with RegEx I'm not sure that I understand your questions/comments.

IPv6 addresses in long form are represented as eight sets of four hexadecimal digits separated by colons. IPv6 addresses can be written in short hand using two conventions:

Zero Suppression
All IPv6 address segments are 16 bits.
The leading zeroes in each segment can be left out of the address segment.

Zero Compression
Since all addresses contain 8 segments, a single run of consecutive all-zero segments can be collapsed to a double colon.

For example, the IPv6 address 2001:0dba:0000:0000:0000:0000:0000:0001 collapses to 2001:dba::1.

:: is valid but ::: and :::: are not valid IPv6 addresses.
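
A small sketch of those two shorthand rules, using Python's standard ipaddress module purely as a convenient reference implementation (an assumption for illustration; nothing on the site runs Python). The helper names compress and is_valid_ipv6 are hypothetical.

```python
import ipaddress

def compress(text):
    """Return the shortest standard spelling of an IPv6 address."""
    return ipaddress.IPv6Address(text).compressed

def is_valid_ipv6(text):
    """True if the standard library accepts text as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ValueError:
        # AddressValueError is a subclass of ValueError
        return False
```

Here compress("2001:0dba:0000:0000:0000:0000:0000:0001") gives "2001:dba::1", and is_valid_ipv6 accepts "::" while rejecting ":::" and "::::".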

I do not need to support the following:

fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index)
::(ffff(:0{1,4}){0,1}:){0,1}
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
([0-9a-fA-F]{1,4}:){1,4}:
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
)
11:13 pm on June 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15804
votes: 845


I went back and attacked the question from the other side, beginning with "what is a valid IPv6 address". Hexadecimal: check. Omit leading zeros: check. Consecutive all-zero blocks collapse to "::": check. (About \h: Some RegEx engines let you use this shorthand for "hexadecimal"; it's exactly what you have in all your patterns as [0-9a-fA-F] although here you might as well use the [NC] flag to save space. Not to be confused with \h for "horizontal whitespace", no relation.)

The complication is that "::" can be used only once, but it can be anywhere in the address, and it can represent any number of blocks. This means that once there's a :: then the total number of written-out blocks may be anywhere from 0 to 7 (0 being the bare "::").

If you call each iteration of [\da-f]{1,4} "N" then all of these are legitimate:
N:N:N:N:N:N:N:N
N:N:N:N:N::N
N::N
N:N::N:N:N
N::
et cetera et cetera. In English it's simple: the number of Ns before the :: plus the number of Ns after the :: can add up to no more than 7, while if there is no double :: then the number has to be exactly 8. But Apache can't do this arithmetic. You'd have to set up eight patterns:
N::N(0-6 times) OR
N:N::N(0-5 times) OR
N:N:N::N(0-4 times) OR
etcetera up to
N:N:N:N:N:N:N:N (the version with no :: contraction)

In other words, assuming the IP part comes after /example/ in the URL as in your first post:
RewriteCond %{REQUEST_URI} ^/example/[\da-f]{1,4}::(([\da-f]{1,4}:){0,5}[\da-f]{1,4})?$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/example/([\da-f]{1,4}:){2}:(([\da-f]{1,4}:){0,4}[\da-f]{1,4})?$ [NC,OR]
{ etcetera with the first bit counting up while the second bit counts down, and remember that {0,1} can be reduced to ? alone }
RewriteCond %{REQUEST_URI} ^/example/([\da-f]{1,4}:){6}:([\da-f]{1,4})?$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/example/([\da-f]{1,4}:){7}(:|[\da-f]{1,4})$ [NC]
RewriteRule ^example/[\dA-Fa-f]{1,4}: /rewrite-target-here [L]
Note that the rule itself can not have the [NC] flag because here you need to constrain the non-IP part of the URL, "example" or whatever. But the body of the rule has to contain as much URL as possible so the server knows when it's worth evaluating conditions at all. Replace $ with /$ or /?$ depending on exact URL pattern. There obviously has to be an anchor of some kind, not just \b.
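
To see the "first bit counts up, second bit counts down" family spelled out in full, here is a rough Python sketch that generates the seven patterns (without the /example/ prefix) and tries them on a few inputs. It writes the hex class as [0-9a-f] with case-insensitive matching to mirror [NC]; the names N, condition_patterns and matches_any are invented for the example, and Python's re is only standing in for Apache's regex engine.

```python
import re

N = r"[0-9a-f]{1,4}"  # one hex group

def condition_patterns():
    """Build the family: k groups before '::', at most 7-k after, then the no-'::' case."""
    patterns = []
    for k in range(1, 7):
        # k groups, the '::', then optionally up to 7-k more groups
        patterns.append(rf"^(?:{N}:){{{k}}}:(?:(?:{N}:){{0,{6 - k}}}{N})?$")
    # seven groups followed by either a trailing '::' or an eighth group
    patterns.append(rf"^(?:{N}:){{7}}(?::|{N})$")
    return [re.compile(p, re.IGNORECASE) for p in patterns]

PATTERNS = condition_patterns()

def matches_any(text):
    """Emulate the [OR] chain: true if any one pattern matches."""
    return any(p.match(text) for p in PATTERNS)
```

One thing this makes visible: every pattern in the family starts with at least one written-out group, so a type-in that begins with "::" (such as ::1) falls through. Whether that matters depends on what people are expected to type.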

Aside: I detoured to do some experimenting, and confirmed that {2} for "exactly this many times" is fine. And it saves you two bytes in each line. Don't shorten {0,4} to {,4}, though: the PCRE library that Apache uses has traditionally read {,4} as literal text rather than as a quantifier.

At this point the obvious question becomes: Is it really most efficient to have this preliminary testing done in Apache? Wouldn't it be easier to feed the IP -- whether legitimate or not, so long as it vaguely looks like the beginning of an IPv4 or IPv6 -- straight to your php, and let it say whether the IP is legal or not? Unlike Apache, php would have no trouble with this counting-on-your-fingers level of basic arithmetic. Unfortunately, php doesn't do \h = hexadecimal; it prefers \h = horizontal white space. (I just looked this up. Darn it all.)
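
For what it's worth, the counting rule in plain English (exactly 8 groups with no "::", or at most 7 written-out groups around a single "::") is only a few lines once you have a real programming language. A minimal sketch in Python, standing in for whatever the PHP side would do; looks_like_ipv6 and HEX_GROUP are hypothetical names.

```python
import re

HEX_GROUP = re.compile(r"[0-9a-fA-F]{1,4}")

def looks_like_ipv6(text):
    """Apply the counting rule: 8 groups without '::', or at most 7 with exactly one '::'."""
    # More than one '::', or a run of three colons, is never valid.
    if text.count("::") > 1 or ":::" in text:
        return False
    if "::" in text:
        head, tail = text.split("::")
        before = head.split(":") if head else []
        after = tail.split(":") if tail else []
        groups = before + after
        if len(groups) > 7:
            return False
    else:
        groups = text.split(":")
        if len(groups) != 8:
            return False
    # Every written-out group must be 1 to 4 hex digits.
    return all(HEX_GROUP.fullmatch(g) for g in groups)
```

This only covers the plain hex forms discussed in the thread, not zone indexes or the embedded-IPv4 notations.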
12:05 am on June 23, 2015 (gmt 0)

Full Member from US 

10+ Year Member

joined:May 16, 2006
posts: 295
votes: 3


I think your last paragraph probably makes sense. PHP has several filters to detect valid IPv4 and IPv6 addresses so it's trivial.

So, is this what you are saying?

RewriteCond %{REQUEST_URI} ^/example/{simple IPv4 detection}$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/example/{simple IPv6 detection}$ [NC,OR]
RewriteRule ^/example/?$ example.php?ip=$1 [L]


{simple IPv4 detection} would be something like: (?:[0-9]{1,3}\.){3}[0-9]{1,3}
{simple IPv6 detection} would be something that checks for a-f/0-9 and has between two and seven colons

And then from there PHP would handle invalid IP addresses and serve the appropriate error message or redirect...
1:55 am on June 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15804
votes: 845


I wouldn't even make it a RewriteCond. You can put it straight into the body of the rule. One rule for IPv4, another for IPv6. If you do stick with conditions, then of course the rule itself can't say
^/example/?$
because, first, you wouldn't have a leading / (unless this is happening loose in the config file, which seems unlikely), and second, there obviously will be stuff after /example/. But that was probably an over-hasty cut-and-paste.

F'rinstance:
RewriteRule ^example/((?:\d{1,3}\.){3}\d{1,3})/?$ /example.php?ip4=$1 [L]
and
RewriteRule ^example/(([\da-fA-F]{1,4}:+)+[\da-fA-F]{1,4})/?$ /example.php?ip6=$1 [L]

Sure, why not pass different parameter names to the php while you're at it. "If it's this way test for IPv4, if it's the other way test for IPv6". But at this point it becomes more a matter of personal coding style.

Oh, and don't say [OR] at the end of your last Condition! But that was probably a typo too.
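
A rough sketch of the whole division of labour, in Python rather than Apache plus PHP so it can be run in one piece: two deliberately loose patterns decide which parameter name a path gets, and a strict check afterwards decides whether the address is real. The ipaddress module stands in for the PHP filters mentioned above, the IPv6 pattern is written with its capture group closed and an end anchor, and route and is_valid are made-up names.

```python
import ipaddress
import re

# Loose "does it vaguely look like an IP" patterns, one per address family.
LOOSE_V4 = re.compile(r"^example/((?:\d{1,3}\.){3}\d{1,3})/?$")
LOOSE_V6 = re.compile(r"^example/((?:[\da-fA-F]{1,4}:+)+[\da-fA-F]{1,4})/?$")

def route(path):
    """Return (parameter name, candidate address) as the loose rules would, or None."""
    for name, pattern in (("ip4", LOOSE_V4), ("ip6", LOOSE_V6)):
        m = pattern.match(path)
        if m:
            return name, m.group(1)
    return None

def is_valid(name, candidate):
    """Strict check done after the rewrite, by the script rather than the server."""
    cls = ipaddress.IPv4Address if name == "ip4" else ipaddress.IPv6Address
    try:
        cls(candidate)
        return True
    except ValueError:
        return False
```

So example/257.1.1.1 and example/1:::2 both get past the loose stage, and it is the strict check that turns them away with whatever error page is appropriate.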

:: now back to dispiriting task of trying to dislodge frightful earworm by listening to infinite loop of Offenbach's Can-Can (with breaks every half-hour or so to see if it's gone yet) ::
 
