Using a colon in .htaccess regex

         

cartographer

7:47 pm on Jan 17, 2010 (gmt 0)

10+ Year Member



I'm working on a little gallery project for myself and one of the things I'm going for is short URLs - for just about everything. For example, something like this: imag.es/5fj38.

Because of this, I'm then defining categories and other pages like this: imag.es/cat:flowers

Because the project is fairly small, all links are directed through gallery.php?id=[query], where the query passed is parsed and turned into the category/image/page that it's intended to go to. This works fine.

However, I haven't managed to find a regular expression for .htaccess that works with a colon. URLs without a colon work correctly, but as soon as a colon is introduced the request returns a 403 Forbidden error.

My current code is this:

RewriteRule ^([-_A-Za-z0-9]+)/?$ gallery.php?id=$1 [NE,NC]

Thanks in advance.

g1smd

7:53 pm on Jan 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure colons are allowed in URLs.

Refer to the list of allowed characters shown in the HTTP specs.

If you really have to include one, it has to be encoded.

cartographer

8:03 pm on Jan 17, 2010 (gmt 0)

10+ Year Member



I would think they are allowed somehow. Wikipedia uses them on a number of pages; and they work fine in query strings too...

jdMorgan

9:14 pm on Jan 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The allowed-character 'rules' for URL-paths and the query strings appended to those URL-paths are quite different. Colons are reserved for use as delimiters in URL-paths and so must be URL-encoded if they are used for anything except the protocol-designated purpose in the URL. There's no use trying to debate this in a forum. See RFC-3986.

I presume you've already tried including an escaped colon in the alternate-character group in your regex above. If that didn't work, you may end up having to examine "THE_REQUEST" using a RewriteCond, testing for your existing regex group OR \%3[Aa] in order to match and back-reference URL-encoded colons. The URL-path seen by RewriteRule itself will already have been decoded.
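
As a rough, untested sketch of that RewriteCond approach: the lines below assume Apache 2.x (PCRE regex support), an .htaccess file in the site root, and links written with the colon encoded, e.g. imag.es/cat%3Aflowers. The exact pattern is only one way to do it, and on some setups the 403 may be raised before the .htaccess rules ever run, in which case this will not help.

```apache
RewriteEngine On

# THE_REQUEST holds the raw, still-encoded request line, e.g.
#   GET /cat%3Aflowers HTTP/1.1
# Capture a path made of the original character group OR an encoded colon.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /((?:[-_A-Za-z0-9]|%3[Aa])+)/?(?:\?[^\ ]*)?\ HTTP/
# %1 is the RewriteCond capture; NE stops mod_rewrite re-escaping the "%".
RewriteRule ^ gallery.php?id=%1 [NE,L]
```

With this in place, gallery.php should receive id=cat%3Aflowers in the query string, which PHP decodes to cat:flowers in $_GET. Because the character group does not include a dot, the rewritten gallery.php request does not match the condition again.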

Jim

cartographer

7:28 pm on Jan 19, 2010 (gmt 0)

10+ Year Member



Okay, thanks for the info! I'll look for an alternative method.

whoisgregg

7:40 pm on Jan 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're thinking about using semicolons, be aware that those wreak havoc with Google Analytics.

jdMorgan

2:34 am on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, in the URL-path part of the URL (the part that follows the domain name and precedes the query string and fragment/named-anchor), it is best to stick with the very restricted set of "unreserved" characters listed in the RFC I cited above.

If you use any "reserved" or "unwise" characters, it may in fact work fine for you right now, but then you may hit a brick wall such as this Google Analytics "semicolon problem" at some time in the future.
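
One way to stay inside the unreserved set (letters, digits, hyphen, period, underscore, tilde) is to use a hyphen as the visible separator and map it back to the colon form internally. The sketch below is only an illustration: the "cat-" prefix is a made-up convention, and it assumes no image ID ever starts with "cat-".

```apache
RewriteEngine On

# Hypothetical scheme: imag.es/cat-flowers instead of imag.es/cat:flowers
# gallery.php still receives id=cat:flowers, so its parsing can stay as it is.
RewriteRule ^cat-([-_A-Za-z0-9]+)/?$ gallery.php?id=cat:$1 [NE,L]

# Everything else (e.g. imag.es/5fj38) keeps using the original pattern.
RewriteRule ^([-_A-Za-z0-9]+)/?$ gallery.php?id=$1 [NE,L]
```

The more specific rule comes first, and the [L] flag stops processing once it matches.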

"If it's expeditious, there's probably a catch" is a good phrase to keep in mind.

Jim