mod rewrite with optional name/value pairs

This is frying my head

         

renaissanceman

10:37 am on Mar 23, 2009 (gmt 0)

10+ Year Member



I have a site where I'm trying to create multiple pages with geographically-targeted dynamic content, all loaded from index.php with a friendly URL, and I can't seem to get the rewrite rules to do exactly what I want them to. I only have access to .htaccess on this one, and I can't set up rewrite logging, so I'm stabbing wildly in the dark here.

Setup: wildcard subdomains have been set up in the virtualhost container and DNS zone. I have two optional name/value pairs that I want to write into the URL, with the first one (if present) mapping like so:

$name1.domain.co.uk = domain.co.uk/?$var1=$value1

and the second:

$name1.domain.co.uk/$name2 = domain.co.uk/?$var1=$value1&$var2=$value2

The second (virtual directory) rule only needs to kick in if the first (virtual subdomain) is present, since it's a hierarchy and there's the faint possibility that there might be duplicate values for $name2, although never within the same $name1. I apologise if this isn't very clear, I'm trying to abstract it to clear my head and make it easier for someone who isn't involved to understand it. If it helps, though, I can provide an applied example:

$var1 is county, so picking the last one I tested, we have:

bedfordshire.domain.co.uk should map to domain.co.uk/?county=001

$var2 is town/city, so:

bedfordshire.domain.co.uk/ampthill should map to domain.co.uk/?county=001&town=0000000001

It's entirely possible that there might be a town called Ampthill in a different county as well, so ideally I want to check the existence of a county first and then check for each possible town in that county in the URL, but it shouldn't match a town where no county is specified. I can't set up a rewritemap to take care of all these relationships, but as long as I can get the patterns right I can generate the rules in .htaccess with PHP.

I can match the subdomain part with the following:


RewriteCond %{HTTP_HOST} ^bedfordshire [NC]
RewriteRule ^(.*)$ $1?county=001 [NC,L]

That works fine, but adding any extra rule after that to check the town throws me into loops. If I do this:


RewriteCond %{REQUEST_URI} ^/ampthill [NC]
RewriteRule ampthill ?town=0000000001 [NC]
RewriteCond %{HTTP_HOST} ^bedfordshire [NC]
RewriteRule ^(.*)$ $1?county=001 [NC,L]

It seems to still match the county, but not the town. (I'm aware that this reverses the GET variables, which I can deal with if necessary, but it isn't ideal: $town is dependent on $county, so it seems wrong to me to have the string in the format ?town=0000000001&county=001.)

For the moment this only needs to work with index.php, although it would be nice if I could make it work with any possible "real" file or directory (possibly with other querystring variables) without screwing up the rewrite.

Is there anyone out there who can rescue me? I'm not fussy, I'll take help from anyone, but the closer you are to South Wales (UK), the more likely it is that I'll eventually get to buy you a beer for your trouble.

g1smd

10:52 am on Mar 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I assume the 'numbers' are entries in a database.

I wouldn't do this trying to rewrite names to numbers. That's because I would not want to have hundreds of rules in the .htaccess file. I want one rule (maybe two) that covers everything all at once. Easier to set up and much less maintenance.

I would rewrite requests for county.example.com/ and county.example.com/town to the script without translating them to numbers. The script would then look at the database to get the data, based on extracting the words from the requested URL. I wouldn't use numbers at all.

renaissanceman

11:31 am on Mar 23, 2009 (gmt 0)

10+ Year Member



The problem is that the names may contain spaces, so the only ways to deal with that are to either URL-encode or strip them out. URL encoding would be a two-way process, but would make them less human-readable and easy to type, whereas stripping out the spaces makes it possible for someone to logically type their location as (taking the example of Bexhill on Sea in the county of East Sussex) eastsussex.domain.co.uk/bexhillonsea rather than expecting the user to know that spaces can be replaced with + (most users would not be aware that spaces can be encoded in this way).

Edit: I'm really not worried about the number of rules generated or with maintenance, as the rules will be generated programmatically, and if I set it up right I surmise that only the rules for each county will be parsed for a url containing that county.

g1smd

12:00 pm on Mar 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sure, the names may contain spaces, but they are entries in a database. The URLs don't have to contain spaces - indeed they should not do so, and it would be easy to set it up so that URLs in links contained hyphens, or plus signs, or whatever you want.

In forms, you would type spaces, but the form script would then convert what was typed in, before applying it.

Whatever you do, do think about applying a 'did you mean...?' function that lists alternative locations when there are several with similar spelling, or for when no results were returned for what was actually input.

renaissanceman

12:04 pm on Mar 23, 2009 (gmt 0)

10+ Year Member



Ok, so I've managed to get it working by marginally adapting my previous attempt, thus:


RewriteCond %{REQUEST_URI} ^/?ampthill [NC]
RewriteRule ampthill ?town=0000000001 [NC,QSA]
RewriteCond %{HTTP_HOST} ^bedfordshire [NC]
RewriteRule ^(.*)$ $1?county=001 [NC,L,QSA]

I've tested it with and without trailing slash and extra GET variables and it seems to work, but it still has the "problem" - I don't know if anyone else will see it as such - of the first (required in the presence of a second) variable appearing in the URL after the (optional) second.

It also gives me a 500 error when a typo is made in the name of the town, where I'd ideally prefer it to fall back to the county if at all possible.
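One possible way to keep county ahead of town in the query string, and to fall back to the county when the town is mistyped, is to test host and path together in a single rule and follow it with a county-only catch-all. This is only a sketch: the /index.php target and the QUERY_STRING guard are assumptions, not something confirmed in this thread, and a generator script would emit one such pair per county and town.

```apache
RewriteEngine On

# County + town: the host condition and the path pattern belong to one rule,
# so county is written first. QSA appends any extra GET variables.
RewriteCond %{HTTP_HOST} ^bedfordshire\. [NC]
RewriteRule ^ampthill/?$ /index.php?county=001&town=0000000001 [NC,QSA,L]

# County only: catches everything else on this subdomain that is not a real
# file, including a mistyped town name. The QUERY_STRING test is an assumed
# guard that keeps the rule from firing again on the rewritten request.
RewriteCond %{HTTP_HOST} ^bedfordshire\. [NC]
RewriteCond %{QUERY_STRING} !(^|&)county=
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ /index.php?county=001 [QSA,L]
```

Real files are left alone by the second rule, so requests for them would not receive the county variable; that is a trade-off of this sketch.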

g1smd

12:12 pm on Mar 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With a rewrite, each rule must end with [L], otherwise processing will continue through all the later rules too.

A rewrite should respond to a URL with, OR without, a trailing slash, but NOT both. Responding to both creates duplicate content.

If you want both to work, set up a redirect to remove the trailing slash, before the rewrite kicks in. Force the correct domain in that same redirect at the same time too. That means both will still allow access to the content but only one will be 'seen' as being indexable by search engines.
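The redirect described above might look roughly like the following. It is a minimal sketch that assumes the site is served over plain http, that example.co.uk stands in for the real domain, and that the rules live in the document-root .htaccess.

```apache
RewriteEngine On

# External 301 redirect: /ampthill/ -> /ampthill, rebuilt on the canonical
# hostname. Real directories are skipped so their own handling is untouched.
# %1 is the subdomain captured by the HTTP_HOST condition; $1 is the path
# without its trailing slash.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.co\.uk [NC]
RewriteRule ^(.+)/$ http://%1.example.co.uk/$1 [R=301,L]

# ...the internal rewrites to the script would follow here...
```

Because the redirect is external, the browser and search engines only ever see the slashless URL, and the internal rewrite then needs to handle just that one form.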

g1smd

12:21 pm on Mar 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Have a look at the BT phonebook website or any rail or coach booking site to see how they treat similar names.

You will find that many of your visitors don't know what county the place they want is in. It is up to your site to prompt them, and give a list of choices.

Approach this from the point of view of the visitor that isn't exactly sure what they want. Cater for their needs and you have it licked.

renaissanceman

12:29 pm on Mar 23, 2009 (gmt 0)

10+ Year Member



Damn, didn't see your reply there; my email's running a bit slow today. The problem I had with the spaces is that I have to account for people typing the address directly in the address bar, which would provide no way of tracking where the spaces originally would be. Also, the name of the town cannot be used as the identifier because it's not a unique key in the database - I have fields for county_id in the counties table and town_id and child_of in the towns table to keep track of locations. Yes, in theory it would be particularly stupid for duplicate place names to pop up within the same area, but I've known worse cases of municipal idiocy in my time so I'd prefer to use my own unique identifiers. Eventually I'm hoping that the same schema can be extended even more, so if we launched something similar in the US, we could be as specific as:

state.domain.com/county/city/street

Once we got to that stage, we'd definitely need it to be bulletproof so we could deal with things like (dredging this out of memory slightly) the city of Charlotte, which has two Queens Roads that even intersect each other.
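As a rough illustration of how the deeper hierarchy could still be handled by one rule that passes names through to the script, here is a sketch with optional path segments. The parameter names (state, county, city, street), the /index.php target and the example.com domain are all assumptions made for the example.

```apache
RewriteEngine On

# state.example.com/county[/city[/street]] -> index.php with named parameters.
# Segments that are absent arrive as empty values (e.g. city=&street=).
# Real files and directories are skipped, which also stops the rule from
# matching again once the request has become /index.php.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com [NC]
RewriteRule ^([^/]+)(?:/([^/]+))?(?:/([^/]+))?/?$ /index.php?state=%1&county=$1&city=$2&street=$3 [QSA,L]
```

Resolving duplicates such as two streets with the same name would still be up to the script and its database keys; the rule only delivers the names.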

jdMorgan

1:14 pm on Mar 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One rule to map all subdomains (except "www") and all top-level directories (except those that exist as physical directories) to your script at "/"

RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk [NC]
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.co\.uk [NC]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /?county=%1&town=$1 [L]

If a translation from counties and towns to numbers is required, then replace the back-reference variables in the RewriteRule with calls to your proposed RewriteMap, passing those variables to the map for translation. As g1smd suggested, however, this placename-to-number translation (plus the space-stripping) would best be done in your script (or in an additional 'wrapper script' around that script); PHP and other server-side scripting languages have far more efficient methods to deal with string manipulations than mod_rewrite can support.

Jim
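For reference, passing the back-references through a RewriteMap could look something like this. It is only a sketch: the map names and file paths are hypothetical, and the RewriteMap directive itself has to be declared in the server or virtual host configuration, which the original poster said is not available to him.

```apache
# Server or virtual host config only; RewriteMap is not allowed in .htaccess.
# Map names and file paths are hypothetical. Each text map holds
# "key value" lines, for example:
#   counties.txt:  bedfordshire 001
#   towns.txt:     bedfordshire/ampthill 0000000001
RewriteMap countymap txt:/etc/apache2/maps/counties.txt
RewriteMap townmap txt:/etc/apache2/maps/towns.txt

# Same conditions as the rule above, with each back-reference looked up in
# a map. ${map:key|default} yields the default (here 0) for unknown keys.
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk [NC]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.co\.uk [NC]
RewriteRule ^(.*)$ /?county=${countymap:%1|0}&town=${townmap:%1/$1|0} [L]
```

Keying the town map on county/town keeps towns with the same name in different counties apart. Text-map keys are matched exactly, so the names would need to be lowercased before the lookup.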

renaissanceman

1:25 pm on Mar 23, 2009 (gmt 0)

10+ Year Member



Hi Jim,

Cheers for the reply, and the suggestion. Since you're a mod, any idea how we can clean up the thread? I went to edit an earlier post, and somehow (possibly because the other poster replied while I was editing) I ended up double-posting it, once with the edit and once without.

g1smd

1:33 pm on Mar 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim can do that on his next visit no doubt <delete this when done>.
Post #3876886 looks like the one that is no longer needed.