URL manipulation and shortcuts

How to do non-canonical URL to dynamic URL using MySQL lookup?

         

rsgalloway

8:08 pm on Oct 30, 2006 (gmt 0)

10+ Year Member



I have a database of users, each of which has a relational ID number, like user-123, and a unique username, which is a user-defined string that I have no control over. Each user has a profile page, which is displayed like this:

mywebsite.com/profile?id=user-123

The trouble is, these IDs are rather long and hard to remember, so I want to be able to offer a shortcut or alias to the profile pages instead, like this:

mywebsite.com/username

Users are not created as UNIX users; they exist only in my database. So Apache would have to look up the canonical URL somehow.

Another caveat is that all my pages are extensionless scripts, and I'm using this directive to execute them:

ScriptAliasMatch ^(/[^/.]+)$ /home/mywebsite.com$1

With this in place, I'm not sure how to tell Apache the difference between an alias to a user profile and a script.

Apache version 2.0.46

***reading the Apache forum section of the WebmasterWorld library now

jdMorgan

10:59 pm on Oct 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you've already identified the main problem: You need to 'tag' the URLs in some manner to identify them as user profiles. The easiest way is to prepend "member" or "profile" or "user" to the userid, and use that to detect the URL-paths which need to be looked-up.

As to the mechanics of the lookup, one solution is to use mod_rewrite's RewriteMap directive to invoke a server-side script (I suggest Perl) that can open the database and retrieve the URL-path associated with the 'friendly' URL, using that friendly URL as the lookup key. The request is then rewritten to that URL, serving the correct file seamlessly and without changing the browser's address bar.

RewriteMaps can only be defined in the server configuration (httpd.conf, conf.d, etc.), and not in per-dir (.htaccess) context. However, once defined, RewriteMaps can be called from .htaccess if desired.
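
A minimal sketch of the kind of map program described above, written in Python rather than Perl. It assumes mod_rewrite's `prg:` map protocol, in which the server writes one lookup key per line to the program's standard input and reads one line back, with the literal string `NULL` meaning "no match". The `PROFILE_PATHS` dictionary is a hypothetical stand-in for the MySQL query, and the usernames and IDs in it are made up.

```python
# Hypothetical data: friendly username -> internal URL-path.
# A real program would query the user database here instead.
PROFILE_PATHS = {
    "alice": "/profile?id=user-123",
    "bob": "/profile?id=user-456",
}


def lookup(key, table=PROFILE_PATHS):
    """Return the mapped URL-path, or the string 'NULL' when the key is unknown."""
    return table.get(key.strip(), "NULL")


def serve(instream, outstream, table=PROFILE_PATHS):
    """Answer one key per input line, flushing after each reply so the
    server is never left waiting on a buffered answer."""
    for line in instream:
        outstream.write(lookup(line, table) + "\n")
        outstream.flush()
```

Calling `serve(sys.stdin, sys.stdout)` at the bottom of the script would turn it into the long-running program the server starts. In the server configuration it would be registered with something like `RewriteMap profilemap prg:/path/to/script` (map name and path are hypothetical) and consulted from a rule as `${profilemap:$1}`.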

Another method is to let your main script do this, and simply pass all except for 'infrastructure URL' requests to your main script for processing. The script would then 'include' whatever file was requested and output it in the server response (with proper MIME-type and Cache-control headers). By 'infrastructure URLs,' I mean things like robots.txt, /w3c/p3p.xml (or .rdf), and perhaps image, CSS, and external JavaScript file requests -- all depends on how your site is laid out today.
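
The second approach can be sketched as a small dispatch function. This is an illustration only: `INFRASTRUCTURE`, `STATIC_SUFFIXES`, and `classify` are hypothetical names, and which paths count as infrastructure depends on how the site is laid out.

```python
# Hypothetical examples of requests the main script should leave alone.
INFRASTRUCTURE = {"/robots.txt", "/w3c/p3p.xml"}
STATIC_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg")


def classify(path):
    """Return ('static', path) for infrastructure requests,
    otherwise ('lookup', name) for a name to look up in the database."""
    if path in INFRASTRUCTURE or path.lower().endswith(STATIC_SUFFIXES):
        return ("static", path)
    # Everything else is treated as a friendly name.
    return ("lookup", path.strip("/"))
```

A main script built this way would serve `static` results directly and pass `lookup` results to the database query, then emit the response with the appropriate headers.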

Jim

rsgalloway

11:22 pm on Oct 30, 2006 (gmt 0)

10+ Year Member



Hi Jim, thanks for the reply.

I'm a little confused by the first sentence. I think you meant to write that I need to tag the username in some way, because userids would not be present in the static URLs. URLs entered into the browser would take the form

www.mywebsite.com/username

which would have to be differentiated somehow from executable scripts which would have a very similar URL / path. Most userids (except for a handful of early ones) are prepended with "user-" and then followed by a string. I should clarify that userids are for internal use only, in the relational database. Users are only aware of their username, generally.

I'm now thinking that a reasonable compromise would be to simply add a /profiles/ into the URL, like this

www.mywebsite.com/profiles/username

and use RewriteRule to load a dynamic page, like this

www.mywebsite.com/profile?u=username

What do you think about that? Is this too easy? How would the rewriterule be written?

(frankly, I could see a lot of use for something that queries the database for anything entered after the slash that follows the domain name, but maybe it's not worth the amount of effort required)

Thanks.

jdMorgan

12:42 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, you need to tag your "user" URLs so that they can be uniquely identified for rewriting.

The rewrite you describe should work. However, we cannot support code-writing requests here: the demand would simply be too large and the available contributors too few. We will be happy to discuss your code or help with specific problems, as those activities comport with the definition of a 'discussion' forum. See our forum charter (link at top left of page) for more information, and links to resources to get started.

Also, the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com], and the results of a site search [webmasterworld.com] on WebmasterWorld may be quite helpful.

Jim

rsgalloway

1:14 am on Oct 31, 2006 (gmt 0)

10+ Year Member



I want to put this in my httpd.conf file, so for my example above

RewriteEngine on
RewriteRule ^/profiles/([^/]+)/?$ /profile?u=$1 [L]

should rewrite

[mywebsite.com...]

to

[mywebsite.com...]

the script 'profile' in the dynamic URL executes a bit of SQL to find the userid associated with <username> and displays the appropriate content.
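
A minimal sketch of the lookup the `profile` script performs, assuming a `users` table with `userid` and `username` columns (the thread does not give the schema). Python's built-in `sqlite3` module stands in for MySQL so the example is self-contained; a MySQL driver would use its own placeholder syntax. The query is parameterized because usernames are user-defined strings.

```python
import sqlite3


def find_userid(conn, username):
    """Return the userid for a username, or None when no such user exists."""
    row = conn.execute(
        "SELECT userid FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


# In-memory database with a made-up row, standing in for the real user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid TEXT PRIMARY KEY, username TEXT UNIQUE)")
conn.execute("INSERT INTO users VALUES ('user-123', 'alice')")
```

Here `find_userid(conn, "alice")` gives back the internal ID, and an unknown or malformed name simply finds no row.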

And this rewrite should handle any old dynamic URLs

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /profile\?u=([^&]+)\ HTTP/
RewriteRule ^/profile$ [mywebsite.com...] [R=301,L]

Does that look about right?
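
The reason for testing `THE_REQUEST` can be checked outside Apache. The sketch below applies the same pattern with Python's `re` module to sample request lines (the username `alice` and the `tab` parameter are made up). `THE_REQUEST` holds the request line as the client sent it, so a request that was internally rewritten from `/profiles/alice` does not match and is not redirected again.

```python
import re

# Same pattern as the RewriteCond; Apache's "\ " is an escaped literal space.
OLD_URL = re.compile(r"^[A-Z]{3,9} /profile\?u=([^&]+) HTTP/")


def old_style_username(request_line):
    """Return the username if the client asked for the old dynamic URL, else None."""
    match = OLD_URL.match(request_line)
    return match.group(1) if match else None
```

Under this pattern `GET /profile?u=alice HTTP/1.1` yields `alice`, while `GET /profiles/alice HTTP/1.1` yields nothing. A request with extra parameters, such as `GET /profile?u=alice&tab=2 HTTP/1.1`, also fails to match, so it would not be redirected.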

jdMorgan

1:45 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That looks good to me -- Well done!

Be aware that as far as the browser is concerned, the current URL base path is still "/profiles", so on-page links and includes will either need to be <a href="../path-to-file">, <img src="/path-to-file"> or the canonical <a href="http://www.example.com/path-to-file"> in order to work properly. That is, any page-relative links will need to be adjusted to compensate for the fact that it is the browser that resolves relative links, and since we're using a server-internal rewrite, the browser still thinks that the current base directory is "/profiles".

An alternative is to catch those URLs that need to be adjusted, and rewrite them, too. And in fact, you may already be doing so.

I suspect you've already run into this, but I wanted to include it for completeness.
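
The relative-link behavior described above can be reproduced with Python's standard `urllib.parse.urljoin`, which resolves links against a base URL the way a browser does. The page URL and the file name `style.css` are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical page URL as the browser sees it after the internal rewrite.
PAGE = "http://www.example.com/profiles/alice"


def resolve(link, base=PAGE):
    """Resolve a link relative to the URL shown in the address bar."""
    return urljoin(base, link)
```

With this base, `resolve("style.css")` lands under `/profiles/`, while `resolve("../style.css")` and `resolve("/style.css")` both reach the site root. If the page is requested with a trailing slash (`/profiles/alice/`), `../style.css` resolves to `/profiles/style.css` instead, so root-relative or absolute links are the safer choice.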

Jim

rsgalloway

2:33 am on Oct 31, 2006 (gmt 0)

10+ Year Member



Ran into a bit of trouble on implementation. I put this exact code (domain name changed) into httpd.conf

DocumentRoot /home/mydomain.com
ServerName www.mydomain.com

RewriteEngine on
RewriteRule ^/profiles/([^/]+)/?$ /profile?name=$1 [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /profile\?name=([^&]+)\ HTTP/
RewriteRule ^/profile$ [mydomain.com...] [R=301,L]

ScriptAliasMatch ^(/[^/.]+)$ /home/mydomain.com$1

And I get the expected URL rewrite behavior, but instead of the profile script executing, it dumps out the source code!

update: if I move the rewrites inside the directory container, I get a 404. httpd log says apache is looking for a script called 'profiles'

jd01

3:04 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



New guess.

I would check to see if you are passing/dropping the name variable in the rewrite condition.

Justin

jdMorgan

3:46 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The 404 behaviour is because your ScriptAlias is kicking in.

The code dump means that the AddHandler server-parsed for filetype .php isn't being invoked.

Part of the puzzle is getting the rewrite code right. The other parts are putting the redirect code in the right context (<ServerName>, <Directory> or .htaccess), invoking the right content-handler (e.g. output it as text/html or execute it as a server-side script), and setting proper MIME-Type, Cache-Control and Expires headers. So, you've got several variables at play here. As an outsider not familiar with how your site is structured, I can't really tell you what's wrong, except in these general terms.

Also, be aware that the order of your mod_rewrite directives in relation to your ScriptAliasMatch directive in the file won't change the order in which those directives are executed; the order of execution will be the reverse of the LoadModule list order in Apache 1.x, or controlled by an internal priority scheme in Apache 2.x.

So, directives handled by any one given module are executed in the order that you specify; however, the server controls the order in which various modules are invoked to process your directives. Therefore you cannot explicitly control the order of processing of any two directives handled by different modules.

If you suspect that the ScriptAliasMatch is interfering with your mod_rewrite rules, then you may want to use the [T] flag, replacing ScriptAliasMatch with a RewriteRule so that you can control invocation order. e.g. RewriteRule ^(/[^/.]+)$ /$1 [T=application/x-httpd-php] (this is probably not exactly right, it's just an example of what I'm talking about).

Jim

rsgalloway

4:26 am on Oct 31, 2006 (gmt 0)

10+ Year Member



Ok, thanks again for your help.

Firstly, it'd be great to get the internal rewrite/redirect working initially. Here's what I have now (this is in the ServerName area)

RewriteEngine on
RewriteRule ^/profiles/([^/]+)/?$ [mydomain.com...]

(note the lack of flags at the end, no [L] or [R])

The result is a redirect, not a transparent rewrite, i.e. the URL in the browser changes.

What I've read (albeit not that much) says this should be an internal redirect and shouldn't affect the URL in the browser, serving the virtual page as if it actually existed.

jdMorgan

4:59 am on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's an external redirect because you've included the protocol and the domain, i.e. http://www.example.com forces a redirect. If you leave that off and specify a local path only, then you'll get an internal rewrite.

I'd advise a quick review of the Apache RewriteRule documentation [httpd.apache.org], since one little typo may leave your server ... "non-optimal"

Jim

rsgalloway

4:54 pm on Oct 31, 2006 (gmt 0)

10+ Year Member



Ahh, I see. OK, I've changed it to a local path

RewriteEngine on
RewriteRule ^/profiles/([^/]+)/?$ /profile?name=$1

Now with the ScriptAliasMatch in place

ScriptAliasMatch ^(/[^/.]+)$ /home/mywebsite.com$1

it seems two things happen:

1. if I put the rewrite in the ServerName area, the code is dumped (script alias not invoked), or

2. if I put the rewrite in the Directory area, scriptalias is invoked first and looks for a script called 'profiles'

I thought that the rewrite would be invoked for any URL with /profiles/ in it, but it seems the scriptalias is taking precedence.

I'm using extensionless Python scripts, btw, hence the need for the ScriptAliasMatch. Maybe something like

RewriteEngine on
RewriteRule ^/profiles/([^/]+)/?$ /profile?name=$1 [T=application/x-http-python]

? That doesn't seem to work, though. My browser tries to download the script.

I'll keep hacking away at this, but if anyone has any ideas I'm all ears. Thanks.

rsgalloway

6:40 pm on Oct 31, 2006 (gmt 0)

10+ Year Member



Ok, I got it to work, sort of, by putting the rewrite in the ServerName area and using the T flag

[T=application/x-httpd-cgi]

But there must be something wrong with my regex because if I leave off the trailing slash in my browser,

[mydomain.com...]

my browser tries to download a file called <username>. Add the trailing slash and it works fine. The thing that doesn't make sense is, I added a second rewrite (same regex) which doesn't care about the trailing slash (works with and without it)

RewriteRule ^/profiles/([^/]+)/?$ /profile?n=$1 [T=application/x-httpd-cgi]
RewriteRule ^/blogs/([^/]+)/?$ /profile?b=$1 [T=application/x-httpd-cgi]

Can anyone see any reason why the top one requires a trailing slash, but the bottom one does not?

Thanks again.

jd01

7:19 pm on Oct 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not seeing it, but haven't read the thread enough to see the whole picture of what you are doing / using. Actually just wanted to make sure you added a [T=stuff,L] 'Last' flag if you do not need to apply other rules. You can cause excessive processing and get some odd results if you omit them.

Justin

rsgalloway

1:22 am on Nov 1, 2006 (gmt 0)

10+ Year Member



Thanks, Justin. I added the L flag to each of my RewriteRules, but the odd behavior is still there.

jdMorgan

1:46 am on Nov 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's quite obvious that neither rule requires a trailing slash, so look elsewhere for the problem.

Again, it's likely that your ScriptAlias directive (or some other Alias, Redirect, RewriteRule, or ProxyPass class of directive) is interfering.
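
The claim that neither pattern requires a trailing slash can be checked in isolation. The sketch below runs the two patterns from the rules through Python's `re` module; for syntax this simple it should behave like Apache's regex engine, though that equivalence is an assumption. The name `alice` is made up.

```python
import re

# The two patterns from the RewriteRules, unchanged.
PROFILES = re.compile(r"^/profiles/([^/]+)/?$")
BLOGS = re.compile(r"^/blogs/([^/]+)/?$")


def captured(pattern, url_path):
    """Return the captured name, or None when the pattern does not match."""
    match = pattern.match(url_path)
    return match.group(1) if match else None
```

Both patterns capture `alice` with and without the trailing slash, which supports looking at other directives for the cause of the download prompt.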

Jim