Cleaning URL with mod_rewrite

         

Etruscan

11:39 pm on May 18, 2006 (gmt 0)

10+ Year Member



My URL rewriting doesn't appear to be processed by the server. I'm trying to get this ghost URL:

http://www.example.com/article/This-is-a-test

Redirected to this URL:

http://www.example.com/article.html?id=This-is-a-test

...and here's the rule I'm using:

RewriteRule ^article/([^/.]+)$ article/$1/ [R]
RewriteRule ^article/([^/.]+)/?$ article.html?id=$1 [L]

I have a feeling it's my regexp, but being new to it, I'm not sure. Does anybody have any suggestions?

[edited by: jdMorgan at 2:20 am (utc) on May 19, 2006]
[edit reason] example.com [/edit]

jdMorgan

2:06 am on May 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As written, the first rule issues a 302-Moved Temporarily redirect to add a trailing slash, unless the URL contains a "." or already ends with a slash. Because that rule has no [L] flag, however, the redirect is deferred and processing drops into the second rule. For a URL containing no ".", the second rule then does one of two things: if the requested URL already had a trailing slash, it internally rewrites the request to the script; if the slash was missing, it rewrites the URL and only then performs the external 302 redirect, which exposes the script URL.

If you are trying to fix search engine links by adding the trailing slash, use [R=301,L] on the first rule, to do a 301 redirect immediately. The client will then issue a new HTTP request for the same URL but with a trailing slash, which bypasses the first rule, and the second rule will then internally (silently) pass the request to your script.
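As a minimal sketch of that suggestion, assuming the rules live in a .htaccess file in the site's document root (so the patterns carry no leading slash): the leading slash on the substitutions is an assumption added here, not part of the posted code, so that the result does not depend on how relative paths get resolved.

```apache
# Sketch only; assumes a .htaccess file in the site's document root.
# 301-redirect slashless URLs to the trailing-slash form, then stop.
RewriteRule ^article/([^/.]+)$ /article/$1/ [R=301,L]
# On the follow-up request, silently hand the value to the script.
RewriteRule ^article/([^/.]+)/?$ /article.html?id=$1 [L]
```

With the [L] on the first rule, a slashless request never reaches the second rule, so the script URL is not sent back to the client.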

Other than that it looks OK, presuming that you already have


Options +FollowSymLinks
RewriteEngine on

somewhere above the code you posted. Without the second directive, or without both, mod_rewrite won't run at all.

Note that the Options +FollowSymLinks directive may or may not be needed. Depending on the current AllowOverride and Options server configuration, it can cause a 500-Server Error if it is needed but missing, or if it is present but not needed.
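For illustration, here is a sketch of the server-side settings that decide whether that Options line belongs in .htaccess, assuming the rules are kept in a .htaccess file. The directory path is hypothetical.

```apache
# httpd.conf sketch; "/var/www/example" is a hypothetical path.
<Directory "/var/www/example">
    # With symlink following enabled here, .htaccess needs no Options line.
    Options FollowSymLinks
    # FileInfo permits mod_rewrite directives in .htaccess.
    # Add "Options" to this list only if .htaccess must set Options itself.
    AllowOverride FileInfo
</Directory>
```

Under these settings an Options line in .htaccess would be rejected, because the Options override is not allowed, which is one way to end up with the 500 error.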

Jim

Etruscan

9:39 pm on May 19, 2006 (gmt 0)

10+ Year Member



Ok, I think it's solved.

My main DocumentRoot is not the root of my website, but of the folder where my sites reside. The mod_rewrite entry was in that particular <Directory> entry. When I moved it to the relevant <VirtualHost> entries, whose DocumentRoot points to the actual root of the site, it began functioning properly.

What it looks like now is:

#Make pretty URLs
RewriteEngine On
RewriteRule ^/article/([a-zA-Z0-9\-_]+)/$ /article/$1 [R]
RewriteRule ^/article/([a-zA-Z0-9\-_]+)?$ /article.html?id=$1 [L]

...the FollowSymLinks is not in the <VirtualHost> portion of httpd.conf but actually in the main "www" <Directory>. I'm not entirely confident this is the best way to configure the server, but it's working flawlessly.
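One likely reason the move made a difference can be sketched as follows. In <VirtualHost> (server) context the pattern is matched against the full URL-path, leading slash included, whereas in <Directory> or .htaccess context the directory's own prefix is stripped first. The DocumentRoot path below is hypothetical, and only the internal rewrite is shown.

```apache
<VirtualHost *:80>
    ServerName www.example.com
    # Hypothetical path; the real site root will differ.
    DocumentRoot "/var/www/sites/example"

    # Rewrite settings are not inherited from the main server by default,
    # so the engine is switched on inside the virtual host.
    RewriteEngine On
    # Server context: the pattern sees "/article/This-is-a-test",
    # so it starts with "^/".
    RewriteRule ^/article/([a-zA-Z0-9_-]+)$ /article.html?id=$1 [L]
</VirtualHost>
```

The same rule inside a <Directory> block for a parent folder would be tested against a path relative to that folder, so a pattern anchored at "article/" would not match.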

jdMorgan

10:04 pm on May 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Slight speed-up and corrections:

RewriteEngine on
# Externally redirect to remove trailing slash
RewriteRule ^/article/([a-z0-9_-]+)/$ /article/$1 [NC,R=301,L]
# Internally rewrite search-friendly URLs to script page
RewriteRule ^/article/([a-z0-9_-]+)?$ /article.html?id=$1 [NC,L]

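Using the [NC] flag (No Case) on both rules makes the match case-insensitive, so the character class does not need separate uppercase and lowercase ranges. It is also not necessary to escape the hyphen character by preceding it with "\", provided it sits where it cannot be read as part of a range: first or last in the character class, or immediately after a range.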

You must use an [L] flag on the first rule to avoid falling through to the second rule and exposing the URL of your script page. As it was, you risked creating a duplicate-content problem.

You had a 302-Moved Temporarily redirect on the first rule by default. If you want search engines to correct their links to remove the trailing slash, it must be a 301-Moved Permanently.
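If the script URL has already leaked, the duplicate-content concern above could also be tackled from the other side. The following is a sketch, not something posted in this thread, and it assumes <VirtualHost> context and the same id characters as the rules above. It redirects direct client requests for the script URL back to the friendly form, and would go before the internal rewrite rule.

```apache
# THE_REQUEST holds the original request line from the client, so this
# condition never matches a request that was only rewritten internally.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /article\.html\?id=([a-z0-9_-]+)\ HTTP/ [NC]
# %1 is the id captured by the condition; the trailing "?" drops the query string.
RewriteRule ^/article\.html$ /article/%1? [R=301,L]
```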

Jim