


Clean URLs

Mod_rewrite VS other methods

12:46 am on Jan 6, 2004 (gmt 0)

New User

10+ Year Member

joined:Dec 30, 2003
posts:5
votes: 0


I have a site that uses mod_rewrite to clean up the query string and file extension on 60% of the pages. The other 40% either use the default URLs or we simply didn't care about them enough to write a rule. Currently we use mod_rewrite to map http://www.domain.com/products/ProductID to http://www.domain.com/products/detail.php?ProductID=$1
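The actual rule isn't shown in the post, so the following is only a minimal sketch of the kind of .htaccess rule described, assuming it lives in the document root's .htaccess and that product IDs never contain a slash or a dot:

```apache
# Hypothetical .htaccess sketch (document root). The poster's real rule
# is not shown; this only illustrates the mapping described above.
RewriteEngine On

# /products/ProductID  ->  /products/detail.php?ProductID=ProductID
# [^/.]+ excludes dots, so the rewritten /products/detail.php itself
# does not match this pattern again and loop.
RewriteRule ^products/([^/.]+)/?$ /products/detail.php?ProductID=$1 [L]
```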

Now the marketing department would like the product pages to have URLs of the form http://www.domain.com/ProductName, but figuring out the best way to do it has been a challenge. If you have any ideas on the best way to do this, or fixes for my efforts, I'd love some input. Here's what I've thought of/tried to date:

1 - I wrote a custom 404 page that would search for whatever was in the URI and redirect you to the appropriate page. It worked great in NN, Opera and Moz, but IE didn't like my custom 404 page, forcing me to change

ErrorDocument 404 /error404.php

to

ErrorDocument 404 http://www.domain.com/error404.php

and lose whatever the URI had in it. I've read that this is a fairly common problem, but there has to be a fix... doesn't there?

2 - I wrote a RewriteMap for the products:

RewriteMap products txt:/data/www/root/products.txt
RewriteRule ^/(.*)$ /products/detail.php?ProductID=${products:$1|0} [L]

It works great for the products, but because this rule matches every URL, it looks like I'd have to write and manage rules for all of the other pages on the site as well. Is there a RewriteMap trick that I've missed?

3 - I thought I could write a unique mod_rewrite rule for every product and stick them all in the .htaccess file. Since the product lineup is always changing, I'd actually write a cron job to regenerate the .htaccess file. But this seems like a bad idea on several fronts.
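A lighter-weight variant of idea 3 is to have the cron job regenerate only the RewriteMap text file rather than .htaccess. The mod_rewrite documentation states that txt: map lookups are cached until the map file's modification time changes, so a rewritten file is picked up without a restart. Note that the RewriteMap directive itself must be declared in httpd.conf or a <VirtualHost>, not in .htaccess. A sketch of the map file format, with hypothetical product names and IDs:

```apache
# products.txt -- plain-text RewriteMap file (entries are hypothetical).
# One "key value" pair per line: the key is the name used in the URL,
# the value is what ${products:key} expands to (here, the ProductID).
# Blank lines and lines starting with # are ignored.
BlueWidget    1001
RedWidget     1002
```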

thanks
summer

[edited by: jdMorgan at 2:12 am (utc) on Jan. 8, 2004]
[edit reason] de-linked [/edit]

4:40 am on Jan 6, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


summer,

I can't answer your question about RewriteMaps, but I thought I had better post and warn you about one issue:

Don't *ever* do this:

 ErrorDocument 4xx http://www.example.com 

on any server where search engine ranking matters. As noted in the Apache documentation, using a full URL in ErrorDocument changes the server response from the appropriate error code to a 302-Moved Temporarily (external!) redirect.

Search engine spiders occasionally test for your server's missing-file response (much to the consternation of some webmasters who see requests for non-existent files coming from reputable search engines and are unaware of this behavior). A 302 response will confuse and befuddle search engine spiders, since anything they ask for will get either an existing page or a 302 redirect. After some time, they will conclude that your site is infinite in size, and give up trying to spider it early.

Also, even if it works, this causes all 404s to become 302s, and requires the user's browser to issue another request to fetch the error page. This can cause lost visitors if a network slow-down delays the delivery of the error page -- in the meantime, they just see their browser 'spinning' and nothing on the screen.

Basically, I cringe whenever someone posts about using ErrorDocument 4xx for anything other than simply delivering an error document; there are too many pitfalls.
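For contrast, here are the two forms side by side. Per the Apache core documentation, a local path is served internally with the original error status, while anything starting with a scheme such as http:// makes Apache send the client a redirect instead. The script name is the one from the first post:

```apache
# Local URL-path: Apache serves /error404.php internally, and the
# client still receives the 404 status code.
ErrorDocument 404 /error404.php

# Full URL: Apache sends the client a redirect to this URL, so the
# 404 status is lost. Avoid this on sites where spidering matters.
# ErrorDocument 404 http://www.example.com/error404.php
```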

A far better approach is to use the -f, -F, and -U tests of RewriteCond to check whether a file or URL exists. The -f test is preferred, because both -F and -U invoke an additional internal subrequest, which causes a server performance hit.

So you'd have something like:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* /error404.php [L]

or

RewriteCond %{REQUEST_URI} !-U
RewriteRule .* /error404.php [L]

In addition, the [PT] flag may be required if your script is expecting request_rec to contain the requested filepath.

Caveat redirector,
Jim

<added> The issue with IE not working probably has to do with the MIME-type being returned by the error script - Make sure it is appropriate to the content returned, not to the script filetype itself (i.e. text/html, not application/x-php or whatever). </added>

10:29 pm on Jan 6, 2004 (gmt 0)

New User

10+ Year Member

joined:Dec 30, 2003
posts:5
votes: 0


Wouldn't using mod_rewrite to serve the 404 page create the same problem with search engines? Instead of a 404 or 302 response code, the server would send back a 200, and the site would still appear to be infinitely large.

Also, I checked the MIME type being returned by my error script, and it's text/html. I've heard that IE will not display anything you send it if it comes with a 404 response code. But I checked your site and mine in IE and Lynx: in IE your custom error page (a.k.a. the login page) works great, and mine doesn't work at all. In Lynx they both work. Here's the 404 response header info from both sites:

[summer@foo04 summer]$ lynx -mime_header http://www.webmasterworld.com/klfdksd | more
HTTP/1.1 404 Not Found
Date: Tue, 06 Jan 2004 22:15:39 GMT
Server: Apache/1.3.26 (Unix) FrontPage/5.0.2.2510
Cache-Control: max-age=0
Pragma: no-cache
X-Powered-By: BestBBS v3.045
Connection: close
Content-Type: text/html

[summer@foo04 summer]$ lynx -mime_header http://204.***.71.16/dlkfjsldjf | more
HTTP/1.1 404 Not Found
Date: Tue, 06 Jan 2004 22:40:25 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Accept-Ranges: bytes
X-Powered-By: PHP/4.2.2
Connection: close
Content-Type: text/html; charset=ISO-8859-1

Arrrgggg!

You did give me a hint on the www.domain.com/ProductName problem, though. I should be able to say something like:

RewriteMap products txt:/data/www/root/htaccess.txt

RewriteCond!/{REQUEST_URI} -f [NC]
RewriteRule /(.*)$ /products/detail.php?ProductID=${products:$1} [L]

That is: if you can't find a file with the name in the URI, then look it up in the rewrite map... but it's not working either! Argggg.

summer

[edited by: jdMorgan at 2:15 am (utc) on Jan. 8, 2004]
[edit reason] de-linked [/edit]

2:10 am on Jan 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Syntax problem:

RewriteCond %{REQUEST_URI} !-f [NC]
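Building on that correction, here is one way to combine the "real file first, then the map" idea with a check that the map actually knows the name, so the rule no longer acts as a catch-all. It is a sketch under stated assumptions: it must go in httpd.conf or a <VirtualHost> (RewriteMap is not allowed in .htaccess); it reuses the map path from the first post; and it tests %{DOCUMENT_ROOT}%{REQUEST_URI} because, in per-server context, the request has not yet been mapped to a file on disk, so a bare %{REQUEST_URI} is not a filesystem path. The [NC] flag is dropped because it isn't needed for a -f test.

```apache
# Sketch for httpd.conf or a <VirtualHost> block (not .htaccess).
RewriteEngine On
RewriteMap products txt:/data/www/root/products.txt

# 1. Leave requests for existing files and directories alone.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
# 2. Rewrite only if the map has an entry for this name; $1 is the
#    group captured by the RewriteRule pattern below.
RewriteCond ${products:$1|NOTFOUND} !=NOTFOUND
RewriteRule ^/([^/]+)$ /products/detail.php?ProductID=${products:$1} [L]
```

With this in place, a request for a name listed in the map is handed to detail.php, while any name missing from the map falls through to normal handling, so a genuinely missing page still gets a real 404 from a local-path ErrorDocument.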