Rewrite rules for urls using .htaccess

Wanting help with changing characters in urls and for converting cases in u

         

jemaverick

3:34 pm on Jan 20, 2010 (gmt 0)

10+ Year Member



I've been having quite a spot of bother trying to get these rewrite rules to work, and I've come full circle, minus the hair that I've pulled out along the way. I've followed quite a few step by step lessons on this problem, and always come up with a 500 error or a url that is rebellious towards any rules. Please help!

Basically, I want to change the underscore between words in my urls to a hyphen. I also want to convert the text in the urls from uppercase to lowercase.

So I would like to go from this:

http://www.example.com/Main_Page

to this:

http://www.example.com/main-page

Not too hard, right? Well, it is for me. I'm running MediaWiki 1.15 in the root folder of the server. I've already managed to shorten the urls, but just can't get any further. Here is the code that I already have in the .htaccess file:

Options +FollowSymlinks
RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Don't rewrite requests for files in MediaWiki subdirectories,
# MediaWiki PHP files, HTTP error documents, favicon.ico, or robots.txt
RewriteCond %{REQUEST_URI} !^/(stylesheets¦images¦skins¦extensions)/
RewriteCond %{REQUEST_URI} !^/(redirect¦texvc¦index).php
RewriteCond %{REQUEST_URI} !^/error/(40(1¦3¦4)¦500).html
RewriteCond %{REQUEST_URI} !^/favicon.ico
RewriteCond %{REQUEST_URI} !^/robots.txt

# Rewrite http://www.example.com/article properly, this is the main rule
RewriteRule ^(.*)$ /index.php/?title=$1 [L,QSA]

(BTW...the favicon.ico isn't showing, despite being in the right place!) Can somebody please help? I'd greatly appreciate it.

jdMorgan

11:49 pm on Jan 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately, case-conversion and character-substitution in a .htaccess context are *terribly* inefficient. While they may be useful if invoked infrequently to correct obsolete-URL requests or requests due to a *few* bad links from other sites, you cannot rely on them for the normal operation of your site. When changing multiple underscores to hyphens (or making similar substitutions) in a large set of requested URLs, it is not uncommon to see page-load times increase by *seconds*, especially if the coding is poor.

Further, there is a documented bug in all versions of mod_rewrite which causes path errors in .htaccess code, requiring an inefficient and non-intuitive approach. Example code including this function is here [webmasterworld.com] (read very carefully, as at least three "sections" of the posted code are required in order to function).
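For the occasional-correction case described above, a minimal sketch (not the linked code) might replace one underscore per external redirect. It assumes the .htaccess file sits in the document root and that this block is placed above the main MediaWiki rule. A URL with several underscores costs one redirect per underscore, which is part of why this approach does not scale. Case conversion is left out, because mod_rewrite's built-in tolower map has to be declared with RewriteMap in the server configuration and cannot be declared in .htaccess. MediaWiki itself treats underscores in titles as spaces, so whether the hyphenated titles resolve to the intended pages is outside the scope of this sketch.

```apache
# Sketch only: assumes .htaccess in the document root, placed above the main rule.
# Skip real files so static assets with underscores in their names are untouched.
RewriteCond %{REQUEST_FILENAME} !-f
# Replace the first underscore in the path with a hyphen and redirect externally.
# Each redirect starts a fresh request, so nothing is rewritten internally twice.
RewriteRule ^([^_]*)_(.*)$ /$1-$2 [R=301,L]
```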

I'd suggest that a better solution would be to correct the URLs where they are defined: In the links on your pages.

I will also suggest that you re-arrange your RewriteConds in the rule you posted above, so that the file- and directory-exists checks are done last. Further, add a new RewriteCond at the top of the list, excluding "index\.php" as the first step (instead of as the third part of RewriteCond #4 as it is now). Excluding index.php at the start and moving the exists-checks to last will more than double the performance of this rule by eliminating two unnecessary disk checks in all cases, and eliminating *all* unnecessary disk checks for paths already excluded from this rule.

Jim

jemaverick

8:02 am on Jan 21, 2010 (gmt 0)

10+ Year Member



Thanks Jim.

Regarding the rearrangement of the RewriteConds, you believe that I should change this:

RewriteCond %{REQUEST_URI} !^/(redirect¦texvc¦index).php

to this:

RewriteCond %{REQUEST_URI} !^/(redirect¦texvc).php

and create a new RewriteCond to be placed at the top of the RewriteConds list that looks a little something like this:

RewriteCond %{REQUEST_URI} !^/index.php ?

Is that the correct way to code it?

jdMorgan

8:50 pm on Jan 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just test for index.php first. Because mod_rewrite processing is re-started after any RewriteRule is invoked, index.php requests will account for a full 50% of all requests handled by this rule.

For example, as the first RewriteCond:


RewriteCond %{REQUEST_URI} !^/(index¦redirect¦texvc)\.php

Be sure to change all broken pipe "¦" characters you see on this forum to solid pipe characters before use; Posting on this forum modifies the pipe characters.

Also, escape all literal periods in patterns as shown. Otherwise they are taken for the regex token meaning "match any single character."
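Putting the advice in this thread together, the reordered rule set might look like the sketch below. It keeps the originally posted substitution unchanged, uses solid pipes, escapes the literal periods, and moves the two disk checks to the end. The relative order of the middle conditions is an assumption.

```apache
Options +FollowSymLinks
RewriteEngine On

# Cheap string tests first; index.php leads because the rule itself rewrites to it
RewriteCond %{REQUEST_URI} !^/(index|redirect|texvc)\.php
RewriteCond %{REQUEST_URI} !^/(stylesheets|images|skins|extensions)/
RewriteCond %{REQUEST_URI} !^/error/(40(1|3|4)|500)\.html
RewriteCond %{REQUEST_URI} !^/favicon\.ico
RewriteCond %{REQUEST_URI} !^/robots\.txt
# Disk checks last, so they only run for paths not already excluded above
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Main rule, as originally posted
RewriteRule ^(.*)$ /index.php/?title=$1 [L,QSA]
```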

Jim

nemos

11:41 pm on Jan 21, 2010 (gmt 0)

10+ Year Member



jdMorgan,

You said there is a documented bug in mod_rewrite when it runs in .htaccess. Where can I find information about it?

I have sometimes seen odd behaviour from rules running in .htaccess, and I can never tell whether it is a bug in mod_rewrite or whether I don't quite understand how the rules work.

Is mod_rewrite in .htaccess reliable?

Regards

jdMorgan

12:22 am on Jan 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nasty mod_rewrite bug [archive.apache.org]

Other than this bug, mod_rewrite is very reliable.

Most problems are not caused by this bug. Most are caused by poor coding -- either poor logic in the rules, or sloppy or incorrect regular-expression patterns. Or both. Mostly both.

However, in the case where a request is rewritten multiple times, Apache can erroneously re-inject part of the path into the rewritten path.

For example, these rules won't work properly, if I recall the problem correctly:


RewriteRule ^a(.*)$ b$1
RewriteRule ^b(.*)$ c$1
RewriteRule ^c(.*)$ d$1 [L]

If you request the URL-path "/apple.html", you'd expect to get "dpple.html" as the output. But what you actually get is something like "dpple.htmlbpplecpple"

This is probably not exactly what happens -- It has been several years since I last tested this problem. And ever since then, I have simply avoided using any construct which might trigger the error.

Although the bug documentation claims that this bug was to be fixed in Apache 2.x, some testing I did more than a year ago indicated that the problem still exists.

This is a good reason to always end each RewriteRule with an [L] flag, and to never allow a request to be rewritten by more than one rule -- Make all rules specific and mutually-exclusive so that any given requested URL-path is rewritten or redirected to the final destination by a single rule.
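As a rough illustration of that advice, the three chained rules from the example above could be collapsed so that each starting letter maps straight to the final target in one step. This is only a sketch under the same assumptions as the example (a .htaccess file in the document root, relative substitutions as posted).

```apache
# Each request matches at most one rule and reaches its final path in one rewrite.
# A rewritten path starting with "d" matches none of these patterns on the next pass.
RewriteRule ^a(.*)$ d$1 [L]
RewriteRule ^b(.*)$ d$1 [L]
RewriteRule ^c(.*)$ d$1 [L]
```

With these rules, a request for "/apple.html" is rewritten once, by the first rule only.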

Jim