I have created the .htaccess file containing the following lines:
Options +FollowSymlinks
RewriteEngine on
RewriteBase /mod
RewriteRule ^(.*)/(.*)\.html$ test.php?directory=$1&file=$2 [PT]
RewriteRule ^(.*)\.html$ test.php?directory=root&file=$1 [PT]
RewriteRule ^(.*)$ test.php?error=true&file=$1
My visitors will follow links in the format:
(1) www.mydomain.com/charts/sample.html
or:
(2) www.mydomain.com/charts/ (index.html)
or they might try:
(3) www.mydomain.com/duff.html
I managed to capture my directory and file in example (1) but the other two screw it up.
I want to capture (1) into two variables, as I thought the first rewrite rule was doing; achieve (2) with and without the trailing slash being entered; and have anything else (3) directed to another page, where I can test whether it is a real page or send a 404 and show the site index if not. I'm not really sure where to use the flags, and I think my queries are OK?
(.*) matches "everything", so it is very inefficient: the regex engine then has to backtrack to find the slash or dot that follows. It needs to match "all characters up to the next slash" or "... next dot" instead of "everything". Replace it with something else, like ([^/]+) or ([^.]+), except for ^(.*)$, which can be simplified to (.*) instead.
Add [L] to all of the rules, unless you know of a very, very good reason to omit it. I am guessing that you need [L] in place of [PT] here.
Be aware that unless you take steps to redirect non-www to www ahead of the rewrites, your content will be available at both versions of your URL.
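A minimal sketch of such a non-www to www redirect, placed ahead of the internal rewrites. It assumes the .htaccess file sits in the document root and that www.mydomain.com (from the example URLs) is the canonical hostname; adjust both if your setup differs.

```apache
# Sketch only: assumes this .htaccess is in the document root and that
# www.mydomain.com is the canonical hostname.
RewriteEngine on
# Match requests for the bare domain (case-insensitive)
RewriteCond %{HTTP_HOST} ^mydomain\.com$ [NC]
# Externally 301-redirect to the same path on the www hostname
RewriteRule (.*) http://www.mydomain.com/$1 [R=301,L]
```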
Be aware that if you omit redirecting "URLs with parameters" to "URLs that look like folders" that both URL formats will serve your site as Duplicate Content.
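One way this could be sketched is to test THE_REQUEST, which holds the original client request line and is not changed by internal rewrites, so only direct requests for the script URL are redirected. The hostname, the document-root location of .htaccess, and the exact parameter order (directory first, then file) are assumptions here.

```apache
# Sketch only: assumes .htaccess in the document root, canonical host
# www.mydomain.com, and the parameter order directory=...&file=...
# Direct request for test.php?directory=root&file=duff -> /duff.html
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /test\.php\?directory=root&file=([^&\ ]+)\ HTTP/
RewriteRule ^test\.php$ http://www.mydomain.com/%1.html? [R=301,L]
# Direct request for test.php?directory=charts&file=sample -> /charts/sample.html
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /test\.php\?directory=([^&\ ]+)&file=([^&\ ]+)\ HTTP/
RewriteRule ^test\.php$ http://www.mydomain.com/%1/%2.html? [R=301,L]
```

The trailing "?" on each substitution drops the original query string from the redirect target.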
Your third rule passes the trailing slash over to your script in the parameter.
[edited by: g1smd at 6:13 pm (utc) on Dec. 10, 2008]
# /charts/sample.html
RewriteRule ^([^/]+)/([^.]+)\.html$ test.php?directory=$1&file=$2 [L]
#
# /duff.html
RewriteRule ^([^./]+)\.html$ test.php?directory=root&file=$1 [L]
#
# /charts or /charts/
RewriteRule ^([^./]+)/?$ test.php?directory=$1 [L]
#
# anything else except "test.php" itself
RewriteCond $1 !^test\.php$
RewriteRule (.*) test.php?error=true&file=$1 [L]
A warning: The now-third rule creates duplicate content -- The same content appearing at more than one URL. Although this rule is what you asked for, it would be much better to externally 301-redirect URL-paths like /charts to /charts/ before internally rewriting only those URL-paths having the trailing slash. The 301 redirect tells the search engines that only the URL with the trailing slash is the correct, canonical URL.
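A hedged sketch of that redirect-then-rewrite arrangement, which would take the place of the "/charts or /charts/" rule. It assumes the .htaccess file is in the document root and that www.mydomain.com is the canonical hostname.

```apache
# Sketch only: assumes .htaccess in the document root and canonical
# host www.mydomain.com.
# Externally redirect /charts to /charts/
RewriteRule ^([^./]+)$ http://www.mydomain.com/$1/ [R=301,L]
# Internally rewrite only the URL-path that has the trailing slash
RewriteRule ^([^./]+)/$ test.php?directory=$1 [L]
```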
Jim
1. I want to capture and rewrite all urls in this directory and subdirectories apart from the stuff where the database and web files are kept in a directory called siteStuff/
2. Anything else needs to be passed to my db in the format of directory and filename but I will need to handle pages from the root directory including index.html and just the root of the directory. Anything else where people put the wrong address in there can be sorted out by my PHP at the back-end and they can go to my home page or site index.
I think I need the following but am not completely sure:
To exclude the siteStuff directory:
RewriteCond $1 !^siteStuff/.$
Then the rewrite to capture the directory and path as per your previous suggestion (which is much better; it's easy to understand when somebody has written it, but doing it from scratch takes skill like yours!):
RewriteRule ^([^/]+)/([^.]+)\.html$ displayPage.php?directory=$1&file=$2.html [L]
To capture file requests to the root:
RewriteRule ^([^./]+)\..$ displayPage.php?directory=root&file=$1.html [L]
I don't know how to handle the root of the site (/), and I have neglected the case where somebody forgets the slash after a directory name, as in chart where it should be chart/. I guess the rule that gets index.html could be adapted to not check for the . and to check for a / instead, so that I don't send them to a 404.
Can the 404 be transformed into displayPage.php?directory=root&file=error.html ?
I think that's enough to think about,
Barnaby
Make your 404 error page friendly and useful: Provide a polite explanation that the requested resource does not exist, and text links to your home page, HTML site map page, search facility (if any), and other appropriate pages such as category pages if appropriate. You can use a script for this -- by defining it using an "ErrorDocument 404" directive.
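As a sketch, the directive could look like the following. The script path /error404.php is a hypothetical name, not something from this thread. Use a local URL-path rather than a full http:// URL; a full URL makes the server issue a redirect, and the client never sees the 404 status.

```apache
# Sketch only: /error404.php is a hypothetical script name.
# A local URL-path keeps the 404 status on the response.
ErrorDocument 404 /error404.php
```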
If a URL is incorrect in any way, including case or missing/additional slashes, externally redirect it to the correct URL or let it go 404, rather than "trying to make it work." Allowing more than one single URL to resolve to the same content creates duplicate-content problems and can cause search ranking problems.
For serious errors such as a 500 Server Error, I strongly suggest *not* using a scripted error handler. Use a static HTML page so that errors caused by incorrect re-configuration or bad updates to your script interpreter cannot result in further (recursive) server errors -- this would result in a massive self-denial-of-service, and make it impossible to debug the problem.
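For example, assuming a static file at the hypothetical path /errors/500.html:

```apache
# Sketch only: /errors/500.html is a hypothetical static page, served
# without involving the script interpreter.
ErrorDocument 500 /errors/500.html
```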
Jim
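For the siteStuff exclusion and site-root questions raised earlier in the thread, one possible arrangement is sketched below. It is not taken from any reply here and assumes the .htaccess file is in the document root. Note that a trailing ".$" in a pattern matches exactly one character, so this sketch uses an unanchored prefix match and an explicit \.html instead.

```apache
# Sketch only: assumes .htaccess in the document root.
RewriteEngine on
# Leave everything under siteStuff/ alone ("-" means no substitution)
RewriteRule ^siteStuff/ - [L]
# Site root "/" (an empty path in .htaccess context)
RewriteRule ^$ displayPage.php?directory=root&file=index.html [L]
# /page.html in the root directory
RewriteRule ^([^./]+)\.html$ displayPage.php?directory=root&file=$1.html [L]
# /directory/page.html
RewriteRule ^([^/]+)/([^.]+)\.html$ displayPage.php?directory=$1&file=$2.html [L]
```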