You need a 301 redirect that strips index.php or index.html from any request that includes the index filename.
You need a catch-all to 301 redirect all non-www requests to the www version.
Finally, you can do your rewrite.
If you skip the other steps, the same content stays reachable at several URLs, which leaves your website open to Duplicate Content indexing.
The links on your website should also be updated to use the new format.
Since you are rewriting, why stick with parameter-based formats?
If I were doing this, I would rewrite and use a URL like www.example.com/category/news/2 or www.example.com/news/2 instead.
Is "category" the actual fixed word "category" or does that word change to many different keywords?
The code is fairly simple, once we know exactly what you want to do, and is very similar to a question asked yesterday.
[edited by: g1smd at 7:50 am (utc) on Sep. 29, 2008]
It also includes many of the other fixes that I hinted about above. You need both a redirect and a rewrite to do this properly.
You will need to study it, understand it, and understand why every part is there, and then adapt it using other examples you can find in this forum to do exactly what you want it to do.
Take a simple case with two parameters, a three-digit value and a seven-digit value, plus some common extra fixes that are either necessary or highly desirable:
External URL Format: www.example.com/345/1234567
Internal Server Path: /index.php?cat=345&art=1234567
# Specify acceptable index/root file. You could have a static
# index.html or allow index.php without parameters for root:
DirectoryIndex index.html index.php

# Enable the rewrite engine (required before any RewriteRule is used):
RewriteEngine On

# Redirect to remove trailing period or comma from URL request
# with parameters, such as from forum with autolink, and force
# www to always be in the URL:
RewriteCond %{QUERY_STRING} ^(([^&]+&)*[^&]*)[.,]$
RewriteRule (.*) http://www.example.com/$1?%1 [R=301,L]

# Redirect to remove trailing period or comma from URL request
# with path, such as from forum with autolink, and force www:
RewriteRule ^(([^/]+/)*[^/]*)[.,]$ http://www.example.com/$1 [R=301,L]

# Replace comma(s) or multiple filetype delimiter periods in page filepaths
# with a single period (e.g. "/page,html" or "/page..html")
RewriteRule ^([^,.]+)([,.]{2,}|,)((s?html?|php[1-9]?|[aj]spx?|pdf|xls|jpe?g|gif).*)$ /$1.$3 [R=301,L]

# Redirect two-parameter-based index.php|html? or / URL request
# (with parameters in any order) to folder-based URL format, and
# force www to always be in URL:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.(php|html?))?(\?[^\ ]*)\ HTTP/ [NC]
RewriteCond %{QUERY_STRING} &?cat=([0-9]{3})&?
RewriteCond %1>%{QUERY_STRING} ^([^>]+)>([^&]*&)*art=([0-9]{7})&?
RewriteRule ^(index\.(php|html?))?$ http://www.example.com/%1/%3? [R=301,L]

# Force all remaining requests for named index files to drop
# the index file filename, and force www:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(html?|php)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

# General rule to force all non-www URLs to be www URLs.
# This rule must be the last one of the redirects:
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Rewrite such that stray parameters or value names on any index.html?
# or any index.php URL or on / URL request always fail to the
# 404 page:
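RewriteCond %{QUERY_STRING} .
# Test the original client request too, so that the internally rewritten
# /index.php path is not caught when the .htaccess rules run again:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.(php|html?))?\?[^\ ]*\ HTTP/ [NC]
RewriteRule ^(index\.(php|html?))?$ /this.page.does.not.exist [L]

# Rewrite URL request: www.example.com/345/1234567 to internal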
# path: /index.php?cat=345&art=1234567 to serve content:
RewriteRule ^([0-9]{3})/([0-9]{7})$ /index.php?cat=$1&art=$2 [L]

The website now responds directly with content as "200 OK" only for paths like / and /345/1234567; all other formats either redirect or fail to the 404 Error Page if a real file does not exist. It does this without having to do server-intensive !-d and !-f checks on the filesystem. You can still have physical files like contact.html and they will still work. You can also have other rewritten URL formats, as long as they cannot be confused with the URL format already in use above.
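To sanity-check the intended outcomes before touching a live server, the main decisions can be modelled in a few lines of Python. This is only a rough sketch, not a substitute for testing in Apache: the function name `classify` and the action labels are made up for illustration, the punctuation-cleanup rules are left out, and the parameter matching here is deliberately stricter than the RewriteCond patterns.

```python
import re

CANONICAL = "http://www.example.com/"  # canonical host used in the rules above


def classify(host, path, query=""):
    """Model how one external request is handled.

    `path` is given without its leading slash, as mod_rewrite sees it in
    .htaccess. Returns an (action, target) tuple.
    """
    suffix = "?" + query if query else ""
    is_root_or_index = re.fullmatch(r"(index\.(php|html?))?", path) is not None

    # Parameter-based request for / or the index file, cat and art in any order.
    cat = re.search(r"(?:^|&)cat=([0-9]{3})(?:&|$)", query)
    art = re.search(r"(?:^|&)art=([0-9]{7})(?:&|$)", query)
    if is_root_or_index and cat and art:
        return "redirect", f"{CANONICAL}{cat.group(1)}/{art.group(1)}"

    # Any remaining request that names an index file drops the filename.
    named_index = re.fullmatch(r"((?:[^/]*/)*)index\.(?:html?|php)", path)
    if named_index:
        return "redirect", CANONICAL + named_index.group(1) + suffix

    # A non-www host is redirected to the www host.
    if re.match(r"example\.com", host, re.IGNORECASE):
        return "redirect", CANONICAL + path + suffix

    # Stray parameters on the root URL fail to the 404 page.
    if is_root_or_index and query:
        return "not_found", None

    # The folder-based URL is rewritten internally to the script.
    m = re.fullmatch(r"([0-9]{3})/([0-9]{7})", path)
    if m:
        return "rewrite", f"/index.php?cat={m.group(1)}&art={m.group(2)}"

    # Everything else is left for the filesystem (real file or 404).
    return "filesystem", "/" + path
```

For example, `classify("www.example.com", "index.php", "art=1234567&cat=345")` returns `("redirect", "http://www.example.com/345/1234567")`, and `classify("www.example.com", "345/1234567")` returns `("rewrite", "/index.php?cat=345&art=1234567")`.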
The site root can be implemented either as a static index.html page or by using index.php without any parameters. The script will also need to check that, when present, parameter values are acceptable, and fail to the internally script-generated 404 Error Page if not.
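The check inside the script might look something like the following. It is written in Python purely to show the logic; the real script here is index.php, and the `KNOWN_ARTICLES` table and `handle` function are hypothetical stand-ins for whatever data store and entry point the site actually uses.

```python
import re

# Hypothetical lookup table standing in for the site's real data store.
KNOWN_ARTICLES = {("345", "1234567"): "Example article"}


def handle(params):
    """Return (status, body) for the query parameters passed to the script."""
    # No parameters at all: serve the site root.
    if not params:
        return 200, "Home page"

    # Exactly the two expected parameter names must be present.
    if set(params) != {"cat", "art"}:
        return 404, "Not Found"

    cat, art = params["cat"], params["art"]

    # Values must have the expected shape: three digits and seven digits.
    if not re.fullmatch(r"[0-9]{3}", cat) or not re.fullmatch(r"[0-9]{7}", art):
        return 404, "Not Found"

    # Well-formed values must still refer to content that exists.
    title = KNOWN_ARTICLES.get((cat, art))
    if title is None:
        return 404, "Not Found"

    return 200, title
```

The point of the last check is that a URL such as /345/7654321 passes the pattern in .htaccess but may not correspond to any real article, so the script itself has to return the 404.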
The rules in the .htaccess file restrict certain URL requests from ever reaching the filesystem on the server. They are redirected or failed to a 404 immediately.