.. not so much the names, but changing underscores "_" to hyphens "-".
I have about 400 file extensions to change and about 250 underscore/hyphen issues, so this got me wondering about handling it with a rewrite.
Some time back, Jim gave me this for another site. It should take care of the extension changes:
# Rewrite .htm and .html files to .php
RewriteRule ^([^.]+)\.html?$ $1.php [L]
But then I started wondering if:
1 - does the file name rewrite need to be:
blue_widgets.htm blue-widgets.php
or is the server looking for .php because of the previous rewrite rule .. so should it be
blue_widgets.php blue-widgets.php
Then I found this on the web, which is touted to change underscores to hyphens, but I don't understand it.
RewriteRule ^/?([^_]+)_(.*)$ $1-$2 [N,L]
If something like this will work it would save lots of typing .. and make a much smaller .htaccess file
RewriteRule ^/?([^_]+)_(.*)$ $1-$2 [N,L]
The restart is intended to allow the rule to execute multiple times if needed, because there might be multiple underscores in the requested URL-path, and the rule only replaces one at a time.
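To make the one-at-a-time behaviour concrete, here is a rough sketch in Python rather than Apache config. It only imitates the regex substitution, not mod_rewrite itself; the function name, the example path, and the max_passes guard are made up for illustration.

```python
import re

# Same pattern as the rule above. Python writes the backreferences as \1 and \2
# where mod_rewrite uses $1 and $2.
PATTERN = re.compile(r"^/?([^_]+)_(.*)$")

def replace_underscores(path, max_passes=50):
    """Re-apply the substitution until it stops matching, like the [N] restart."""
    passes = 0
    while passes < max_passes:  # guard against looping forever (hypothetical limit)
        new_path, count = PATTERN.subn(r"\1-\2", path)
        if count == 0:
            break
        path = new_path
        passes += 1
    return path, passes

print(replace_underscores("big_blue_widgets.htm"))
# → ('big-blue-widgets.htm', 2)
```

Each pass replaces only the first underscore, because ([^_]+) cannot cross one. A path with two underscores therefore needs two passes, which is the cost the restart flag pays.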
If you know the maximum number of possible underscores, you could make the process more efficient (restarts are inefficient, especially if this rule is not near the top of the file). To do that, you could 'stack' several rules:
# Fix four underscores
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.*)$ http://www.example.com/$1-$2-$3-$4-$5 [R=301,L]
# Fix three underscores
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_(.*)$ http://www.example.com/$1-$2-$3-$4 [R=301,L]
# Fix two underscores
RewriteRule ^([^_]+)_([^_]+)_(.*)$ http://www.example.com/$1-$2-$3 [R=301,L]
# Fix one underscore
RewriteRule ^([^_]+)_(.*)$ http://www.example.com/$1-$2 [R=301,L]
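As a rough check of how the stacked rules behave, the same four patterns can be tried in Python, most underscores first, stopping at the first match. This is only a sketch of the regex logic, not of Apache; redirect_target and the sample paths are hypothetical names chosen for the example.

```python
import re

# The four stacked patterns, most underscores first, with Python-style
# backreferences (\1) standing in for mod_rewrite's $1.
STACKED_RULES = [
    (re.compile(r"^([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.*)$"), r"\1-\2-\3-\4-\5"),
    (re.compile(r"^([^_]+)_([^_]+)_([^_]+)_(.*)$"), r"\1-\2-\3-\4"),
    (re.compile(r"^([^_]+)_([^_]+)_(.*)$"), r"\1-\2-\3"),
    (re.compile(r"^([^_]+)_(.*)$"), r"\1-\2"),
]

def redirect_target(path):
    """Return the hyphenated path from the first matching rule, or None."""
    for pattern, replacement in STACKED_RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None

print(redirect_target("big_blue_widgets.htm"))  # → big-blue-widgets.htm
print(redirect_target("a_b_c_d_e_f.htm"))       # → a-b-c-d-e_f.htm
```

The second call shows the limit of the approach: with more underscores than the largest rule handles, one underscore survives the first redirect and is only fixed by a second one.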
Further, understand that in either case, the rewriting/redirecting is being done without regard to whether the 'corrected' URL will resolve to an existing file. As such, it's open to someone toying with your site by linking to non-existent underscored URL-paths, making both search engines and your server handle pointless redirects. That could be fixed by doing a file-exists check on the soon-to-be-rewritten path. But now, we must look for a specific filetype, since this is to be combined with a subsequent html-URL to php-filepath rewrite:
# Fix four underscores if hyphenated php file exists
RewriteCond %{DOCUMENT_ROOT}/$1-$2-$3-$4-$5.php -f
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2-$3-$4-$5.$6 [R=301,L]
# Fix three underscores
RewriteCond %{DOCUMENT_ROOT}/$1-$2-$3-$4.php -f
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2-$3-$4.$5 [R=301,L]
# Fix two underscores
RewriteCond %{DOCUMENT_ROOT}/$1-$2-$3.php -f
RewriteRule ^([^_]+)_([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2-$3.$4 [R=301,L]
# Fix one underscore
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2.$3 [R=301,L]
#
# Having fixed the hyphens, now internally rewrite .htm and .html file requests to .php
RewriteRule ^([^.]+)\.html?$ /$1.php [L]
Jim
For some reason, either this isn't working for me or I put an error in my .htaccess file. I surveyed my site and found that the maximum number of underscores in any filename is 1.
Here is what I have in the .htaccess right now
# Fix one underscore
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+\.html?)$ [mysite.com...] [R=301,L]
# Having fixed the hyphens, now internally rewrite .htm and .html file requests to .php
RewriteRule ^([^.]+)\.html?$ /$1.php [L]
Help will be sincerely appreciated.
Edit: The rewrite from .htm to .php works
# If php version of hyphenated htm or html file exists, replace underscore with hyphen
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2.$3 [R=301,L]
[edited by: jdMorgan at 5:06 am (utc) on Mar. 20, 2008]
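For anyone comparing the rule that was reported as not working with the revised one, the difference is where the extension sits relative to the capture group. The sketch below uses Python's re module only to show what "$1-$2.php" expands to in each case; cond_test_path is a hypothetical helper, and blue_widgets.htm is the example filename from earlier in the thread.

```python
import re

# Pattern from the rule that was not working: the extension is inside group 2.
EXT_INSIDE = re.compile(r"^([^_]+)_([^.]+\.html?)$")
# Pattern from the revised rule: the extension is in its own group 3.
EXT_OUTSIDE = re.compile(r"^([^_]+)_([^.]+)\.(html?)$")

def cond_test_path(pattern, url_path):
    """Build the path, relative to DOCUMENT_ROOT, that '$1-$2.php' expands to."""
    m = pattern.match(url_path)
    if m is None:
        return None
    return f"{m.group(1)}-{m.group(2)}.php"

print(cond_test_path(EXT_INSIDE, "blue_widgets.htm"))   # → blue-widgets.htm.php
print(cond_test_path(EXT_OUTSIDE, "blue_widgets.htm"))  # → blue-widgets.php
```

With the extension inside the group, the file-exists check looks for blue-widgets.htm.php, which presumably does not exist, so the condition fails and no redirect is issued. With the extension in its own group, the check looks for blue-widgets.php as intended.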