Rewrite Extension + Filename

         

old_expat

3:41 am on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm changing a number of pages from .htm and .html to .php, and figured I'd alter the file names at the same time ..

.. not so much the names, but changing underscores "_" to hyphens "-".

I have about 400 file extensions to change and about 250 underscore/hyphen issues. So this got me to wondering about the rewrite.

Some time back, Jim gave me this for another site. It should take care of the extension changes:

# rewrite .htm and .html files to .php
RewriteRule ^([^.]+)\.html?$ $1.php [L]

But then I started wondering:

Does the file name rewrite need to map

blue_widgets.htm to blue-widgets.php

or is the server already looking for .php because of the previous rewrite rule, so that it should map

blue_widgets.php to blue-widgets.php

Then I found this on the web, which is touted to change underscores to hyphens, but I don't understand it.

RewriteRule ^/?([^_]+)_(.*)$ $1-$2 [N,L]

If something like this will work it would save lots of typing .. and make a much smaller .htaccess file

jdMorgan

3:23 pm on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are many ways to accomplish your goal, but only you can pick the one most appropriate for your site.

RewriteRule ^/?([^_]+)_(.*)$ $1-$2 [N,L]

This says, "Match any requested URL-path that begins with an optional leading slash, followed by one or more characters not equal to an underscore (save this part as $1), followed by an underscore, followed by any number of any characters (save this part as $2), and rewrite that to <saved-part-$1> hyphen <saved-part-$2>, stop mod_rewrite processing for this pass, and restart the mod_rewrite processing from the top of the .htaccess file."

The restart is intended to allow the rule to execute multiple times if needed, because there might be multiple underscores in the requested URL-path, and the rule only replaces one at a time.
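As a rough illustration of that restart behavior, here is how the rule would step through a request with two underscores. The filename blue_cheap_widgets.htm is made up for the example, and the trace assumes the rule sits in the .htaccess file at the document root:

```apache
# Hypothetical request: /blue_cheap_widgets.htm
# (in .htaccess the rule sees the path without the leading slash)
#
# Pass 1: $1 = "blue", $2 = "cheap_widgets.htm"
#         result: blue-cheap_widgets.htm, then [N] restarts the rules
# Pass 2: $1 = "blue-cheap", $2 = "widgets.htm"
#         result: blue-cheap-widgets.htm, then [N] restarts again
# Pass 3: no underscore is left, so the pattern fails and processing moves on
RewriteRule ^/?([^_]+)_(.*)$ $1-$2 [N,L]
```

Each extra underscore costs one more full pass through the rules, which is where the inefficiency comes from.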

If you know the maximum number of possible underscores, you could make the process more efficient (restarts are inefficient, especially if this rule is not near the top of the file). To do that, you could 'stack' several rules:


# Fix four underscores
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.*)$ http://www.example.com/$1-$2-$3-$4-$5 [R=301,L]
# Fix three underscores
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_(.*)$ http://www.example.com/$1-$2-$3-$4 [R=301,L]
# Fix two underscores
RewriteRule ^([^_]+)_([^_]+)_(.*)$ http://www.example.com/$1-$2-$3 [R=301,L]
# Fix one underscore
RewriteRule ^([^_]+)_(.*)$ http://www.example.com/$1-$2 [R=301,L]

Note that unlike the previous example, this code does an external redirect to 'correct' the URL shown in search engine results. I also removed the leading slash part, since it's not needed in .htaccess (I'd recommend hard-coding the slash in or out as needed for httpd.conf or .htaccess uses, respectively).
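To make the slash point concrete, here is a sketch of how the one-underscore redirect might look in each location. The patterns are the same as above apart from the hard-coded slash; www.example.com is a placeholder, and the httpd.conf version assumes the rule lives in the main server or a virtual-host section rather than inside a <Directory> block:

```apache
# In .htaccess at the document root: the leading slash has already been stripped
RewriteRule ^([^_]+)_(.*)$ http://www.example.com/$1-$2 [R=301,L]

# In httpd.conf (server or <VirtualHost> context): the URL-path still starts with a slash
RewriteRule ^/([^_]+)_(.*)$ http://www.example.com/$1-$2 [R=301,L]
```

Only one of the two would be used, depending on where the rules are placed.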

Further, understand that in either case, the rewriting/redirecting is being done without regard to whether the 'corrected' URL will resolve to an existing file. As such, it's open to someone toying with your site by linking to non-existent underscored URL-paths, making both search engines and your server handle pointless redirects. That could be fixed by doing a file-exists check on the soon-to-be-rewritten path. But now, we must look for a specific filetype, since this is to be combined with a subsequent html-URL to php-filepath rewrite:


# Fix four underscores if hyphenated php file exists
RewriteCond %{DOCUMENT_ROOT}/$1-$2-$3-$4-$5.php -f
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^.]+\.html?)$ http://www.example.com/$1-$2-$3-$4-$5 [R=301,L]
# Fix three underscores
RewriteCond %{DOCUMENT_ROOT}/$1-$2-$3-$4.php -f
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^.]+\.html?)$ http://www.example.com/$1-$2-$3-$4 [R=301,L]
# Fix two underscores
RewriteCond %{DOCUMENT_ROOT}/$1-$2-$3.php -f
RewriteRule ^([^_]+)_([^_]+)_([^.]+\.html?)$ http://www.example.com/$1-$2-$3 [R=301,L]
# Fix one underscore
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+\.html?)$ http://www.example.com/$1-$2 [R=301,L]
#
# Having fixed the hyphens, now internally rewrite .htm and .html file requests to .php
RewriteRule ^([^.]+)\.html?$ /$1.php [L]

So as you can see, the rulesets are inter-dependent. No-one ever claimed this stuff was simple... :)

Jim

old_expat

4:23 pm on Mar 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, Jim.

I'll check to see maximum number of underscores .. I think no more than 2, then I'll test.

This is going to save me a lot of typing. :)

old_expat

2:02 am on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Jim,

For some reason, either this isn't working for me or I put an error in my .htaccess file. I surveyed my site and found that the maximum number of underscores in any filename is 1.

Here is what I have in the .htaccess right now

# Fix one underscore
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+\.html?)$ [mysite.com...] [R=301,L]

# Having fixed the hyphens, now internally rewrite .htm and .html file requests to .php
RewriteRule ^([^.]+)\.html?$ /$1.php [L]

Help will be sincerely appreciated.

Edit: The rewrite from .htm to .php works

jdMorgan

3:00 am on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please define "not working":
  • How did you test?
  • What were the results?
  • How did those results differ from your expectations/desires/requirements?

Thanks,
Jim

old_expat

4:11 am on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Jim,

Sorry, should have been more detailed

Test: typed in "http://www.mysite.com/this_file.htm" in browser address bar
Results: 404 not found
Expectations: hoped to see "http://www.mysite.com/this-file.php"

jdMorgan

5:05 am on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, I kind of screwed you up; the parentheses were not quite right.

# If php version of hyphenated htm or html file exists, replace underscore with hyphen
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2.$3 [R=301,L]
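Putting the corrected redirect together with the earlier internal rewrite, a complete one-underscore ruleset might look like the sketch below. It assumes the rules live in the .htaccess file at the document root and that mod_rewrite has not already been switched on elsewhere (hence the RewriteEngine line, which is an addition). The comments walk through the this_file.htm request from the test above:

```apache
# Assumption: the rewrite engine is not already enabled elsewhere
RewriteEngine On

# Request: /this_file.htm  ->  $1 = "this", $2 = "file", $3 = "htm"
# The condition looks for <document root>/this-file.php; if that file exists,
# the browser is redirected to http://www.example.com/this-file.htm
RewriteCond %{DOCUMENT_ROOT}/$1-$2.php -f
RewriteRule ^([^_]+)_([^.]+)\.(html?)$ http://www.example.com/$1-$2.$3 [R=301,L]

# The follow-up request for /this-file.htm is then served from this-file.php
# without changing the URL shown in the address bar
RewriteRule ^([^.]+)\.html?$ /$1.php [L]
```

With the earlier version of the pattern, $2 captured "file.htm", so the condition was testing for this-file.htm.php, which never exists. The redirect was skipped, the request fell through to the last rule, and /this_file.php produced the 404. Note also that the address bar ends up showing this-file.htm, not this-file.php; only the file served behind it is the .php one.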

Jim

[edited by: jdMorgan at 5:06 am (utc) on Mar. 20, 2008]

old_expat

5:13 am on Mar 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh yeah! Works beautiful. Thanks a ton, Jim. :))