I have many pages, and to 301-redirect them I would rather not compile a complete list of old URLs and their corresponding new URLs. Is there a way to write a single 301 redirect rule, using wildcards, that redirects to the new domain and also converts all underscores to dashes?
e.g.
olddomain.com/a_b_c.html -> newdomain.com/a-b-c.html
olddomain.com/d_e_f.html -> newdomain.com/d-e-f.html
and so on
Thanks in advance
This is a very interesting problem that pushes the limits of non-scripted solutions. It's somewhat nasty and inefficient, but it can be solved in several ways using mod_rewrite:
# Two or more underscores -- replace the first one and restart the loop internally
RewriteCond %{REQUEST_URI} _.*_
RewriteRule ^([^_]*)_([^_]*_.*)$ $1-$2 [N]
# One underscore in URL -- replace it and do external redirect
RewriteCond %{REQUEST_URI} _
RewriteRule ^([^_]*)_([^_]*)$ http://www.example.com/$1-$2 [R=301,L]
An alternative approach, if you have only a few underscores per URL (up to five shown here) might be:
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)$ http://www.example.com/$1-$2-$3-$4-$5-$6 [R=301,L]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)$ http://www.example.com/$1-$2-$3-$4-$5 [R=301,L]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)$ http://www.example.com/$1-$2-$3-$4 [R=301,L]
RewriteRule ^([^_]*)_([^_]*)_([^_]*)$ http://www.example.com/$1-$2-$3 [R=301,L]
RewriteRule ^([^_]*)_([^_]*)$ http://www.example.com/$1-$2 [R=301,L]
The second method is better if you cannot place the code near the top of your file. Both methods are designed to avoid multiple external (and therefore slow) redirects. These methods may be given elsewhere as examples, but I just typed this code; it should work, but it is not tested at all. Test in a subdirectory or on a development server before you deploy this code on a live server!
If you have access to httpd.conf (the main server configuration file), you can also use RewriteMap to do this; RewriteMap cannot be declared in .htaccess. It might be more efficient.
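As a rough illustration of the RewriteMap idea, here is a minimal sketch, not tested. It assumes a hypothetical external program at /usr/local/bin/us2dash (the path and the map name us2dash are made up for this example) that reads one path per line on stdin and writes the same path with underscores replaced by dashes, one line per lookup, to unbuffered stdout:

```apache
# httpd.conf, server or <VirtualHost> context (RewriteMap is not allowed in .htaccess)
RewriteEngine On

# Hypothetical helper program, started once when the server starts.
# It must answer each input line with exactly one output line.
RewriteMap us2dash "prg:/usr/local/bin/us2dash"

# Only look up URLs that actually contain an underscore
RewriteCond %{REQUEST_URI} _
# In server context the matched path begins with "/", so the pattern strips it
RewriteRule ^/(.*)$ http://www.example.com/${us2dash:$1} [R=301,L]
```

Because the program does all the character substitution in one lookup, a single rule handles any number of underscores with one external redirect.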
You might also consider calling a simple cgi script to do the character substitution and redirection, if you are more comfortable with that approach. Use only a 301-Moved Permanently redirect to avoid losing your search engine rankings!
No matter which approach you use, I strongly suggest writing individual redirects for the pages, scripts, and images that currently consume the top 33% of your bandwidth. The above code should work, but as stated, it's not terribly efficient. Bypassing it for your busiest pages will likely improve your site's performance.
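A sketch of what such individual redirects might look like, using the two URLs from the original question as stand-ins for your busiest pages (substitute your own). It assumes the rules live in the same .htaccess file and are placed above the generic underscore rules:

```apache
# One exact-match rule per busy URL. Each is redirected in a single step,
# and [L] stops processing before the generic underscore rules are reached.
RewriteRule ^a_b_c\.html$ http://www.example.com/a-b-c.html [R=301,L]
RewriteRule ^d_e_f\.html$ http://www.example.com/d-e-f.html [R=301,L]
```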
Notes for all readers: It is not our normal practice to allow "write my code for me" posts here. However, this case is sufficiently interesting and complex that I decided to relax that rule temporarily. Despite that, this should not be taken as a precedent for change; the policy as stated in our Charter still stands. The link below may come in very handy if the code above is not immediately clear. Be sure to follow the links in the first post of the thread, as well as reading the post itself:
Ref links: Introduction to mod_rewrite [webmasterworld.com]
Jim
Method #1 should work regardless of the number of replacements needed, so I'll assume you are trying to use method #2.
Some subtle modifications are needed -- Note that all the RewriteRules except the last one now use an internal path substitution, not a redirect. They also leave the final underscore of the current URI in place -- this is required in all cases to 'trigger' the last RewriteRule to do the external redirect to tell the browser or spider that the URL has changed.
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5-$6_$7
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4-$5_$6
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3-$4_$5
RewriteRule ^([^_]*)_([^_]*)_([^_]*)_(.*)$ $1-$2-$3_$4
RewriteRule ^([^_]*)_([^_]*)_(.*)$ $1-$2_$3
RewriteRule ^([^_]*)_([^_]*)$ http://www.example.com/$1-$2 [R=301,L]
If you do notice performance problems, you might consider adding a 'skip' clause to the code to exclude file-types that do not contain underscores (if there are any), or put the code only in a subdirectory that requires it... anything to avoid processing this code for every page, image, script, and CSS file request to your server.
For example, to skip the six rules above for .gif or .jpg files, you would add this line immediately before them:
RewriteRule \.(gif|jpg)$ - [S=6]
Jim