Recently did a bunch of URL rewriting and it's great - except for all the 404s it's generating. There are around 500 of them, so I can't go through, list them all, and redirect each one individually.
So what I'm hoping is that since they have a certain naming convention (example-page-1234.html) I could redirect them to the main page of the site?
So something like:
redirect 301 /pagestobemoved/example-page* [mysite.com...]
Any ideas on how to get this to work? It'd help a tonne - I don't want to lose any link love those pages might have :)
RedirectMatch 301 ^/pagestobemoved/example-page(.*)$ http://www.example.com/
That's the server-technical answer, but as to Link-love, if you want to keep it and preserve good usability, then you should consider doing the following, in preference order:
The best approach is to never remove a URL if possible. Instead re-use it by updating the page, or consider (and mark) the page as archival. See Cool URIs don't change [w3.org] by one of the two co-inventors of the WWW.
Jim
There are many ways, depending on your previous URL architecture, and what kind of restructuring you've done.
If groups of your URLs go to different subdirectories, put the .htaccess code for those URLs in the subdirectories.
If your site architecture is 'flat' without using subdirectories, then if groups of your URLs start with a common string, you can still break up the rules into groups based on those strings. For example, here's a way to create an associative list which executes only for URLs starting with "apples":
RewriteCond $1<>washington_apples ^apples-washington\.html<>(.+)$ [OR]
RewriteCond $1<>red-beauty_apples ^apples-red-beauty\.php<>(.+)$ [OR]
RewriteCond $1<>johnson_apples ^apples-johnson\.htm<>(.+)$
RewriteRule ^(apples.+)$ http://www.example.com/fruit/a/%1.php [R=301,L]
Note that we're using a bit of a trick here to use one rule to rewrite many different URLs. The "<>" characters mean nothing: They are simply a unique string used to demarcate the end of one variable from the beginning of another.
The advantage of this approach is that the RewriteConds are not evaluated unless the RewriteRule pattern matches, so you end up with far fewer rules and faster execution, although the line-count is about the same. It's a somewhat advanced method, but not too hard to figure out with a bit of research. As you can see, it's changing various filetypes all to php, changing filenames from one format to another (here, moving "apples" to the end of the name, after an underscore), injecting two subdirectory levels "/fruit/a/" into the path, etc.
The requested URI is passed to the RewriteConds as $1 from the parenthesized pattern in the RewriteRule. Then, whichever RewriteCond matches will pass the new URL-path back to the RewriteRule as %1 for use in the substitution URL. Only the unique part of the new URL is specified in the RewriteConds, with common path info inserted by the rule itself.
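To make the $1 / %1 hand-off concrete, here is a rough offline sketch in Python that mimics it with the standard re module. It is only a model of the rules above, not how mod_rewrite is implemented: the names redirect_target, CONDS and RULE are made up for illustration, and paths are given in .htaccess form, without a leading slash.

```python
import re

# (new name, condition pattern) pairs modeled on the RewriteConds above.
CONDS = [
    ("washington_apples", r"^apples-washington\.html<>(.+)$"),
    ("red-beauty_apples", r"^apples-red-beauty\.php<>(.+)$"),
    ("johnson_apples", r"^apples-johnson\.htm<>(.+)$"),
]

# Modeled on the RewriteRule pattern; its first group plays the role of $1.
RULE = re.compile(r"^(apples.+)$")


def redirect_target(path):
    """Return the redirect URL for a requested path, or None if nothing matches."""
    rule_match = RULE.match(path)
    if rule_match is None:
        return None  # rule pattern failed, so no condition is ever tested
    for new_name, cond_pattern in CONDS:
        # Build the test string "$1<>new_name", as in the RewriteCond lines.
        test_string = rule_match.group(1) + "<>" + new_name
        cond_match = re.match(cond_pattern, test_string)
        if cond_match:
            # The condition's first group plays the role of %1.
            return "http://www.example.com/fruit/a/" + cond_match.group(1) + ".php"
    return None  # started with "apples" but is not in the list
```

For example, redirect_target("apples-washington.html") gives http://www.example.com/fruit/a/washington_apples.php, while a path that doesn't start with "apples" returns None without any condition being tested.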
This is intended to show what *can* be done, and not necessarily given as a solution to your problem. As stated, the right answer for you depends on your old and new URL layouts.
Jim
Here's the old convention (it's on a forum btw).
post-words-here-vt1209_65.htm
Would now be:
post-words-here-vt1209-65.html
So the only changes are the "_" is now a "-" and .htm is now .html.
Some URLs don't have the "_" problem - they are just vt678.htm and so only need the .htm to turn into html.
Is this best done in one rule or two? One for the htm > html and one for the _ > -
RewriteRule ^forums/(.+vt[0-9]+)_([0-9]+)\.htm$ http://www.example.com/forums/$1-$2.html [R=301,L]
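If you'd like to try the pattern against a few sample URLs before putting it on a live server, a quick offline check along these lines may help. This is only a sketch using Python's re module, which behaves the same as Apache's regex engine for a simple pattern like this one; the function name new_url is made up for illustration, and the path is written in .htaccess form, without the leading slash.

```python
import re

# Same pattern as the RewriteRule above.
PATTERN = re.compile(r"^forums/(.+vt[0-9]+)_([0-9]+)\.htm$")


def new_url(path):
    """Return the redirect target for an old forum path, or None if the rule does not apply."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # $1 and $2 in the RewriteRule substitution correspond to groups 1 and 2 here.
    return "http://www.example.com/forums/{}-{}.html".format(m.group(1), m.group(2))
```

With the example from the question, new_url("forums/post-words-here-vt1209_65.htm") returns http://www.example.com/forums/post-words-here-vt1209-65.html. A plain forums/vt678.htm returns None, so URLs without the underscore are not handled by this rule alone.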
Jim
[edited by: jdMorgan at 1:13 am (utc) on Dec. 24, 2006]
So should I have the other two rules after?
example:
RewriteRule ^forums/(.+vt[0-9]+)_([0-9]+)\.htm$ http://www.example.com/forums/$1-$2.html [R=301,L]
RewriteRule ^forums/(.*)\.htm$ /forums/$1.html [R=301,L]
Three different rules, but as I understand it they will only be called when necessary instead of having two rules that would create two 301's.
Right?
edit: the underscore rule I had previously wouldn't be necessary with your rule.
These modified 2nd and 3rd rules will be much faster to process:
RewriteRule ^forums/([^_]+)_([^.]+)\.html$ /forums/$1-$2.html [R=301,L]
RewriteRule ^forums/([^.]+)\.htm$ /forums/$1.html [R=301,L]
Speaking of specific, if all of the URLs you're rewriting have the vt<numbers>_<numbers> form, then consider using the even-more-specific patterns in the first (comprehensive) rule. However, if not all URLs have that form, then you should use the more-generic pattern from the 2nd rule in your first rule. Since this depends on your URL-set, which is unknown to me, all I can do is recommend a consistent approach among the three rules.
> Three different rules, but as I understand it they will only be called when necessary instead of having two rules that would create two 301's. Right?
Yes, that's the idea. You want to avoid feeding SE 'bots multiple redirects, both to avoid confusing them and to avoid getting 'site quality' demerits. There are more complicated ways to avoid stacked redirects, but for a small set of rules where a comprehensive rule is possible, simple is better.
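One way to check for stacked redirects offline is to run sample URLs through the rules in order and count the hops. The sketch below does that in Python under some simplifying assumptions: the first matching rule wins (as with [L] on a redirect), each redirect target is fed back in as a fresh request, and everything is reduced to the path without host or leading slash. The names RULES, redirect_once and redirect_chain are made up for illustration.

```python
import re

# The comprehensive rule followed by the two faster rules, in .htaccess order.
RULES = [
    (re.compile(r"^forums/(.+vt[0-9]+)_([0-9]+)\.htm$"), r"forums/\1-\2.html"),
    (re.compile(r"^forums/([^_]+)_([^.]+)\.html$"), r"forums/\1-\2.html"),
    (re.compile(r"^forums/([^.]+)\.htm$"), r"forums/\1.html"),
]


def redirect_once(path):
    """Return the target of the first rule that matches, or None if no rule matches."""
    for pattern, template in RULES:
        m = pattern.match(path)
        if m:
            return m.expand(template)
    return None


def redirect_chain(path, limit=10):
    """Follow redirects until no rule matches; return the list of targets visited."""
    chain = []
    while len(chain) < limit:  # limit guards against an accidental redirect loop
        target = redirect_once(path)
        if target is None:
            break
        chain.append(target)
        path = target
    return chain
```

Here redirect_chain("forums/post-words-here-vt1209_65.htm") and redirect_chain("forums/vt678.htm") each contain a single hop. A URL with an underscore that lacks the vt<numbers>_<numbers> form, such as the hypothetical forums/some_thing.htm, takes two hops (.htm to .html, then underscore to hyphen), which is the stacked-redirect case that a consistent pattern in the first rule is meant to avoid.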
Jim