
Wildcard 301 filenames

301 redirect all files starting with the name "example-"


madmatt69

9:14 pm on Dec 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey all,

Recently did a bunch of URL rewriting and it's great - except for all the 404s it's generating. I can't realistically go through, make a list of them all (around 500), and redirect each one by hand.

So what I'm hoping is that since they have a certain naming convention (example-page-1234.html) I could redirect them to the main page of the site?

So something like :
redirect 301 /pagestobemoved/example-page* [mysite.com...]

Any ideas on how to get this to work? It'd help a tonne - I don't want to lose any link love those pages might have :)

jdMorgan

9:57 pm on Dec 23, 2006 (gmt 0)

You were close:

RedirectMatch 301 ^/pagestobemoved/example-page(.*)$ http://www.example.com/

See Apache mod_alias [httpd.apache.org] for more info.

That's the server-technical answer, but as to Link-love, if you want to keep it and preserve good usability, then you should consider doing the following, in preference order:

  • Redirect each removed page URL (or as many of your 500 as practical) to its direct or most reasonable replacement page.
  • If applicable, redirect groups of removed page URLs related by category to their category page, where a replacement may be found.
  • Redirect remaining pages to the site map, so a replacement might be found.
  • Redirecting all removed pages to the home page comes in dead last (search WebmasterWorld for "duplicate content").

    The best approach is to never remove a URL if possible. Instead re-use it by updating the page, or consider (and mark) the page as archival. See Cool URIs don't change [w3.org] by one of the two co-inventors of the WWW.
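    The first two options above need only a handful of one-to-one directives. These paths and targets are made up for illustration, so substitute your own old and new URLs:

    Redirect 301 /pagestobemoved/example-page-1234.html http://www.example.com/blue-widgets.html
    Redirect 301 /pagestobemoved/example-page-1235.html http://www.example.com/red-widgets.html
    RedirectMatch 301 ^/pagestobemoved/example-recipe-(.*)$ http://www.example.com/recipes/

    The last line shows the category case: everything sharing the "example-recipe-" prefix goes to the category page where a replacement can be found.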

    Jim

    madmatt69

    10:08 pm on Dec 23, 2006 (gmt 0)

    Thanks for the tips!

    It worked well.

    I've thought about 301'ing the URLs to their new ones, but 500 rules in my .htaccess would probably really slow things down, no? Is there a way to make it more efficient?

    I will do a bunch though. Thanks for the help!

    jdMorgan

    10:39 pm on Dec 23, 2006 (gmt 0)

    > Is there a way to make it more efficient?

    There are many ways, depending on your previous URL architecture, and what kind of restructuring you've done.

    If groups of your URLs go to different subdirectories, put the .htaccess code for those URLs in the subdirectories.
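    For example, a per-directory .htaccess keeps the rule list short and local. Note that in per-directory context, mod_rewrite strips the directory prefix before matching, so the pattern has no leading "/pagestobemoved/". (Hypothetical paths - adjust to your own layout.)

    # In /pagestobemoved/.htaccess
    RewriteEngine on
    RewriteRule ^example-page-(.*)$ http://www.example.com/ [R=301,L]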

    If your site architecture is 'flat' without using subdirectories, then if groups of your URLs start with a common string, you can still break up the rules into groups based on those strings. For example, here's a way to create an associative list which executes only for URLs starting with "apples":

    RewriteCond $1<>washington_apples ^apples-washington\.html<>(.+)$ [OR]
    RewriteCond $1<>red-beauty_apples ^apples-red-beauty\.php<>(.+)$ [OR]
    RewriteCond $1<>johnson_apples ^apples-johnson\.htm<>(.+)$
    RewriteRule ^(apples.+)$ http://www.example.com/fruit/a/%1.php [R=301,L]

    Note that we're using a bit of a trick here to use one rule to rewrite many different URLs. The "<>" characters mean nothing: They are simply a unique string used to demarcate the end of one variable from the beginning of another.

    The advantage of this approach is that RewriteConds are not evaluated unless the RewriteRule pattern matches, so you end up with far fewer rules and faster execution, although the line-count is about the same. It's a somewhat advanced method, but not too hard to figure out with a bit of research. As you can see, it's changing various file types all to .php, changing various filenames from one format to another, injecting two subdirectory levels "/fruit/a/" into the path, etc.

    The requested URI is passed to the RewriteConds as $1 from the parenthesized pattern in the RewriteRule. Then, whichever RewriteCond matches will pass the new URL-path back to the RewriteRule as %1 for use in the substitution URL. Only the unique part of the new URL is specified in the RewriteConds, with common path info inserted by the rule itself.

    This is intended to show what *can* be done, and not necessarily given as a solution to your problem. As stated, the right answer for you depends on your old and new URL layouts.

    Jim

    madmatt69

    12:28 am on Dec 24, 2006 (gmt 0)

    Whoah my brain is spinning :)

    Here's the old convention (it's on a forum, btw).

    post-words-here-vt1209_65.htm
    Would now be:
    post-words-here-vt1209-65.html

    So the only changes are the "_" is now a "-" and .htm is now .html.

    Some URLs don't have the "_" problem - they are just vt678.htm and so only need the .htm to turn into html.

    Is this best done in one rule or two? One for the htm > html and one for the _ > -

    madmatt69

    1:02 am on Dec 24, 2006 (gmt 0)

    I put the following rule in which seems to have worked for re-writing the htm to html:

    RewriteRule ^forums/(.*)\.htm$ /forums/$1.html [R=permanent,L]

    Now I'm going to try and get the underscores rewritten :)

    madmatt69

    1:09 am on Dec 24, 2006 (gmt 0)

    Got it :)

    RewriteRule ^forums/(.*)_(.*)\.html$ /forums/$1-$2.html [R=301,L]

    Works perfect.

    That being said - does it look good to you? Or is there a more efficient way of writing that?

    jdMorgan

    1:11 am on Dec 24, 2006 (gmt 0)

    First do the URLs that have both problems, then (if necessary) take care of the URLs that have only one problem or the other. This avoids back-to-back 'stacked' redirects. For URLs with both underscores and .htm, try:

    RewriteRule ^forums/(.+vt[0-9]+)_([0-9]+)\.htm$ http://www.example.com/forums/$1-$2.html [R=301,L]

    [added] We cross-posted, so this was really in response to your previous post. [/added]

    Jim

    [edited by: jdMorgan at 1:13 am (utc) on Dec. 24, 2006]

    madmatt69

    1:39 am on Dec 24, 2006 (gmt 0)

    Nice! That works well.

    So should I have the other two rules after?

    example:
    RewriteRule ^forums/(.+vt[0-9]+)_([0-9]+)\.htm$ http://www.example.com/forums/$1-$2.html [R=301,L]
    RewriteRule ^forums/(.*)\.htm$ /forums/$1.html [R=301,L]

    Three different rules, but as I understand it they will only be called when necessary instead of having two rules that would create two 301's.
    Right?

    edit: the underscore rule I had previously wouldn't be necessary with your rule.

    jdMorgan

    1:56 am on Dec 24, 2006 (gmt 0)

    I strongly suggest you avoid the use of ".*" patterns whenever possible. They're easy to write, but inefficient to process. Negated-character-class patterns like the ones below can be parsed from left to right in a single pass, without requiring multiple back-offs to achieve a match. (Since ".*" matches anything and everything, the parser will try to match your entire URL-path into the ".*" pattern, and then back off one character at a time, trying to find a match.)

    These modified 2nd and 3rd rules will be much faster to process:


    RewriteRule ^forums/([^_]+)_([^.]+)\.html$ /forums/$1-$2.html [R=301,L]
    RewriteRule ^forums/([^.]+)\.htm$ /forums/$1.html [R=301,L]

    The "([^_]+)_([^.]+)" part of the first rule's pattern means, "match one or more characters not equal to an underscore, followed by an underscore, followed by one or more characters not equal to a period" -- much more specific than ".*" which means, "match any number of any character."

    Speaking of specific, if all of the URLs you're rewriting have the vt<numbers>_<numbers> form, then consider using the even-more-specific patterns in the first (comprehensive) rule. However, if not all URLs have that form, then you should use the more-generic pattern from the 2nd rule in your first rule. Since this depends on your URL-set, which is unknown to me, all I can do is recommend a consistent approach among the three rules.
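    For instance, if every old URL really does end in vt<numbers> or vt<numbers>_<numbers>, a consistent all-specific three-rule set might look like this (a sketch only - verify it against your actual URL-set before deploying):

    RewriteRule ^forums/(.+vt[0-9]+)_([0-9]+)\.htm$ http://www.example.com/forums/$1-$2.html [R=301,L]
    RewriteRule ^forums/(.+vt[0-9]+)_([0-9]+)\.html$ http://www.example.com/forums/$1-$2.html [R=301,L]
    RewriteRule ^forums/(.+vt[0-9]+)\.htm$ http://www.example.com/forums/$1.html [R=301,L]

    The comprehensive rule comes first, so a URL with both problems is fixed in a single redirect rather than two stacked ones.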

    > Three different rules, but as I understand it they will only be called when necessary instead of having two rules that would create two 301's. Right?

    Yes, that's the idea. You want to avoid feeding SE 'bots multiple redirects, both to avoid confusing them and to avoid getting 'site quality' demerits. There are more complicated ways to avoid stacked redirects, but for a small set of rules where a comprehensive rule is possible, simple is better.

    Jim

    madmatt69

    8:00 am on Dec 24, 2006 (gmt 0)

    Wow. That's awesome.

    Thanks so much for your help. I've learned way more doing this than I thought possible!

    I'll implement the new rules right away :)