Wildcard 301 all .html pages

+ is this even advisable?

         

Heavenguard

8:49 pm on Dec 12, 2008 (gmt 0)

10+ Year Member



Short version:

I want to redirect SOME .html pages to their new, corresponding .php pages of different file names.

For the remainder of the site, I want to wildcard ALL the .html pages to index.php.

I don't know/can't figure out enough of the syntax to get [domain.com...] to go to a specified page.

My apologies for my ignorance; I've never had to deal with an Apache server before.

Long version (+advisable question):
I picked up a job that's a real mess.

There are TWO live versions of the same site on the same server - one old, one new. The new one went live 2 years ago, and is a Drupal (php) site. The old site was never taken down, because of concerns over Google page rankings. I, however, think it's an abomination that there are two live versions of the same site, with the old one having nothing but outdated content.

I want to redirect the most important sections/pages from the old site to the corresponding section/page on the new site, and all the rest of the old site just to the front page of the new one.

Is this tactic advisable? Do multiple 301s to the same page hurt ranking? Is it more or less evil to keep an old site up than to lose rankings?

I understand the desire to keep rankings, but I think a line needs to be drawn somewhere, and to me, two live sites - one of them old - is clearly beyond it.

jdMorgan

10:35 pm on Dec 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Best practice: Do not redirect large numbers of URLs to a single URL. It is a usability nightmare and a search ranking liability. Instead:

  • 301 redirect old .html pages to their exact functional replacement, or at least something very close. Don't confuse your visitor.
  • For .html pages with no direct replacement, return a 410-Gone status, indicating that the pages have been intentionally removed.
  • Declare a custom 410 error page (See Apache ErrorDocument)
  • On the custom 410 error page, explain politely that the old page has been intentionally removed. Offer text links to your HTML site map page, your site search facility, your home page, and category and/or product pages, as applicable. Help the visitor find what he/she wanted.
  • Optionally, include a meta-refresh on the page that is invoked after more than enough time has passed for even the slowest reader to read every word on this page, and to think for a few seconds. Meta-refresh to the home page if you like, or one of the others described. Make sure that the meta-refresh time delay exceeds ten seconds, as Yahoo! and some other search engines may treat it as a 301 redirect if it is too short, and you don't want that.
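The list above might look roughly like this in a root-level .htaccess file. This is only a sketch: it assumes mod_rewrite is enabled, and the file names old-about.html, /about.php and /gone.php are made up for illustration.

```apache
# Assumption: this file is the .htaccess in the document root.
RewriteEngine On

# Custom page served with the 410 status (hypothetical path).
# Keep it outside the \.html$ pattern below, or it would be answered with 410 itself.
ErrorDocument 410 /gone.php

# An old page with an exact replacement: 301 to it (hypothetical names).
RewriteRule ^old-about\.html$ /about.php [R=301,L]

# Every remaining .html URL has no replacement: answer 410 Gone.
# This must come after all of the 301 rules.
RewriteRule \.html$ - [G]
```

The optional meta-refresh belongs in the markup of the 410 page itself, not in the server configuration.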

    On to the redirects:

    You will need to define the exact relationship(s) of the old .html URLs to the new .php ones, in order to avoid having to write one redirect directive per old URL. What you are looking for is to find things that the old and new URLs have in common and make a list of these commonalities -- listed in order from what the most frequently-occurring URLs have in common to commonalities shared by just a few URLs. This description is general because that is exactly what you are seeking: A set of general rules to redirect URLs.

    In the best case, you might say, "Where 'x' is any filename or filepath (including directory levels), redirect x.html to x.php". Or you might have to say, "Where 'x' is any filename or filepath, redirect x.html to x.php if x.php exists." Or maybe, "Where 'x' is any filename or filepath, redirect x.html to x.php if x.html does not exist and x.php does exist."
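For illustration, the third and most restrictive of those descriptions could be written as follows. It is a sketch that assumes the rules live in the document-root .htaccess and that the new pages are physical .php files on disk, since the file-exists test only sees real files.

```apache
RewriteEngine On

# Redirect x.html to x.php only if x.html no longer exists and x.php does.
# $1 is whatever the RewriteRule pattern captured (the path without ".html").
RewriteCond %{DOCUMENT_ROOT}/$1.html !-f
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(.+)\.html$ /$1.php [R=301,L]
```

Dropping the first RewriteCond gives the "if x.php exists" version, and dropping both gives the unconditional x.html to x.php redirect.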

    Adding a slight complication such as "Where 'x' is any filename or filepath, redirect x.html to /new/x.php" is not a problem, nor would "Where 'x' is any filename or filepath, redirect /olddir/x.html to /newdir/x.php" be a problem.
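Both of those complications need only a change to the pattern or the target. A sketch, using the directory names from the descriptions above and again assuming a document-root .htaccess; the two rules are independent alternatives, and a given site would probably need only one of them:

```apache
RewriteEngine On

# Alternative A: /olddir/x.html -> /newdir/x.php
RewriteRule ^olddir/(.+)\.html$ /newdir/$1.php [R=301,L]

# Alternative B: any x.html -> /new/x.php
RewriteRule ^(.+)\.html$ /new/$1.php [R=301,L]
```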

    Then there will likely be exceptions such as, "but only in this particular directory or these directories." Or "but only if the URL-path contains certain character types or sequences." For example, "if the last part of the URL-path starts with 'product-code-' followed by a three- to five-digit number, optionally followed by lowercase letters 'a', 'b', or 'c'." That kind of thing.
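The "product-code-" example translates fairly directly into a regular expression. In this sketch the target /product.php?code=... is invented, and I am assuming the old URLs end in .html and carry at most one trailing letter:

```apache
RewriteEngine On

# Last path segment: "product-code-", then 3 to 5 digits, then an optional a, b or c.
# $1 is the optional leading directory path; $2 is the number plus letter.
RewriteRule ^(.*/)?product-code-([0-9]{3,5}[abc]?)\.html$ /product.php?code=$2 [R=301,L]
```

If several trailing letters are allowed, `[abc]?` would become `[abc]*`.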

    The trick is to find the functional descriptions of your old URLs that, when placed in order from most-specific to least-specific, will result in the smallest total number of these descriptions.
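Laid out as rules, that ordering might look like the sketch below. All of the page and directory names are hypothetical. Each rule ends processing with [L] when it matches, so a broad rule placed first would claim URLs meant for the narrower ones.

```apache
RewriteEngine On

# 1. Most specific: a single page whose new name differs.
RewriteRule ^contact-us\.html$ /contact.php [R=301,L]

# 2. A class of URLs that share a directory.
RewriteRule ^products/(.+)\.html$ /catalog/$1.php [R=301,L]

# 3. Least specific: every remaining .html URL.
RewriteRule ^(.+)\.html$ /$1.php [R=301,L]
```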

    Using the concepts of "URL-spaces" and/or "classes of URLs" will likely come in handy.

    When writing these descriptions, use only the characteristics of the text in the old URL and query string (if applicable). You may also rely on "file exists" checks as described above. You may not rely on any other characteristics of the resources, because mod_rewrite won't be able to test any other characteristics and you won't be able to write a rule based on un-testable resource characteristics.

    Once this list is comprehensive, concise, and complete, then you add "and anything else gets a 404-Not Found" at the end of it. At this point you can proceed to coding a solution.

    Jim
