g1smd - 10:12 am on Mar 9, 2011 (gmt 0) [edited by: g1smd at 10:43 am (utc) on Mar 9, 2011]
OK. Thanks for the clear answers. That really helps, because there are at least six different ways you could have implemented this, and knowing which one you used is crucial.
Eleven levels of "folder" in the old site was madness!
The fact that only the "number" is used to find the database entry is GOOD (but there is one flaw you need to look at later).
Given the old and new URL structures, it looks like you can set up your redirects from old to new very easily.
As long as everything in the new URL can be found as an element in the old URL, it is simply a case of capturing it in a backreference and then reusing/substituting it in the target URL.
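To illustrate the capture-and-substitute idea, here is a minimal sketch in Python rather than mod_rewrite. The old URL shape used here (several folder levels, then the numeric ID, then the wordy name) is only an assumption for the example, and `new_url_for` is a made-up helper name. In practice the same pattern would sit in a RewriteRule with an [R=301,L] flag.

```python
import re
from typing import Optional

# ASSUMPTION: old URLs look like /folder/folder/.../<number>/<wordy-name>.
# The real old format is not shown here; adjust the pattern to match it.
OLD_URL = re.compile(r"^/(?:[^/]+/)+([0-9]+)/([^/]+)/?$")


def new_url_for(old_path: str) -> Optional[str]:
    """Return the new-format path for an old-format path, or None if it is not old-format."""
    match = OLD_URL.match(old_path)
    if match is None:
        return None
    # Reuse the two captured pieces (the backreferences) in the target URL.
    product_id, wordy_name = match.group(1), match.group(2)
    return f"/{product_id}/{wordy_name}"


print(new_url_for("/a/b/c/142355/my-great-product"))
```

Note that a URL already in the new format does not match the pattern, so it would not be redirected again.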
It is good that you already confirmed the proposed new internal rewrite works with new format URLs.
The next step is to make sure that the links on the pages of the site are changed to point to the new format URLs, and the redirect from old format to new format is installed immediately after.
The little flaw (here demonstrated on a very "simple" URL) is as follows.
You link to
example.com/142355/my-great-product
and use only 142355 to pull the product from the database.
Your competitor links to your page as
You now have a Duplicate Content problem. The fix is fairly easy.
You MUST also pass the wordy SEO description to your script (via the rewrite)
RewriteRule ^([0-9]+)/(.*) /index.php?id=$1&product=$2 [L]
and follow several simple steps.
First, if $1 has no entry, return a 404 header and error message and a page of links to other products so that you do not bounce the visitor.
If $1 does have an entry, verify the words in $2 against a matching "wordy" field in the database. If the words are an exact match, then show the content. If they do not match, issue a 301 redirect to the correct URL for this product.
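The steps above can be sketched roughly as follows. This is Python for illustration only, not the actual index.php; the `PRODUCTS` dictionary stands in for the database lookup, and `resolve` is a made-up name. It returns the HTTP status to send and, for a redirect, the target path.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTION: stand-in for the product table, mapping ID -> canonical "wordy" field.
PRODUCTS: Dict[str, str] = {"142355": "my-great-product"}


def resolve(product_id: str, wordy: str) -> Tuple[int, Optional[str]]:
    """Decide the response for a request carrying $1 (product_id) and $2 (wordy)."""
    canonical = PRODUCTS.get(product_id)
    if canonical is None:
        # No such product: send a 404 header, plus an error page with links.
        return 404, None
    if wordy == canonical:
        # Exact match: show the content.
        return 200, None
    # Known product, wrong words: 301 to the one correct URL.
    return 301, f"/{product_id}/{canonical}"


print(resolve("142355", "my-great-product"))
print(resolve("142355", "some-other-words"))
print(resolve("999999", "anything"))
```

The important design point is that only one URL per product ever returns content; every other variation either redirects to it or returns 404.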
With the product ID front-loaded in the URL, customers clicking a broken link like any of the following would still see your content, but without any Duplicate Content risk.
and so on.
With the redirect in place, all those clicks will be redirected to the correct "wordy" URL.
There's a big flaw in some other sites. They use some of the data in the requested URL to populate stuff on the screen, such as the title tag in HEAD or words in the breadcrumb trail.
For example a product URL is
example.com/kitchen/saucepans/17234 and is rewritten to
Only $3 is used to pull the product from the database.
$1 and $2 are used to populate the TITLE, breadcrumb trail, and the <h1> heading.
If I link to
example.com/shirts/mens/17234, the page will show me the saucepans set with breadcrumb links to the shirts category, and a page title of "shirts : mens : 5 piece saucepan set".
The script should be pulling the category data at the same time it pulls the page content for this page. The category data shouldn't be pulled on the previous page view and added to the links pointing out to other products. If the entire URL is not fully validated as correct, mayhem can ensue.
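As a rough sketch of what "fully validated" could look like for this example: the category is read from the product's own database record, never from the requested URL, and any URL that does not match the record is redirected. This is Python for illustration; the `PRODUCTS` data, the `render` name, and the choice of a 301 for a mismatched category are all assumptions, not a description of any particular site's script.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTION: stand-in for the database; each product record carries its own category.
PRODUCTS: Dict[str, dict] = {
    "17234": {"name": "5 piece saucepan set", "category": ("kitchen", "saucepans")},
}


def render(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, payload): the page title for 200, the target path for 301."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        return 404, None
    product_id = parts[2]
    product = PRODUCTS.get(product_id)
    if product is None:
        return 404, None
    # Build the one correct URL from the database record alone.
    canonical = "/" + "/".join(product["category"] + (product_id,))
    if path != canonical:
        return 301, canonical
    # Title and breadcrumb words come from the record, not from the URL.
    title = " : ".join(product["category"] + (product["name"],))
    return 200, title


print(render("/shirts/mens/17234"))
print(render("/kitchen/saucepans/17234"))
```

With this approach the shirts link can never produce a saucepans page with a shirts breadcrumb, because nothing on the page is built from the URL text.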
This is one of the reasons why "category in URL" is usually a bad idea. However, it is often made far worse by crazy ideas that database programmers come up with. Here's where the SEO should step in and say no.