
Apache Web Server Forum

This 58-message thread spans 2 pages (page 1 of 2).
301'ing over 50 thousand pages

 3:57 pm on Mar 8, 2011 (gmt 0)

Hey guys,
Where we work, we have a product index of over 50 thousand items.

I generated some PHP to automatically write 55k worth of rewrites for me because I don't know Apache server configuration directives and regex well enough.

This method worked, but it almost crashed the server.

In reality, we can only cope with 10k worth of rewrites.

This is an issue, however, as we have 35k worth of pages in Google's index.

How can I get around this?

Any advice would be super!

Kind regards,



 4:29 pm on Mar 8, 2011 (gmt 0)

It depends on the URL format of the old and new URL.

If there is a simple relationship, two lines of mod_rewrite code in the .htaccess file could work for the whole site.

Otherwise, if all the URLs to be redirected fit a simple pattern but the relationship between old and new URL is not simple, the most efficient way to deal with these is to rewrite (that's rewrite, not redirect) all of those requests to a special PHP script that looks up the new URL (perhaps in a PHP array, or from a database) and then sends the correct redirect HTTP headers and responses back to the browser.

Simple relationship:

Complex relationship:
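
For illustration only, here is a rough sketch of both cases. The URL formats, the /redirect-lookup.php script name and the "old" parameter are made up for the example and are not from the poster's site.

```apache
RewriteEngine On

# Simple relationship (hypothetical formats):
#   /old/<id>/<name>.html  ->  /<name>/<id>
# Every part of the new URL can be captured from the old one,
# so one redirect rule covers the whole set.
RewriteRule ^old/([0-9]+)/([^/.]+)\.html$ http://www.example.com/$2/$1 [R=301,L]

# Complex relationship: the new URL cannot be derived from the old one.
# Internally rewrite (no domain, no R flag) any remaining old-format
# request to a hypothetical lookup script, which finds the new URL in
# an array or database and sends the 301 response itself.
RewriteRule ^old/(.+)$ /redirect-lookup.php?old=$1 [L]
```

Either way the server evaluates a couple of patterns per request instead of tens of thousands of individual rules.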


 8:33 am on Mar 9, 2011 (gmt 0)

Yeah, it's a simple relationship, I think.

Here's the older URL:

Here's the new URL:

Will it make a difference that both URLs are already re-written?

Here's my current re-write code:

# /product name and id / equipment id
RewriteRule ^/?([a-zA-Z0-9_-]+)/([0-9]+)$ index.php?name=$1&id=$2 [L]

Could someone advise me on how to make this change, please?


 8:45 am on Mar 9, 2011 (gmt 0)

Are you saying that the URL request
gets rewritten to the internal query index.php?name=folder/unique-id/manufacturer/product/product-id/product-id-slug/

Do the new URLs already work, and you are merely doing cleanup of old stuff?

Or, is this a new URL scheme that you want to implement?

What parts of the old URL are actually "required" in the URL in order for the server script to pull the product data from the database?

Are all the other fields verified in the script as being a match for the product, or are they just added for search engines but the text is not verified internally within the site?


 9:38 am on Mar 9, 2011 (gmt 0)

This is the old re-write code:
RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)$ equipment.php?equipment_id=$1&em=$2&ec=$3&mn=$4&mna=$5 [NC]

This is my new code that sits in the equipment folder.
RewriteRule ^/?([a-zA-Z0-9_-]+)/([0-9]+)$ index.php?name=$1&id=$2 [L]

The old structure is already in place and working.

The new structure has been tested and works.

I want to clean up the old ones and use the new structure at the same time.

The only part of the URL that does anything is the equipment ID. That's a 3-6 digit number. In the new structure, it's the one on the end, and it pulls all the equipment data from the DB.

"...or are they just added for search engines but the text is not verified internally within the site?"
Correct! :D The "name" in the new structure is irrelevant to the functionality but required for SEO.

Hope this helps - I really want to get to the bottom of this.

I really appreciate your help so far buddy.


 10:12 am on Mar 9, 2011 (gmt 0)

OK. Thanks for the clear answers. That really helps, because there's at least six different ways you could have implemented this and knowing which one it is, is crucial.

Eleven levels of "folder" in the old site was madness!

The fact that only the "number" is used to find the database entry is GOOD (but there is one flaw you need to look at later).

The old and new URL structure looks like you can set your redirects from old to new very easily.

As long as everything in the new URL can be found as an element in the old URL, it is simply a case of capturing it in a backreference and then reusing/substituting it in the target URL.

It is good that you already confirmed the proposed new internal rewrite works with new format URLs.

The next step is to make sure that the links on the pages of the site are changed to point to the new format URLs, and the redirect from old format to new format is installed immediately after.

The little flaw (here demonstrated on a very "simple" URL) is as follows.

You link to example.com/142355/my-great-product and use only 142355 to pull the product from the database.

Your competitor links to your page as

You now have a Duplicate Content problem. The fix is fairly easy.

You MUST also pass the wordy SEO description to your script (via the rewrite)
RewriteRule ^([0-9]+)/(.*) /index.php?id=$1&product=$2 [L]
and follow several simple steps.

First, if $1 has no entry, return a 404 header and error message and a page of links to other products so that you do not bounce the visitor.

If $1 does have an entry, verify the words in $2 against a matching "wordy" field in the database. If the words are an exact match, then show the content. If they do not match, issue a 301 redirect to the correct URL for this product.

With the product ID front loaded in the URL customers clicking a broken link like any of the following would still see your content, but without any Duplicate Content risk.
and so on.

With the redirect in place, all those clicks will be redirected to the correct "wordy" URL.

There's a big flaw in some other sites. They use some of the data in the requested URL to populate stuff on the screen, such as the title tag in HEAD or words in the breadcrumb trail.

For example a product URL is
example.com/kitchen/saucepans/17234 and is rewritten to

Only $3 is used to pull the product from the database.

$1 and $2 are used to populate the TITLE, breadcrumb trail, and the <h1> heading.

If I link to
example.com/shirts/mens/17234, the page will show me the saucepans set with breadcrumb links to the shirts category, and a page title of "shirts : mens : 5 piece saucepan set".

The script should be pulling the category data at the same time it pulls the page content for this page. The category data shouldn't be pulled on the previous page view and added to the links pointing out to other products. If the entire URL is not fully validated as correct, mayhem can ensue.

This is one of the reasons why "category in URL" is usually a bad idea. However it is often made far worse by crazy ideas that database programmers come up with. Here's where the SEO should step in and say no.

[edited by: g1smd at 10:43 am (utc) on Mar 9, 2011]


 10:39 am on Mar 9, 2011 (gmt 0)

It took 2 minutes to find a site with this problem:


Check out this ring in the men's shoes category.

Men's > Shoes > 14k Gold Ring

Totally brain dead programming.


 10:42 am on Mar 9, 2011 (gmt 0)

I didn't write the old URL structure - no idea what gave someone that idea hehe. :|

As for competitors rewriting - that's a really good idea. I've never even thought about that before! That'd be quite an aggressive move, but I'll be sure to build that in.

Any idea as to how I pass the variables between the two rewrites? I've never done this before and it's confusing me...

I wouldn't even know where to begin on the coding side of things.

Any ideas?

Again, thanks for all the help so far!

Update: Amazing that even Macys have done it!


 10:47 am on Mar 9, 2011 (gmt 0)

You could name almost any big company and their site does it.

You don't pass any data between your two rewrites.

You set up a redirect (using RewriteRule and [R=301,L]) so that when an old format URL is requested, the user is told to make a new request for the new format URL, for example (not the right rule for your site, just a quick example):

RewriteRule ^([^/]+)/([^_]+)_([^/]+)/([^/]+)/([^/]+)/([^.]+)$ http://www.example.com/$6/$2-$5 [R=301,L]

When the user returns moments later and requests the new URL, the Rewrite (using RewriteRule and [L]) connects that request to the PHP script (with the data separated out into parameters) to be fulfilled.

RewriteRule ^([0-9]+)/([^-]+)-(.*) /index.php?id=$1&manu=$2&prod=$3 [L]

At this point, the old internal rewrite is redundant. Old format URLs are no longer internally rewritten to the script, they are redirected to the new URL.

I prefer product ID at the beginning. I don't add category information to URLs. This creates duplicate content problems when a product is in multiple categories. However, a product can easily link out to the category pages of all the categories it can be found in.

That is,
www.example.com/153773/this-great-acme-widget links out to www.example.com/widgets, www.example.com/acme, www.example.com/under-50-quid and www.example.com/sale-items where the single canonical product URL without category is listed in the links from those categories back to the product page itself.

 2:38 pm on Mar 9, 2011 (gmt 0)

If I'm really honest - I don't understand your code, at all...

Like I said, I'm very new to this.

What's this doing?

^([^/]+)/([^_]+)_([^/]+)/([^/]+)/([^/]+)/([^.]+)$ http://www.example.com/$6/$2-$5 [R=301,L]

Sorry mate. :( I'm tearing my hair out. I really appreciate this advice.


 2:46 pm on Mar 9, 2011 (gmt 0)

Each of the ( ) pairs is "capturing" a part of the requested old URL when a user continues to request the old URL from their bookmarks or in an old email or stale SERP.

Each captured part of the sliced up URL is stored as a variable, starting at $1 for the first. Those variables are then substituted when you "make" the new URL that you want the user to be redirected to. The redirect forces the user to make a new request for that new URL, whenever they request the old one.

In that example, I "re-used" only the $6, $2 and $5 parts of the original URL, to make the redirect to the new URL.

Given the old and new URL format example shown in your second post, you should be able to similarly slice up the old URL and grab bits of it to make the string for the new URL.

It might help to know that
[^/] means "NOT a slash", and the plus means "one or more times", so ([^/]+)/ matches everything up to the first slash and puts it in $1. You have both the slash and underscore separators in the old URL to help with the pattern matching.
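
To make that concrete, here is the same example rule applied to a made-up request path. The path shop/acme_widgets/blue/large/153773 is invented purely for illustration and has nothing to do with the real site.

```apache
# Hypothetical request path: shop/acme_widgets/blue/large/153773
#
#   ^([^/]+)   $1 = shop      everything up to the first slash
#   /([^_]+)   $2 = acme      everything up to the first underscore
#   _([^/]+)   $3 = widgets   rest of that path segment
#   /([^/]+)   $4 = blue
#   /([^/]+)   $5 = large
#   /([^.]+)$  $6 = 153773    rest of the path (no dots allowed)
#
# The target reuses only $6, $2 and $5, so the browser is sent to
#   http://www.example.com/153773/acme-large
RewriteRule ^([^/]+)/([^_]+)_([^/]+)/([^/]+)/([^/]+)/([^.]+)$ http://www.example.com/$6/$2-$5 [R=301,L]
```

The captures are numbered by the position of their opening parenthesis, left to right, whether or not the target uses them.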

 3:51 pm on Mar 9, 2011 (gmt 0)

I can't thank you enough for your help so far!

I think I'm beginning to get it now.

I've written some code based on this that didn't work. I feel like I'm really close though.

RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)$ http://www.website.com/equipment/$2-$4/$1 [R=301,L]
RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+) /equipment/index.php?name=$1&id=$2 [L]

Any ideas?


 4:20 pm on Mar 9, 2011 (gmt 0)

Install the "Live HTTP Headers" extension for Firefox and see what you get.

Which bit fails, the redirect from old URL to new URL, or is it that the request for new URL fails to be rewritten to correctly pull the content?

Based on your original URL scheme, shouldn't you be breaking at an underscore, rather than a slash, in some parts of the pattern capture for the redirect?


 4:55 pm on Mar 9, 2011 (gmt 0)

The only problem I see is that you stated that your old URLs end with a slash, but your new rule doesn't comprehend/match that trailing slash.

A minor issue is that you should only use parentheses around a sub-pattern if you need to make use of the sub-string that it matches as a back-reference or if you need the parentheses to 'group' multiple characters or regular-expressions character-match tokens in order to apply a quantifier to all of them. Otherwise, you're just making your server do unnecessary work -- minor in this case, but such things can add up quickly on a busy site, resulting in an early server upgrade being needed.

I'd suggest:

# Externally redirect requests for old URLS of the form
# /equipment/<unique-id>/<brand>/<product>/<product-id>/<product-id-slug>/ to new URLs of the form
# /equipment/<brand>-<product-id>/<unique-id>
RewriteRule ^equipment/([^/]+)/([^/]+)/[^/]+/([^/]+)/[^/]+/$ http://www.example.com/equipment/$2-$3/$1 [R=301,L]

Using comments to make the code self-documenting as shown here is highly recommended -- both for ease of discussion here and for future code maintenance.

So, for URLs matching the given format, you're down to one rule -- not hundreds or thousands.

The resources cited in our Apache Forum Charter and some of the threads in our Apache Forum Library may prove useful to you.



 5:05 pm on Mar 9, 2011 (gmt 0)

I installed that as you said, but can't seem to debug my issue. I really don't get this, I'm well out of my depth. >.<

I'm really confused - worried I'm starting to go in circles a bit.

The script is basically causing the page to 404.

I'm trying to run it from inside the equipment folder and it's not having it.

What do you mean about breaking at an underscore?

Sorry if I seem a bit dense. Regex isn't something I write in all too frequently and even when I do, I struggle.


 5:07 pm on Mar 9, 2011 (gmt 0)

Forget the underscore comment. I misread where the individual elements needed to be broken down.

Thanks Jim for your neat solution. I missed the trailing slash.

If you're running this from within the equipment folder, remove
equipment/ from all of the RegEx patterns (only from the patterns) in that example, and make sure the redirected-to URL does include the full path of the new location.
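
As a sketch, assuming the rule lives in /equipment/.htaccess and that www.example.com stands in for the real hostname, the redirect would then look something like this:

```apache
# /equipment/.htaccess
# In a per-directory file the pattern is matched against the path
# relative to that directory, so the leading "equipment/" is dropped
# from the pattern. The target still carries the full path.
RewriteEngine On
RewriteRule ^([^/]+)/([^/]+)/[^/]+/([^/]+)/[^/]+/$ http://www.example.com/equipment/$2-$3/$1 [R=301,L]
```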

I'm losing signal for large chunks of this journey. I've travelled a long way during the latter course of this thread. WebmasterWorld beats reading a book.


 7:49 pm on Mar 9, 2011 (gmt 0)

Since this is a redirect, I strongly suggest putting the code (as posted) into the root .htaccess file. Otherwise, you preclude the use of *any* internal rewrites in the root .htaccess file which might affect the subdirectory -- now or later.

This is because if any external redirect is done after an internal rewrite, that redirect will 'expose' the rewritten (filepath) as a URL to the client. This defeats the 'pretty URL' techniques and makes a big mess of search engine listings and rankings.

Across all .htaccess files in the directory-path traversed by any given URL request:
Your rules must be ordered with all external redirects first, in order from most-specific patterns and conditions (e.g. single or only a few URLs affected) to least-specific patterns and conditions (e.g. more URLs affected), followed by all internal rewrites, again in order from most- to least-specific.

This requires quite a bit of on-going care with rule order when multiple .htaccess files are to be used.

Failure to take that on-going care can put you out of business in less than a month. This should definitely be considered as part of the performance/maintenance trade-offs between directory-localized and 'global' .htaccess files -- and between server config files and .htaccess files as well, where applicable.
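
A skeleton of that ordering might look like the following. The first redirect (old-page.html) is a made-up single-URL example, www.example.com is a placeholder, and the equipment rules are the ones already discussed in this thread.

```apache
RewriteEngine On

# --- External redirects first, most specific to least specific ---

# One single URL (hypothetical example)
RewriteRule ^old-page\.html$ http://www.example.com/new-page [R=301,L]

# A whole family of old-format URLs
RewriteRule ^equipment/([^/]+)/([^/]+)/[^/]+/([^/]+)/[^/]+/$ http://www.example.com/equipment/$2-$3/$1 [R=301,L]

# --- Internal rewrites last, again most specific to least specific ---

RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+)$ /equipment/index.php?name=$1&id=$2 [L]
```

Keeping every redirect above every rewrite means a request is never redirected after it has already been mapped to an internal file path.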



 9:31 am on Mar 10, 2011 (gmt 0)

Morning guys,
Thanks for continuing to help me! I really appreciate that piece of code, JDM.

I ran it from the root as recommended.

Unfortunately, it's causing my browser to time out. It's a similar sort of response to when a website can't handle users without cookies and the browser just keeps on trying to refresh.

This is my code now:

RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ http://website.net/equipment/$2-$3/$1 [R=301,L]
RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+)$ http://website.net/equipment/index.php?name=$1&id=$2 [L]

Any thoughts guys?


 9:43 am on Mar 10, 2011 (gmt 0)

Yes. Remove the domain name from the rewrite. You have turned it into a 302 redirect.

You will also need the "more specific" patterns that jd posted.

Use the "Live HTTP Headers" extension to see what your browser is doing when it "freezes". It is likely stuck in an infinite redirect loop.


 12:07 pm on Mar 10, 2011 (gmt 0)

I'm getting the following - I don't know what I am looking for... Is it the response codes?

I was going to copy and paste it over but there's 1500+ lines of stuff.

The final code is a 200, a couple before are 302.

g1smd, I really appreciate your help mate, but don't forget I'm completely new to this... I'm getting really lost in all of this.

How did I 302 my URL?

Remove the domain from which one of the rewrites?

I hope I'm not frustrating you. >.<


 12:27 pm on Mar 10, 2011 (gmt 0)

The one with domain name and [R=301,L] is a 301 redirect.

The one with [L] should be a rewrite, but adding the domain name turned it into a 302 redirect. That's the one to change back to a rewrite, as it was in all the previous posts.
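
Side by side, the difference looks roughly like this. The page names and www.example.com are placeholders, and the last rule is commented out because it shows the mistake rather than something to use.

```apache
# External redirect: full URL plus an explicit R=301. The browser is told
# to make a new request, and the address bar changes.
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]

# Internal rewrite: local path only, no domain, no R flag. The request is
# served by the script and the visitor never sees the change.
RewriteRule ^new-page$ /index.php?page=new-page [L]

# The mistake described above: a full URL in the target without an R flag.
# As reported in this thread, the rule then behaves as an external
# redirect with the default 302 status instead of an internal rewrite.
# RewriteRule ^new-page$ http://www.example.com/index.php?page=new-page [L]
```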


 2:07 pm on Mar 10, 2011 (gmt 0)

Okie doke, I made your changes and got the same error.

The URL isn't changing as it typically does with 301s and 302s.

Here's my code, could you advise me on where I am going wrong, please?

RewriteRule ^equipment/([^/]+)/([^/]+)/[^/]+/([^/]+)/[^/]+/$ http://example.com/equipment/$2-$3/$1 [R=301,L]
RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+)$ /equipment/index.php?name=$1&id=$2 [L]


 2:55 pm on Mar 10, 2011 (gmt 0)



Redirect 301 /directory/subdirectory http:// www.newdomainname.com/directory/subdirectory

result in a 301 or 302 ?



 3:14 pm on Mar 10, 2011 (gmt 0)

301 mate.


 4:36 pm on Mar 10, 2011 (gmt 0)

Don't mix Redirect and RewriteRule code in the same website. Once you have used RewriteRule for any of your rules, you must use it for all of your rules.

RewriteRule ^thispage http://www.example.com/thatpage [R=301,L]
is functionally no different to
Redirect 301 /thispage http://www.example.com/thatpage

However, using RewriteRule gives you more control because Redirect automatically re-appends the rest of the original path on to the end of the new URL.

Mixing the two types of directives in one site can cause many issues. Stick with using RewriteRule for both external redirects and internal rewrites.
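
To illustrate the "re-appends the rest of the path" point, here is what each form does with a request for /thispage/extra. These are alternatives shown for comparison, not lines to combine in one file, and the /extra path is just an example.

```apache
# mod_alias: matches on a path prefix and appends whatever follows it.
#   /thispage/extra  ->  http://www.example.com/thatpage/extra
Redirect 301 /thispage http://www.example.com/thatpage

# mod_rewrite, unanchored at the end: still matches /thispage/extra,
# but nothing is appended because the target uses no backreference.
#   /thispage/extra  ->  http://www.example.com/thatpage
RewriteRule ^thispage http://www.example.com/thatpage [R=301,L]

# mod_rewrite, anchored at both ends: only the exact path is redirected.
#   /thispage        ->  http://www.example.com/thatpage
#   /thispage/extra  ->  no match, falls through to later rules
RewriteRule ^thispage$ http://www.example.com/thatpage [R=301,L]
```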

 12:06 am on Mar 11, 2011 (gmt 0)

Cheers guys


 1:06 am on Mar 11, 2011 (gmt 0)


0 -- equipment
1 -- 4009
2 -- siemens
3 -- plc
4+5-6 -- 6es5+943-7ua11
7 -- 6es59437u11


equipment -- 0
siemens -- 2
6es5_943_7ub21 -- 4_5_6
2067 -- 1

This assumes that 4009 and 2067 are unique IDs for two different products, that the above isn't the exact old and new URLs for the *same* product. If the number is changing from 4009 in the old URL, to 2067 in the new URL there is NO WAY for mod_rewrite to know that.

This assumes that 7ua11 and 7ub21 are product-ids for two different products, that the above isn't the exact old and new URLs for the *same* product. If part of the product-id is changing from 7ua11 in the old URL, to 7ub21 in the new URL there is NO WAY for mod_rewrite to know that.

This also assumes that the "manufacturer" field in the old URL is exactly the same thing as the "brand" field in new URL.

RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^+]+)\+([^-]+)-([^/]+)/([^/]+)/$ http://www.example.com/equipment/$2-$4_$5_$6/$1 [R=301,L]

RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+)$ /equipment/index.php?name=$1&id=$2 [L]

Where it was going wrong is that both + and - in the old URL become underscores in the new URL. You didn't allow for that.
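
As a sanity check, here is how that redirect pattern slices up the old URL quoted in this thread (/equipment/4009/siemens/plc/6es5+943-7ua11/6es59437u11/). It is only a trace written out as comments, and it assumes the request reaches the rule unmodified.

```apache
# Request path as seen by the root .htaccess:
#   equipment/4009/siemens/plc/6es5+943-7ua11/6es59437u11/
#
#   ([^/]+)    $1 = 4009          unique ID
#   ([^/]+)    $2 = siemens       manufacturer
#   ([^/]+)    $3 = plc           category (not reused)
#   ([^+]+)\+  $4 = 6es5          product-id, up to the "+"
#   ([^-]+)-   $5 = 943           product-id, up to the "-"
#   ([^/]+)    $6 = 7ua11         rest of the product-id
#   ([^/]+)/$  $7 = 6es59437u11   slug (not reused)
#
# Target built from $2-$4_$5_$6/$1:
#   http://www.example.com/equipment/siemens-6es5_943_7ua11/4009
RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^+]+)\+([^-]+)-([^/]+)/([^/]+)/$ http://www.example.com/equipment/$2-$4_$5_$6/$1 [R=301,L]
```

A product-id that does not have that "+" then "-" layout may not match this pattern at all, in which case the request falls through to whatever rules follow.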


 11:16 am on Mar 15, 2011 (gmt 0)

Did this fix it?


 9:31 am on Mar 17, 2011 (gmt 0)

Hiya mate,
First of all, thanks for such a comprehensive solution and sorry for such a delayed reply! I've been absolutely swamped this week.

I finally got around to implementing it this morning but it didn't work.

I don't think I've explained it fully...

OLD: /equipment/4009/siemens/plc/6es5+943-7ua11/6es59437u11/

0 -- equipment
1 -- 4009
2 -- siemens
3 -- plc
4 -- 6es5+943-7ua11
5 -- 6es59437u11

4 is exactly the same as 5 but in its original format. I have no idea why the old developer did it like that... >.<

Do you think that is where it could be falling over?

Upon running the site with the new code, I get an error 404. Here's a glance at the file structure if this helps:


Here's the .htaccess so far:

ErrorDocument 400 http://website.com/error
ErrorDocument 401 http://website.com/error
ErrorDocument 403 http://website.com/error
ErrorDocument 404 http://website.com/error
ErrorDocument 500 http://website.com/error

RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ $1.php [L,QSA]

redirect 301 /index http://website.com

redirect 301 /search/+/all+manufacturers/+/cnc/2/10/1/ /products/cncs/1
redirect 301 /search/+/all+manufacturers/+/drive/4/10/1/ /products/drives/1
redirect 301 /search/+/all+manufacturers/+/servo+drive/18/10/1/ /products/servos/1
redirect 301 /search/+/all+manufacturers/+/encoder+%26+resolver/5/10/1/ /products/motors_and_encoders/1
redirect 301 /search/+/all+manufacturers/+/plc/14/10/1/ /products/plcs_and_software/1
redirect 301 /search/+/all+manufacturers/+/computer/3/10/1/ /products/indnettrial_pcs_and_hmis/1
redirect 301 /search/+/all+manufacturers/+/monitor/10/10/1/ /products/monitors/1
redirect 301 /search/+/all+manufacturers/+/robot/16/10/1/ /products/robots/1
redirect 301 /search/+/all+manufacturers/+/power+supply/15/10/1/ /products/power_supplies/1
redirect 301 /search/+/all+manufacturers/+/safety+equipment/17/10/1/ /products/safety_equipment/1
redirect 301 /search/+/all+manufacturers/+/comms/34/10/1/ /products/communications/1

redirect 301 /enquire.php /general_enquiry

redirect 301 /help_policies.php /help/policies

redirect 301 /visual_sitemap.php /site_map

# Old Structure: equipment/<unique-id>/<brand>/<category>/<product-id>/<product-id-slug>/
# New Structure: equipment/<brand>-<product-id>/<unique-id>
RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^+]+)\+([^-]+)-([^/]+)/([^/]+)/$ http://website.com/equipment/$2-$4_$5_$6/$1 [R=301,L]
RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+)$ http://website.com/equipment/index.php?name=$1&id=$2 [L]

Thanks again buddy.


 8:02 pm on Mar 17, 2011 (gmt 0)

Change all the Redirect instructions to use RewriteRule syntax with the [R=301,L] flags. The target must also include the protocol and domain name.

The internal rewrites must be moved to the very end of the file. Currently the first internal rewrite grabs all the requests, and the longer rule at the end never gets to run. The "catch all" rule must be the very last rule of all.
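
A rough sketch of how the file above could be reorganised along those lines is shown here. It converts only a sample of the Redirect lines, keeps website.com as the hostname from the posted file, and assumes /equipment/index.php is the script that serves the new URLs. Note that the converted rules are anchored, so unlike Redirect they match only the exact path.

```apache
RewriteEngine On

# 1. External redirects first (sample of the existing Redirect lines,
#    converted to RewriteRule; literal "+" and "." are escaped).
RewriteRule ^index$ http://website.com/ [R=301,L]
RewriteRule ^search/\+/all\+manufacturers/\+/cnc/2/10/1/$ http://website.com/products/cncs/1 [R=301,L]
RewriteRule ^enquire\.php$ http://website.com/general_enquiry [R=301,L]
RewriteRule ^help_policies\.php$ http://website.com/help/policies [R=301,L]

# Old equipment URLs -> new equipment URLs
RewriteRule ^equipment/([^/]+)/([^/]+)/([^/]+)/([^+]+)\+([^-]+)-([^/]+)/([^/]+)/$ http://website.com/equipment/$2-$4_$5_$6/$1 [R=301,L]

# 2. Internal rewrites after all redirects (no domain, no R flag).
RewriteRule ^equipment/([a-zA-Z0-9_-]+)/([0-9]+)$ /equipment/index.php?name=$1&id=$2 [L]

# 3. The catch-all extension rewrite is the very last rule of all.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ $1.php [L,QSA]
```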

As for the code that "did not work", it should have worked IF all the sub-parts matched up. There were several questions in that long post.

1. Is it right that + and - in the old URL become _ in the new URL?

2. Is the 7ua11 in the old URL, still 7ua11 in the new URL, or is it 7ub21? If the latter, then RewriteRule has NO WAY to know how to change the numbers.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved