Currently I have a WordPress installation living in the root folder and a custom shop running in /shop/.
Is there any way to separate the two sets of rules?
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^shop/sections/(.*)/(.*)/(.*)/(.*)/ /shop/viewproducts.php?category_id=$1&position=$3&nresults=$4 [nc]
RewriteRule ^shop/sections/(.*)/(.*)/ /shop/viewproducts.php?category_id=$1 [nc]
RewriteRule ^shop/brands/(.*)/(.*)/(.*)/(.*)/ /shop/brand.php?brid=$1&position=$3&nresults=$4 [nc]
RewriteRule ^shop/brands/(.*)/(.*)/ /shop/brand.php?brid=$1 [nc]
RewriteRule ^shop/products/(.*)/(.*)/(.*)/ /shop/displayproduct.php?product_id=$2&category_id=$1 [nc]
RewriteRule ^shop/docs/(.*)/(.*)/ /shop/pages.php?cmsid=$1 [nc]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
First, you don't need <IfModule> unless you plan to port this code to several servers, you know that some of them won't support mod_rewrite, and you want the code to fail silently on those non-supporting servers. So unless your situation resembles this unlikely case, you can remove that container.
Second, "RewriteBase /" has no effect here: RewriteBase only applies to relative substitutions, and every substitution in these rules begins with a slash. So let's remove that as well.
Next, the rules all use multiple instances of the "easy-to-write, but ambiguous, greedy, and promiscuous" regular-expressions pattern ".*". I expect that your server will perform quite poorly under load for this reason alone.
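To make the ambiguity concrete, here is a hand-traced sketch of how the first of the original rules carves up an over-long request. The request path "shop/sections/a/b/c/d/e/f/" is made up for illustration only.

```apache
# Original pattern (shown as a comment so nobody copies it):
#   ^shop/sections/(.*)/(.*)/(.*)/(.*)/
# Hypothetical request path: shop/sections/a/b/c/d/e/f/
# The first ".*" is greedy and swallows as many slashes as it can while
# still leaving four slashes for the rest of the pattern, so:
#   $1 = a/b/c    $2 = d    $3 = e    $4 = f
# The script would receive category_id=a/b/c, which is almost certainly
# not what was intended.
#
# With "[^/]+" and an end anchor, each group is limited to exactly one
# path segment, and the six-segment request above does not match at all:
RewriteRule ^shop/sections/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /shop/viewproducts.php?category_id=$1&position=$3&nresults=$4 [L]
```

The regex engine also has far less backtracking to do with "[^/]+", since a group can never run past the next slash.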
Finally, and likely the cause of your problem: every request to your server will be subjected to every rewrite in your code, because you did not use the "[L]" flag on your rules. So, if the originally-requested URL-path does not resolve to an existing file on your server, all such requests will be rewritten to /index.php, regardless of whether the URL was rewritten by a previous rule. You should always use "[L]" on every RewriteRule, unless you *know* a good reason not to.
So here's a leaner, meaner, cleaned-up version:
RewriteEngine on
#
RewriteRule ^shop/sections/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /shop/viewproducts.php?category_id=$1&position=$3&nresults=$4 [L]
RewriteRule ^shop/sections/([^/]+)/([^/]+)/$ /shop/viewproducts.php?category_id=$1 [L]
RewriteRule ^shop/brands/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /shop/brand.php?brid=$1&position=$3&nresults=$4 [L]
RewriteRule ^shop/brands/([^/]+)/([^/]+)/$ /shop/brand.php?brid=$1 [L]
RewriteRule ^shop/products/([^/]+)/([^/]+)/([^/]+)/$ /shop/displayproduct.php?product_id=$2&category_id=$1 [L]
RewriteRule ^shop/docs/([^/]+)/([^/]+)/$ /shop/pages.php?cmsid=$1 [L]
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Duplicate content arises whenever the same content can be reached by more than one exact, unique URL. Allowing case variations and arbitrary "tails" on URLs requested from your site (as the original code does) creates duplicate content. The search engines, seeing that the same content can be reached using multiple URLs, will either "pick one" (and not necessarily the one you consider to be correct), or they will split your PageRank/Link-popularity across these multiple URLs, diluting the effective rank of the actual "page."
Despite my end-anchoring the patterns and removing "[NC]", there remains another such flaw: you have parameters in your "friendly" URLs which are not used in the script query strings. These parameters could conceivably take on any value, and yet would be accepted. You should either get rid of them (remove them from the friendly URLs on your pages), or you should validate them; that is, check that their values appear in a list of acceptable values. If the list is very short, you could do that in mod_rewrite. Otherwise, they should be passed to your scripts as query string parameters for validation within the script itself. If the values fail validation, then you should write the code (in mod_rewrite or the scripts) to reject the request with a 404 Not Found, or issue a 301 redirect to the correct URL if it can be unambiguously determined.
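As a rough sketch of the short-list case, suppose (purely as an assumption for this example) that the only valid values for the second path segment are "fuzzy" and "smooth". The pattern itself can then do the validation. The explicit 404 rule relies on the R flag accepting a non-3xx status code, which it does on reasonably current Apache versions, but check yours.

```apache
# Hypothetical list of valid values: "fuzzy" and "smooth" are placeholders,
# not values taken from the real shop.
# Only those two values are accepted in the second path segment.
RewriteRule ^shop/sections/([^/]+)/(fuzzy|smooth)/$ /shop/viewproducts.php?category_id=$1 [L]
# Any other two-segment /shop/sections/ URL is rejected with a 404.
RewriteRule ^shop/sections/[^/]+/[^/]+/$ - [R=404,L]
```

These two rules would stand in for the two-segment /shop/sections/ rule; the same idea applies to the other rules that capture an unused segment.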
Your competitors, should they discover these flaws, can easily exploit them simply by linking to multiple incorrect variations of the "correct" URL, thus knocking your pages down in the search results.
You must decide how you want to handle all such requests for non-canonical URLs; I can't do that for you. You can easily add rules to redirect badly-formed URLs, such as "example.com/shop/sections/widgets/fuzzy/round/blue/a-bunch-of-junk-here" to the correct URL, likely "example.com/shop/sections/widgets/fuzzy/round/blue/", simply by truncating the URL after the last expected slash.
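A minimal sketch of such a truncating redirect for the four-segment /shop/sections/ URLs might look like this. It assumes the canonical hostname is example.com served over plain http; adjust both to suit.

```apache
# Place this ahead of the internal rewrites so the redirect is applied first.
# $1 captures the four expected segments including the final slash;
# ".+" matches the unwanted tail, which is dropped from the redirect target.
RewriteRule ^(shop/sections/[^/]+/[^/]+/[^/]+/[^/]+/).+$ http://example.com/$1 [R=301,L]
```

Note that the same trick cannot be applied blindly to the two-segment form, because a two-segment URL followed by two more segments is itself a valid four-segment URL.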
The case of "example.com/shop/sections/widgets/a-bunch-of-junk-here/round/blue/a-bunch-of-junk-here" is a bit more difficult to discuss, since I don't know what values for that second part of the "/widgets" URL-path would be valid.
Case errors are quite a bit more trouble, though. mod_rewrite in .htaccess is not very efficient at fixing case errors, because the characters must be corrected one at a time, requiring the URL to pass through 26 rules, one for each letter of the basic US-ASCII alphabet. Passing the URLs through multiple rules also triggers a well-known bug in Apache mod_rewrite, which must be worked around. If you really do have links to incorrect-case or mixed-case URLs, we can discuss that later after getting the rest of this working.
If you have server-config-file access, a much better approach is to define a RewriteMap that calls mod_rewrite's internal "tolower" case-conversion function. All URLs with *any* uppercase characters can then be passed to "tolower" and the result used to generate a 301 Moved Permanently redirect.
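For reference, a sketch of the RewriteMap approach could look like the following. The map name "lowercase" is arbitrary, and example.com over plain http is again an assumption.

```apache
# Server config (httpd.conf or <VirtualHost>) only; RewriteMap is not
# permitted in .htaccess. "lowercase" is a map name chosen for this example.
RewriteMap lowercase int:tolower

# This rule can go in the server config or in .htaccess, ahead of the other rules.
# Any URL-path containing an uppercase ASCII letter is redirected to its
# all-lowercase form. REQUEST_URI already starts with a slash.
RewriteRule [A-Z] http://example.com${lowercase:%{REQUEST_URI}} [R=301,L]
```

As written this lowercases every URL-path on the site, including requests for real files whose names contain capitals (uploaded images, for instance), so you may want to narrow the pattern to the /shop/ URLs only.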
However, if your site was designed to use mixed-case URLs, there is no good solution. You'd have to write hundreds of rules for any and all possible case-variations of each and every URL, likely resulting in tens of thousands of rules, or use mod_speling, which can only fix a few errors per URL, and does so by making hundreds of "searches" of your filesystem, which is extremely slow and CPU intensive. I hope this is not the case...
A lot more information on duplicate content and URL canonicalization is available by searching WebmasterWorld; one of the more appropriately named threads is "Duplicate Content -- Get it Right or Perish."
Jim