

404 error

is it because of my .htaccess

         

malik112

4:07 am on Apr 2, 2009 (gmt 0)

10+ Year Member



Hello, I have been getting 404 errors for the 'category' pages on my e-commerce website. I have asked in other forums, and most people suggested a problem with the redirects. Can someone please take a look and tell me if there is a problem? category.php is not found on the server (though it is there); however, products.php is working fine.

Here is the .htaccess code:
RewriteEngine On

# Redirect to correct domain if incorrect to avoid canonicalization problems
RewriteCond %{HTTP_HOST} !^example\.com
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# Redirect URLs ending in /index.php or /index.html to /
RewriteCond %{THE_REQUEST} ^GET\ .*/index\.(php|html)\ HTTP
RewriteRule ^(.*)index\.(php|html)$ /$1 [R=301,L]

# Rewrite keyword-rich URLs for paged category pages
RewriteRule ^Products/.*-C([0-9]+)/Page-([0-9]+)/?$ category.php?category_id=$1&page=$2 [L]

# Rewrite keyword-rich URLs for category pages
RewriteRule ^Products/.*-C([0-9]+)/?$ category.php?category_id=$1&page=1 [L]

# Rewrite keyword-rich URLs for product pages
RewriteRule ^Products/.*-C([0-9]+)/.*-P([0-9]+)\.html$ /product.php?category_id=$1&product_id=$2&%{QUERY_STRING} [L]

# Rewrite media files
RewriteRule ^.*-M([0-9]+)\..*$ /media/$1 [L]

# Rewrite robots.txt
RewriteRule ^robots.txt$ /robots.php

[edited by: jdMorgan at 6:34 am (utc) on April 2, 2009]
[edit reason] example.com [/edit]

jdMorgan

6:39 am on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note that your products.php path starts with a slash, while your category.php path does not. Have you tried adding the slash?
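For example, the plain category rule would then read:

RewriteRule ^Products/.*-C([0-9]+)/?$ /category.php?category_id=$1&page=1 [L]

(and likewise for the paged category rule).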

Your first two rules are reversed, and both should contain the domain name in the substitution URL. Note that we use only "example.com" here -- for our protection and yours.

Your internal redirect rule patterns need some optimization, and there are some potentially very serious duplicate-content and googlebombing vulnerabilities in this approach. If other WebmasterWorld members do not address these additional problems, I will do so when I have a bit more time.

Jim

g1smd

8:48 am on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The first two rules are not optimal, and the index rule needs to be listed before the non-www rule.

Study your code and how it differs from that in [webmasterworld.com...] and adopt that instead.
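For reference, a sketch of those two rules in the usual order, assuming (as in your code) that non-www is the canonical hostname:

# Index redirect first, so /index.php requested on a non-canonical
# hostname is fixed in a single hop
RewriteCond %{THE_REQUEST} ^GET\ /([^?\ ]*/)?index\.(php|html)[?\ ]
RewriteRule ^(.*/)?index\.(php|html)$ http://example.com/$1 [R=301,L]

# Then redirect everything else arriving on a non-canonical hostname
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]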

.

There are three major problems with each of your four rewrites, the ones where you use ".*" near the beginning of the pattern.

Taking just one example:

# Rewrite keyword-rich URLs for category pages
RewriteRule ^Products/.*-C([0-9]+)/?$ category.php?category_id=$1&page=1 [L]

Using ".*", the server will match everything in the URL right to the end. The ".*" pattern is 'greedy': it says 'grab the whole URL'. That is the wrong thing to do here, because the regex engine then has to back off and retry hundreds of matches until it finds the right one, since there is more to match after the point where you said 'get it all'. You should use a more specific pattern, one that can be parsed from left to right. I have no idea what goes in there, but I assume it is hyphenated keywords. The pattern 'as is' is inefficient and will be slow to run. You might need something like "(([^\-]+\-)+)" instead; this says 'match up to the next hyphen, one or more times' and will be far more efficient.
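For example, the category rule might then look something like this (just a sketch, assuming the keyword part is hyphen-separated words; the repeated group makes the ID $2):

# Keyword part parsed left-to-right instead of with a greedy .*
RewriteRule ^Products/([^-/]+-)+C([0-9]+)/?$ category.php?category_id=$2&page=1 [L]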

The other problem is very common, and a potential way for your search results to be completely destroyed. You likely want a URL like example.com/Products/my-cool-stuff-C94728/ but you don't 'qualify' or 'check' the keywords. That means a competitor could link to example.com/Products/poisonous-unsafe-junk-C94728/ and your site would return '200 OK' with duplicate content, which would be indexed and would rank. You need to grab that part of the URL and send it as an extra parameter (like "&keywords=$n", for example) to your script, where the script validates that the words are exactly right for that page of content. For non-valid words, your script should use PHP's header() function to return either a 404 error or a 301 redirect to the correct URL for that content.
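A sketch of what that might look like for the category rule (the "keywords" parameter name is just the example above):

# Capture the whole keyword portion as $1 and pass it to the script for validation
RewriteRule ^Products/(([^-/]+-)+)C([0-9]+)/?$ category.php?category_id=$3&page=1&keywords=$1 [L]

The script then compares $_GET['keywords'] against the canonical slug for that category_id and serves the content, a 301, or a 404 as appropriate.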

The last problem is relatively minor, but still yet another way to destroy your SERPs. You allow a 'valid' URL to either have or omit the trailing slash (see the "/?$" in your code). That is, both forms return '200 OK' and the same content: more duplicate content. What you should do is pick one of them as the canonical URL and issue a 301 redirect for every request in the 'other' format. You already do that for 'with-www': you redirect it to 'without-www'. You should do the same here and redirect 'with-slash' to 'without-slash'. That same redirect should also force non-www within the same rule, otherwise you end up with a redirection chain. This new redirect must be listed *before* all of the other redirects.
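A minimal sketch of such a rule for the category URLs, assuming you pick the slashless form as canonical:

# Redirect trailing-slash category URLs to the slashless canonical form,
# fixing the hostname in the same hop to avoid a redirect chain
RewriteRule ^(Products/([^-/]+-)+C[0-9]+)/$ http://example.com/$1 [R=301,L]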

One minor issue is the use of mixed case. It is much harder for a mixed-case URL to be passed on by speech and conveyed correctly, and allowing the server to respond to mixed case is another form of duplicate content. You should use all lower-case for the whole URL if you can; it will prevent a lot of headaches in the future.

.

Finally, you have a bunch of redirects completely missing. If I request example.com/category.php?category_id=458292&page=1 directly, I will be served the content with a '200 OK' status. You should take these requests (both for www and non-www, and with parameters in *any* order) and redirect them to the canonical form, fixing the hostname at the same time. Failure to do so is yet another source of duplicate content.

Having said all that, your initial code was one of the best 'first go' coding examples seen in recent weeks. However, the job is a lot more involved than you first expected.

[edited by: jdMorgan at 4:01 pm (utc) on April 2, 2009]
[edit reason] edited at poster's request [/edit]

g1smd

9:54 am on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Normally those redirects would be listed first in your .htaccess file.

In this case you have to insert keywords into the new URL and there is no way for .htaccess to do that.

The solution is fairly simple. Use a rewrite to connect those requests to a special redirect script that uses the category and/or product number to look up the keyword list in the database, then use PHP's header() function to send a 301 redirect to the correct URL. Do make sure you add the extra arguments needed to make it a 301 redirect, as the default is a 302.
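A sketch of the connecting rewrite, with redirect.php as a hypothetical name for that script; matching against %{THE_REQUEST}, which contains only what the client actually asked for, stops the rule from also catching the internal rewrites coming from the friendly URLs:

# Route direct client requests for category.php to the redirect script;
# the query string is carried along automatically
RewriteCond %{THE_REQUEST} ^GET\ /category\.php\?
RewriteRule ^category\.php$ /redirect.php [L]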

The URLs pointed to by redirects should also contain both the protocol and the full domain name, so that there is no ambiguity when the non-canonical version is requested. The redirect should fix both of those things at the same time as it fixes everything else.

Your redirect script is likely a dozen lines of PHP code and a database query, for each type of URL. It is a fairly simple job.

The redirect script will also need to return a 404 header for any completely invalid request (where the category or product ID does not exist at all).
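A minimal sketch of such a script for the category URLs, assuming a mysqli connection and a hypothetical "categories" table with a "slug" column holding the hyphenated keywords:

<?php
// redirect.php (sketch): 301 direct category.php requests to the friendly URL
$id   = isset($_GET['category_id']) ? (int) $_GET['category_id'] : 0;
$page = isset($_GET['page']) ? (int) $_GET['page'] : 1;

$db   = new mysqli('localhost', 'user', 'pass', 'shop'); // assumed credentials
$stmt = $db->prepare('SELECT slug FROM categories WHERE id = ?');
$stmt->bind_param('i', $id);
$stmt->execute();
$stmt->bind_result($slug);

if ($id > 0 && $stmt->fetch()) {
    // Absolute URL: protocol plus canonical hostname, as noted above
    $url = 'http://example.com/Products/' . $slug . '-C' . $id;
    if ($page > 1) {
        $url .= '/Page-' . $page;
    }
    header('Location: ' . $url, true, 301); // third argument forces a 301; the default is 302
} else {
    header('HTTP/1.0 404 Not Found'); // unknown ID: nothing to redirect to
}
exit;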


jdMorgan

3:36 pm on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code to redirect direct client requests for the dynamic URLs is almost exactly the same as the code already present in the main script, where it is used to generate the static on-page links. Therefore, the job of redirecting direct client requests could be done in the main script itself...

Jim

g1smd

3:46 pm on Apr 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, the main script could do that, but it would need some way of knowing whether it is being called via a rewrite as the result of a 'friendly URL' request, or being accessed directly as a parameter-driven 'old style' URL.

The former should lead to content being served, and the latter should result in a redirect back to the friendly URL format. The requests could be differentiated by adding a 'hidden' parameter (like "&friendly=true" or something) in the rewrite to trigger this selection. If the parameter is missing, serve a redirect to the friendly URL, stripping all parameters in that redirect. If the parameter is present, serve the content.
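A sketch of that, extending the category rule from earlier (the parameter name is just the example above):

# Mark internally rewritten friendly-URL requests so the script can tell them apart
RewriteRule ^Products/(([^-/]+-)+)C([0-9]+)$ category.php?category_id=$3&page=1&keywords=$1&friendly=true [L]

In category.php, an empty $_GET['friendly'] then means a direct old-style request: look up the friendly URL and 301 to it; otherwise, serve the content.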

Personally, I would simply rewrite direct client requests for category.php and product.php URLs to this redirect.php script and let it generate the correct redirects. The script would need to do a quick database lookup to fetch the keyword part of the URL that matches the category and product IDs, so that it can build the correct 'friendly' URLs.

It would also need to check that the category ID exists, that the product ID exists, and that the category and product IDs are a valid combination when used together. Any request failing these tests needs a 404 error. The same check also needs to appear in your main script; you don't want to serve content when an incorrect category and product combination appears in a URL. A sketch of that check follows.
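This assumes an existing mysqli connection in $db and a hypothetical "products" table carrying a category_id column:

// Confirm the product really belongs to the category named in the URL
$stmt = $db->prepare('SELECT 1 FROM products WHERE id = ? AND category_id = ?');
$stmt->bind_param('ii', $product_id, $category_id);
$stmt->execute();
$stmt->store_result();
if ($stmt->num_rows === 0) {
    header('HTTP/1.0 404 Not Found'); // unknown ID or mismatched pair
    exit;
}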

g1smd

7:56 pm on Apr 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's been a few days... How are you getting on with this?