Strange dilemma with mod rewrite and 301 redirects

404 responses...

brokaddr

10:06 pm on Oct 29, 2010 (gmt 0)

10+ Year Member



Scenario:
http://www.example.com/directory/product-example.htm is a 404 (wanted behavior; the product is gone).

When I go to:
http://example.com/directory/product-example.htm I end up with this:
http://www.example.com/product.php?pid=product-example (this is my native URL structure from before mod_rewrite). I get a 301 response, which then redirects to a 404 because the pid string is invalid: it shows the rewritten URL slug in the query string instead of the product ID, which was 488 before the product was deleted.

How can I prevent this?

Here's the snippet from htaccess which redirects *.example.com to www.example.com and I believe this is the culprit for the 301 response:
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


I want requests to *.example.com to retain the URL structure, so instead of the redirect to my native URL structure, I should be seeing:
http://example.com/directory/product-example.htm with NO 301; just a 404 response.

jdMorgan

11:20 pm on Oct 29, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Across all config and .htaccess files, make sure that all external redirects are executed before all internal rewrites. If this is not so, then internally-rewritten-to filepaths (not "native URLs" as you termed them, but filepaths) will be exposed as URLs to the client.

See Proper Order for htaccess [webmasterworld.com] in our Library for more info.

[added]
You will still get a 301 from the non-www hostname to the www hostname, but the internal script filepath will no longer be exposed. To prevent the 301, you need to properly declare the page as 410-Gone, possibly using

RewriteRule ^product-example\.htm$ - [G]

ahead of your external redirects.
[/added]
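
As a rough sketch of that ordering, assuming the .htaccess file sits in the document root: the internal rewrite rule at the bottom is hypothetical (the actual rule was not posted), and its pattern and the directory/ prefix are guesses based on the URLs in the question.

```apache
RewriteEngine On

# 1. Known-dead URL-paths first: answer 410 Gone on any hostname, so no 301 is sent.
#    (Pattern assumes a root-level .htaccess; adjust the path to match your layout.)
RewriteRule ^directory/product-example\.htm$ - [G]

# 2. External redirects next: send non-canonical hostnames to www.
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. Internal rewrites last (hypothetical rule, inferred from the question).
RewriteRule ^directory/([^/.]+)\.htm$ /product.php?pid=$1 [L]
```

With this order, a non-www request is redirected while it still carries the friendly URL-path, and the script filepath is only substituted once the hostname is already canonical.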

Jim

brokaddr

11:37 pm on Oct 29, 2010 (gmt 0)

10+ Year Member



jdMorgan, thanks for the link. I had the www 301 redirect at the bottom of my rewrite rules. I have now positioned it like so:
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

#rewrites down here


but I'm still getting the original URL exposed (for 404'd URLs; the 200-response URLs redirect as expected).

I'm not having any issues with the 404s showing where they need to be, so I'm not entirely sure what difference the 410 would make.

jdMorgan

12:40 am on Oct 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're going to have to post the relevant parts of your code to get any useful comments, then...

We'll need to see your ErrorDocument declarations, the RewriteRule(s) that pass "search friendly" URLs to your script(s), and any other code that may 'handle' these missing-URL requests.

Also, be very sure to delete your browser cache before testing any new code. Otherwise, your browser is likely to show you previously cached pages and server response codes.

You will likely find it very helpful in understanding URL-to-URL redirecting and URL-to-filepath rewriting if you make a distinction between the two... With your friendly-URL-to-script-filepath rewriting in place, the "old dynamic URL-paths" are no longer URL-paths, but they are still the filepaths used to invoke your script(s).

So the problem here is that those internally-rewritten-to script filepaths are being exposed to clients as URLs by a subsequent (and unexpected) external redirect... And this would be a problem whether or not those filepaths were ever the same as your "old URL-paths."
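
To make the mechanism concrete, here is a sketch of the problem ordering with the request traced through it in comments. The internal rewrite rule is hypothetical (the real one was not posted); it is only assumed to map the friendly URL to product.php?pid=..., which matches the redirect target reported at the top of the thread. In .htaccess context, mod_rewrite runs the rule set again after an internal rewrite, which is why a redirect placed below the rewrite gets a second chance to fire.

```apache
RewriteEngine On

# Hypothetical internal rewrite, placed BEFORE the hostname redirect (the problem order).
# Pass 1: example.com/directory/product-example.htm
#         is rewritten to /product.php?pid=product-example
RewriteRule ^directory/([^/.]+)\.htm$ /product.php?pid=$1 [L]

# Pass 2: the rules run again on the rewritten path. The rule above no longer matches,
#         but the hostname is still non-www, so this rule fires on the script filepath:
#         301 -> http://www.example.com/product.php?pid=product-example
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```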

Jim

brokaddr

4:13 pm on Oct 30, 2010 (gmt 0)

10+ Year Member



I tried on a different PC and the URLs are now redirecting; I must have run into the cache issue. Thanks for the help!

My question now is: will 301'ing example.com/product to www.example.com/product (resulting in a 404) hurt my SEO?
I know some search engines will first try with the www (on a previously existing page), if they can't find it, they'll try without the www.

jdMorgan

1:24 pm on Nov 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Under ideal circumstances, you don't want to redirect to a new URL that results in a 404. It would be best to check for the 'dead' URL-paths first, and return a 410-Gone -- at least for your most popular/important pages.
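
For a handful of important dead pages, that check could be a single rule with a short list of names, placed ahead of the hostname redirect. This is only a sketch: other-retired-product is a made-up name, and the directory/ prefix assumes a root-level .htaccess.

```apache
# Dead URL-paths: return 410 Gone before any redirect is considered.
# "other-retired-product" is a placeholder, not a real page from this thread.
RewriteRule ^directory/(product-example|other-retired-product)\.htm$ - [G]

# Canonical-hostname redirect follows, unchanged from the original post.
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```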

However, you would have to do both the 410-checking and the 301 redirects within your script to get this right: The script would first check whether the product exists. If it does and the hostname is non-canonical, the script would redirect the request to the canonical hostname. If the product does not exist, the script would return a 410-Gone directly, without first redirecting because of the non-canonical hostname.

However, if this is a 'big script' then you may not want to modify it to do this, and you'll just have to live with the 301-404 chain. While this is not optimal, it is certainly not a huge problem, and the search engines should understand what it means. You may see 'complaints' about it in their Webmaster Tools reports, but eventually, they should correctly act on the changes.

Jim