Redirecting wrong queries to 404 not found

google SEO mod_rewrite QUERY_STRING

         

Sumerian

10:46 am on Sep 22, 2009 (gmt 0)

10+ Year Member



Hello

Recently, I converted my website's software from phpBB 3.05 to IPB 3.03. The problem I'm facing is that IPB is configured so that if you pass a query for a page that doesn't exist, it responds with a 200 status and serves the index page.

Because of this, Google is still indexing the wrong pages, as it has no way to know that these pages no longer exist.

OK, so what I'm trying to do is:
redirect this:

http://example.com/?f=something or ?t=something

to a 404 Not Found response.

I've put the following in my .htaccess file:

Options +FollowSymlinks
RewriteEngine On
RewriteCond %{QUERY_STRING} ^[f¦t]=(.*)
RewriteRule ^ /404.html

On a test site with this structure: http://example.com/test , it did work, but when I try the same for the root folder, I get a 500 Internal Server Error when I browse to: http://example.com/?f=something

I wonder what I'm doing wrong!

[edited by: jdMorgan at 1:36 pm (utc) on Sep. 22, 2009]
[edit reason] example.com [/edit]

jdMorgan

1:48 pm on Sep 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Two problems. The 500-Server error is likely being caused because you did not clear the query string. Therefore, your rule will rewrite a request for <anything>?f=abc to /404.html?f=abc. Then, since this path still matches your rule, the rule will be re-invoked, rewriting /404.html?f=abc to /404.html?f=abc again and again, leading to an 'infinite loop.' After many iterations, the server detects this loop and throws a 500-Server Error.
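
To illustrate just this first point, here is a minimal sketch of clearing the query string, assuming the custom page lives at /404.html as in the original post. A trailing "?" on the substitution replaces the original query string with an empty one, so the rewritten request no longer matches the condition. This only stops the loop and does nothing about the status code returned, so it is not a fix on its own.

```apache
RewriteEngine On
# Fire only when the query string starts with f= or t=
RewriteCond %{QUERY_STRING} ^[ft]=
# The trailing "?" drops the original query string, so /404.html is
# requested without ?f=... and the condition fails on the next pass
RewriteRule ^ /404.html? [L]
```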

The second and actually more-serious problem is that if you rewrite to a 404 page, the server will return a 200-OK status response. So search engines will see these URLs as being valid and resolving to existing pages... Not what you want.

A third but minor error is that "¦" is not needed within an alternate character group unless you wish to match a literal "¦" character. So [f¦t] should probably just be [ft]. And since you do not re-use the query string parameter value, there is no need to parenthesize it.

I'd suggest:


Options +FollowSymlinks
RewriteEngine on
#
RewriteCond %{QUERY_STRING} ^[ft]=[^&]*
RewriteRule ^ /<some-filepath-that-does-not-and-will-never-exist> [L]

On Apache 2.x and later, you could also change the RewriteRule to

RewriteRule ^ - [R=404,L]

Jim

Sumerian

3:11 pm on Sep 22, 2009 (gmt 0)

10+ Year Member



Worked like a charm, thank you.
BTW: the 404.html is just my way of saying custom 404 page :)

thank you again

jdMorgan

5:48 pm on Sep 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, but *redirecting* to a custom 404 page returns a 200-OK response not a 404-Not Found response. From a search engine viewpoint, it makes no difference what page or content you serve in response to a 404 error, but returning the correct server response code (404) is critically important to the 'health' of your site in search rankings...

Rewriting to a non-existent filepath is just a simple way to force a proper 404 response on Apache versions previous to Apache 2.x. The page to be served is then taken from your "ErrorDocument 404" directive if any, or if there is no custom 404 error document specified, then the server will return the default 404 error text.

And that brings up another critical point. Never specify a full URL (one starting with http://) as an ErrorDocument. Always use a local URL-path starting with a slash. Otherwise, you'll get a 302 redirect server response code!
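
As a rough sketch of the difference, assuming the custom error page is /404.html under the document root (the name used earlier in this thread, yours may differ):

```apache
# Local URL-path (starts with "/"): served internally, status stays 404
ErrorDocument 404 /404.html

# Full URL: the server redirects the client instead, so the 404 status is lost.
# Shown commented out as the form to avoid.
# ErrorDocument 404 http://example.com/404.html
```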

Jim

Sumerian

2:44 pm on Sep 23, 2009 (gmt 0)

10+ Year Member




Oh I see, I'll do it your way :)

thank you

jdMorgan

2:52 pm on Sep 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Once you have finished setting this up, test it carefully. Use the "Live HTTP Headers" add-on for Firefox/Mozilla browsers (or a similar tool) to verify that your 404 handling is correct, returning the proper 404-Not Found status code in all cases, with no intervening 30x redirects of any kind.

Note that you may need to completely flush (delete) your browser cache while testing, in order to force your browser to actually fetch the current page (and/or error response) from your server instead of serving it from your local browser cache.

Jim