Stopping ?query strings that don't exist

         

internetheaven

7:06 am on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Some smart ar*e has linked to my site using lots of different queries e.g.

www.example.com/index.php?noio_ino
www.example.com/?iie9n4&dd4
www.example.com/folder/?ofnfe
www.example.com/folder/file.php?id=on772

I don't use query strings on my site. All these pages have been indexed by Google and are therefore duplicate content.

Can I set up the htaccess file to 301 any URL with a query string to the URL without the query string?

Thanks
Mike

TheMadScientist

7:23 am on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This thread should get you started:
Remove Query String from URL [webmasterworld.com]

You will probably need to adjust the condition to check for any 'not-empty' query string.

RewriteCond %{QUERY_STRING} .

If you have other redirects, you will probably gain some efficiency by appending a blank QUERY_STRING to a redirect which will already occur. If you use canonicalization, you could probably add:

RewriteCond %{QUERY_STRING} . [OR]

preceding your %{HTTP_HOST} check and use the single rule for both.
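
A minimal sketch of that combined rule, assuming the canonical host is www.example.com and that the rules live in the .htaccess file in the document root (the host pattern is an assumption, since the existing canonicalization rule is not shown in this thread):

```apache
RewriteEngine On

# Redirect when there is any query string at all ...
RewriteCond %{QUERY_STRING} . [OR]
# ... or when the request did not arrive on the canonical host (assumed pattern)
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# The trailing "?" clears the query string on the redirect target
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

Handled this way, a request such as example.com/index.php?noio_ino should reach the clean www URL in one 301 rather than two.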

internetheaven

8:42 am on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great, thanks. I've now got:

RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

which seems to be doing the job.

If I were to add any scripting (e.g. a forum) to the site at a later date, is there any way to make this rewrite not strip the query strings for JUST THAT folder?

Thanks
Mike

jdMorgan

5:50 pm on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sure, just add a RewriteCond with a negative match on the folder path to provide an exclusion for that folder.

RewriteCond $1 !^excluded-folder/

Jim

g1smd

6:14 pm on May 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a site where that was happening, by accident, as well as the previous designer messing up some of the internal linking.

I now only allow certain query string formats to hit the server. Incorrect formats are stripped.

RewriteCond %{QUERY_STRING} !^(prod|code)=[0-9]{5}(&(page|ident)=[0-9]{2})?$
RewriteCond %{REQUEST_URI} !(products.html|store\/?)$
RewriteCond %{THE_REQUEST} \?.*\ HTTP/ [NC]
RewriteRule (.*)$ http://www.domain.eu/$1? [R=301,L]

There are still a small number of incorrect URLs that can get through, but the PHP script checks those and throws a 404 error for those anyway.

By pre-processing the query string in the .htaccess file, I get better coverage and give the PHP script less work to do.

Extra parameters, wrong parameters, wrong parameter values (too many, or too few digits), wrong parameter order, and so on, are all dumped by that .htaccess processing.

[edited by: jdMorgan at 10:54 pm (utc) on May 17, 2008]
[edit reason] Code formatting [/edit]

internetheaven

5:33 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, thanks. I now have this:

RewriteCond %{QUERY_STRING} .
RewriteCond $1 !^folder1/
RewriteCond $1 !^folder2/
RewriteCond $1 !^folder3/
RewriteRule ^(.*)$ http://www.example.co.uk/$1? [R=301,L]

which seems to be working just fine. But it doesn't remove ? when they are on their own such as:

www.example.co.uk/?
www.example.co.uk/about.php?
www.example.co.uk/folder/?

Have I entered it wrong?
Thanks
Mike

jdMorgan

6:31 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, a bare "?" is part of neither the query string nor the URL-path, so %{QUERY_STRING} is empty and your first condition never matches. That makes it harder to detect.

Also, you will need to add (and maintain) an exclusion for the MSN mobile proxy, used by cell phone and PDA users. For some reason, their proxy adds a "?" to the end of everything. And of course, you'll need to keep a lookout for other user-agents in the future, which may emulate this behaviour... :(


RewriteCond $1 !^(folder1|folder2|folder3)/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteCond %{HTTP_USER_AGENT} !MSN\ Mobile\ Proxy
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

Replace any broken pipe "¦" characters in code posted in this thread with solid pipe "|" characters before use; posting on this forum modifies the pipe characters.
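
To make the point about the bare "?" concrete, here is a rough sketch of what each variable holds for one such request. The request line and the www.example.com host are illustrative assumptions, and the rule is assumed to sit in the document-root .htaccess file with the rewrite engine already switched on:

```apache
# Request line sent by the client:   GET /about.php? HTTP/1.1
#   %{REQUEST_URI}   ->  /about.php   (no "?")
#   %{QUERY_STRING}  ->  empty        (so "RewriteCond %{QUERY_STRING} ." fails)
#   %{THE_REQUEST}   ->  GET /about.php? HTTP/1.1
# Only THE_REQUEST still shows the "?", so that is the variable to test.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
# The trailing "?" on the target stops any query string being re-appended
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

Because THE_REQUEST is the original request line and is not changed by earlier rewrites, the redirected request (which no longer contains a "?") will not match again.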

Jim

londrum

6:41 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



you can do this with php as well. something like this...

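if (isset($_SERVER['QUERY_STRING']) && $_SERVER['QUERY_STRING'] !== '') {
    $domain_name = $_SERVER['HTTP_HOST'];
    // Send the bare domain to the www host as well
    if ($domain_name == 'example.com') {
        $domain_name = 'www.example.com';
    }
    $file_name = $_SERVER['SCRIPT_NAME'];
    $full_url = 'http://' . $domain_name . $file_name;
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $full_url);
    exit; // stop here so the rest of the page is not sent after the redirect
}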

g1smd

8:00 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can, but .htaccess has greater coverage, because it operates on the requested URL, not just on URLs that manage to connect through to the script.

londrum

9:03 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...handy for people who have got lousy webhosts though (like me), who don't allow you to use a .htaccess file

g1smd

10:11 pm on May 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That should be a crime, as there are just so many useful things that need to go in that file.