www.example.com/index.php?noio_ino
www.example.com/?iie9n4&dd4
www.example.com/folder/?ofnfe
www.example.com/folder/file.php?id=on772
I don't use query strings on my site. All these pages have been indexed by Google and are therefore duplicate content.
Can I set up the htaccess file to 301 any URL with a query string to the URL without the query string?
Thanks
Mike
You will probably need to adjust the condition to check for any 'not-empty' query string.
RewriteCond %{QUERY_STRING} .
If you have other redirects, you will probably gain some efficiency by appending a blank QUERY_STRING to a redirect which will already occur. If you use canonicalization, you could probably add:
RewriteCond %{QUERY_STRING} . [OR]
preceding your %{HTTP_HOST} check and use the single rule for both.
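As a rough sketch of that combined rule, assuming your existing canonicalization redirects non-www requests to www.example.com (the host pattern below is my assumption, so adjust it to match the check you already have):

```apache
RewriteEngine On
# Redirect if the request has any non-empty query string...
RewriteCond %{QUERY_STRING} . [OR]
# ...or if the requested host is not the canonical one (assumed pattern)
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# The trailing "?" in the substitution discards the original query string
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

With this, a request that has both problems (wrong host and a query string) gets fixed in one 301 rather than two chained redirects.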
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
which seems to be doing the job.
If I were to add any scripting (e.g. a forum) to the site at a later date, is there any way to make this rewrite not strip the query strings for JUST THAT folder?
Thanks
Mike
I now only allow certain query string formats to hit the server. Incorrect formats are stripped.
RewriteCond %{QUERY_STRING} !^(prod|code)=[0-9]{5}(&(page|ident)=[0-9]{2})?$
RewriteCond %{REQUEST_URI} !(products.html|store\/?)$
RewriteCond %{THE_REQUEST} \?.*\ HTTP/ [NC]
RewriteRule (.*)$ http://www.domain.eu/$1? [R=301,L]
There are still a small number of incorrect URLs that can get through, but the PHP script checks those and throws a 404 error for those anyway.
By pre-processing the query string in the .htaccess file, I get better coverage and give the PHP script less work to do.
Extra parameters, wrong parameters, wrong parameter values (too many, or too few digits), wrong parameter order, and so on, are all dumped by that .htaccess processing.
[edited by: jdMorgan at 10:54 pm (utc) on May 17, 2008]
[edit reason] Code formatting [/edit]
RewriteCond %{QUERY_STRING} .
RewriteCond $1 !^folder1/
RewriteCond $1 !^folder2/
RewriteCond $1 !^folder3/
RewriteRule ^(.*)$ http://www.example.co.uk/$1? [R=301,L]
which seems to be working just fine. But it doesn't remove a "?" when it is on its own, such as:
www.example.co.uk/?
www.example.co.uk/about.php?
www.example.co.uk/folder/?
Have I entered it wrong?
Thanks
Mike
Also, you will need to add (and maintain) an exclusion for the MSN mobile proxy, used by cell phone and PDA users. For some reason, their proxy adds a "?" to the end of everything. And of course, you'll need to keep a lookout for other user-agents in the future, which may emulate this behaviour... :(
RewriteCond $1 !^(folder1|folder2|folder3)/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?]*\?
RewriteCond %{HTTP_USER_AGENT} !MSN\ Mobile\ Proxy
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Jim
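// Redirect any request that carries a query string to the same script without it
if (isset($_SERVER['QUERY_STRING']) && $_SERVER['QUERY_STRING'] !== '') {
    $domain_name = $_SERVER['HTTP_HOST'];
    // Canonicalize the bare domain to the www host
    if ($domain_name == 'example.com') {
        $domain_name = 'www.example.com';
    }
    $file_name = $_SERVER['SCRIPT_NAME'];
    $full_url = 'http://'.$domain_name.$file_name;
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: '.$full_url);
    exit; // stop here so no page content is sent after the redirect headers
}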