Forum Moderators: phranque

remove query string for spiders


movingon

12:09 am on Mar 7, 2008 (gmt 0)

10+ Year Member



How can I remove all query strings for the spiders and 301 redirect to the page without the query string?

For example, I have URLs like these, each being indexed as a separate page:

www.example.com/cat-m-77.html?page=1&sort=2a
www.example.com/cat-m-77.html?page=2&sort=2a

I also have query strings beginning with ?listing, ?filter_id, and so on.

I want a rule generic enough to catch all of them, so that the examples above redirect to:

www.example.com/cat-m-77.html

Make sense? I currently have this section in my .htaccess file, which I was hoping to expand to cover the above:
#
# Skip the next two RewriteRules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1 [R=301,L]
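
In other words, I'm picturing something generic along these lines (untested sketch, reusing the same three user agents as above; the trailing ? on the substitution is what drops the query string):

#
# Skip the next rule if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|googlebot) [NC]
RewriteRule .* - [S=1]
#
# For spiders: 301 any .html URL that has a query string to the bare URL
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.+\.html)$ /$1? [R=301,L]

The RewriteCond on QUERY_STRING should stop it looping once the query string is gone.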

Thank you for any input...

gergoe

4:40 pm on Mar 7, 2008 (gmt 0)

10+ Year Member



The rules you have remove the osCsid parameter from the query string (although I think the last one is not correct). But in your question you mention all query-string parameters. Do you want to remove every parameter, or only from URLs like cat-m-77.html? If it's the latter, you would need to explain what these filenames look like, or better still, try turning that explanation into a regular expression, as you have done with the existing rules.
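
For example, if the pages in question all follow the cat-m-77.html shape (just my assumption from your examples), the rule might look something like this untested sketch:

# hypothetical: match cat-<letters>-<digits>.html and strip any query string
RewriteCond %{QUERY_STRING} .
RewriteRule ^(cat-[a-z]+-[0-9]+\.html)$ /$1? [R=301,L]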

g1smd

8:26 pm on Mar 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I gave up on trying these redirects on one site, and fixed it instead with wildcard Disallow rules in robots.txt to keep certain parameters out of the index:

User-agent: Googlebot
Disallow: /*sort=
Disallow: /*osCsid=

and so on.

Once Google had dropped the vast majority of the "duff" URLs from its index, it was then easy to see the final few that needed to be handled by altering the site scripting.
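
For this thread's parameters, a fuller sketch (my guess at the list, adjust it to whatever actually appears in your URLs) might be:

User-agent: Googlebot
Disallow: /*page=
Disallow: /*sort=
Disallow: /*listing
Disallow: /*filter_id
Disallow: /*osCsid=

Bear in mind that Disallow stops Googlebot crawling those URLs altogether, so any 301 redirects on them will never be seen by the bot.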