Removing query strings from URLs

Prevent duplicate-content issues with dynamic pages

     
8:05 am on Dec 11, 2003 (gmt 0)

jdmorgan (Senior Member)


I've seen several recent posts asking how to prevent search engine spiders from "getting into a loop", that is, following links which include query strings and then trying to index one URL under many different query strings. This can lead to duplicate-content issues, or cause the spider to "give up" on a site after finding too many URLs for what is actually the same page. The following code demonstrates one method of removing the query string: it permanently redirects spiders that request a URL with a query string to the same URL without one.

# Redirect search engine spider requests which include a query string to the same URL with a blank query string
RewriteCond %{HTTP_USER_AGENT} ^FAST(-(Real)?WebCrawler/|\ FirstPage\ retriever) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot(-Image)?/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mediapartners-Google/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*(Ask\ Jeeves|Slurp/|ZealBot|Zyborg/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot/ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Overture-WebCrawler/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robozilla/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Scooter/|Scrubby/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teoma
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

  • This code is intended for use in the web-root .htaccess file. The RewriteRule must be changed for use in httpd.conf.
  • The user-agent list is not all-inclusive, and may not be correct for your region
  • If the pipe characters in the patterns display as broken pipes "¦" (the forum alters them), replace them with solid pipe characters "|" from your keyboard
  • Replace "example.com" with your domain name
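  • The "?" in the RewriteRule substitution must be included as shown. It clears the query string; without it, the original query string is re-appended to the redirect target and the spider is redirected in a loop
  • This could be construed as cloaking, though without intent to mislead. However, coding errors could be dangerous in this regard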
  • You will need to monitor your site long-term, and add new SE spider user-agents as they appear
  • A 301-Moved Permanently redirect is specified to tell the spider to replace the query-string URL it used with the base URL
  • This code is untested
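
As a rough illustration of the httpd.conf change mentioned in the first note, here is a minimal sketch, assuming the rules are placed in server or <VirtualHost> context. There the RewriteRule pattern is matched against the full URL-path, which begins with a slash, so the slash is matched outside the capture group. The user-agent list is shortened to two entries purely for brevity, and the RewriteEngine line is an assumption in case it is not already set elsewhere. Like the original, this is untested.

```apache
# Sketch only: variant for httpd.conf (server or <VirtualHost> context).
# Assumes mod_rewrite is loaded; the user-agent list is abbreviated for illustration.
RewriteEngine On

# Any one of the ORed user-agent conditions may match...
RewriteCond %{HTTP_USER_AGENT} ^Googlebot(-Image)?/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot/ [NC]
# ...and the request must also carry a non-empty query string.
RewriteCond %{QUERY_STRING} .
# In this context the URL-path starts with "/", so match it before the capture group.
# The trailing "?" clears the query string on the redirect target.
RewriteRule ^/(.*)$ http://www.example.com/$1? [R=301,L]
```

On Apache 2.4 or later, the QSD flag is documented as discarding the query string, so `[QSD,R=301,L]` without the trailing "?" may be an alternative; that flag does not exist in older versions.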

    Jim
