I want to set up a rule in my .htaccess file so that all pages whose URLs end in the following query strings:
?sort=2d&page=1
?sort=2a&page=1
?sort=3a&page=1
?sort=3d&page=1
are not indexed by spiders, but still work for customers on the site. These query strings sort the products on the same page (by price, etc.), but I don't want spiders to index the same pages multiple times.
How can I prevent these pages from being indexed but still allow them to work on my website?
Thanks in advance!
Jim
# Redirect search engine spider requests which include one of the sort/page
# query strings to the same URL with a blank query string
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^FAST(-(Real)?WebCrawler/|\ FirstPage\ retriever) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot(-Image)?/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mediapartners-Google/[0-9]\.[0-9]{1,2} [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*(Ask\ Jeeves|Slurp/|ZealBot|Zyborg/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot/ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Overture-WebCrawler/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robozilla/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Scooter/|Scrubby/) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teoma
# RewriteRule patterns never see the query string, so test it here instead
RewriteCond %{QUERY_STRING} (^|&)sort=(2a|2d|3a|3d)&page=1(&|$)
# The trailing "?" in the substitution discards the original query string
RewriteRule .* %{REQUEST_URI}? [R=301,L]
It all depends on what you want to do in this case.
Jim
...
RewriteCond %{HTTP_USER_AGENT} ^Teoma
RewriteCond %{QUERY_STRING} (^|&)sort=(2a|2d|3a|3d)&page=1(&|$)
RewriteRule .* /noindex.html [L]
...
RewriteCond %{HTTP_USER_AGENT} ^Teoma
RewriteCond %{QUERY_STRING} (^|&)sort=(2a|2d|3a|3d)&page=1(&|$)
# Only rewrite requests for .php pages
RewriteRule \.php$ /noindex.html [L]
<html>
<head><meta name="robots" content="noindex,nofollow"></head>
<body></body>
</html>
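A variation on the same idea, using a different technique: instead of detecting spiders by user-agent and rewriting them to a separate page, send the noindex instruction as an X-Robots-Tag response header whenever one of the sort/page query strings is present. This is a sketch assuming Apache 2.4 or later (for the `<If>` expression block), mod_headers loaded, and AllowOverride permitting these directives in .htaccess. Googlebot honors this header; check the documentation of any other spider you care about.

```apache
# Sketch, assuming Apache 2.4+ with mod_headers loaded.
# Matches sort=2a, 2d, 3a or 3d followed by page=1, as listed in the question.
<If "%{QUERY_STRING} =~ /(^|&)sort=(2a|2d|3a|3d)&page=1(&|$)/">
    # "follow" still lets spiders follow the product links on the sorted page
    Header set X-Robots-Tag "noindex, follow"
</If>
```

Because the rule does not depend on the user-agent, customers and spiders get the same sorted page and there is no spider list to maintain; the trade-off is that it only works for spiders that fetch the page and honor the header.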
If you copy code from this forum, change any broken pipe "¦" characters to solid pipe "|" characters before use.
Jim
To send all Googlebot requests for the subdomain "foo.foo.com" to noindex.html, is this correct, or must the "." in the domain name be escaped?
...
RewriteCond %{HTTP_USER_AGENT} ^Google
RewriteCond %{QUERY_STRING} (foo.foo.com)
RewriteRule .* /noindex.html [L]
*"foo" will be replaced by the actual subdomain name.
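On the question above: the hostname is not part of %{QUERY_STRING}, which holds only the text after the "?" in the URL; mod_rewrite exposes the requested hostname as %{HTTP_HOST}. And yes, an unescaped "." in a regex matches any character, so escape it to match only a literal dot. A minimal sketch, keeping the placeholder foo.foo.com from the question:

```apache
# Sketch: send Googlebot requests for the subdomain to noindex.html.
# foo.foo.com is the placeholder from the question; substitute the real host.
RewriteEngine on
# Unanchored, since some Googlebot user-agent strings begin with "Mozilla/5.0"
RewriteCond %{HTTP_USER_AGENT} Googlebot
# Literal dots escaped; [NC] because hostnames are case-insensitive;
# the optional ":port" covers Host headers such as foo.foo.com:80
RewriteCond %{HTTP_HOST} ^foo\.foo\.com(:[0-9]+)?$ [NC]
# Skip requests that are already for noindex.html
RewriteCond %{REQUEST_URI} !^/noindex\.html$
RewriteRule .* /noindex.html [L]
```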
So would this work:
RewriteCond %{HTTP_USER_AGENT} ^(spider1|spider2|etc.)
RewriteCond %{QUERY_STRING} (?)
RewriteRule .* /noindex.html [L]
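On the code above: "(?)" does not test for a query string. In a regex, a "?" right after "(" is group syntax rather than a literal question mark, and the "?" separator itself is never included in %{QUERY_STRING} anyway. The first snippet in this thread already uses the idiom for "any non-empty query string", a single ".". A sketch along those lines; the user-agent names are just a few taken from the list earlier in the thread, not a complete spider list:

```apache
# Sketch: rewrite spider requests that carry any query string to noindex.html.
# The user-agent names are examples from earlier in this thread, not a complete list.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (Googlebot|msnbot|Slurp|Teoma) [NC]
# "." matches any non-empty query string (QUERY_STRING never includes the "?")
RewriteCond %{QUERY_STRING} .
# Limited to .php pages, as in the \.php$ variant earlier in the thread,
# so requests for noindex.html itself are never rewritten
RewriteRule \.php$ /noindex.html [L]
```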
Also, some of the pages have already been indexed. Would this also tell Google to remove those pages from its index? Finally, could someone point me to a basic list of spiders to use in the first line of the code above?
Thanks in advance; I really appreciate this forum!