I would like to use rel="nofollow" so that Googlebot doesn't visit all these filterID variations of the same page, but I can't find any information on the subject anywhere. Can this be done?
<form name="filter" action="http://www.mysite.com/products.php" method="get">
Product filter:
<select name="filterID" onchange="this.form.submit()">
<option value="" SELECTED>All Products</option>
<option value="1">Car</option>
<option value="2">Space rockets</option>
<option value="3">Bikes</option>
</select>
</form>
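Because the form uses method="get", each choice in the select box is submitted as a query string on the action URL, so this one form produces a separate crawlable URL per option. A minimal Python sketch of those URLs, assuming a browser encodes the single field the same way the standard library's urlencode does (the helper name submitted_urls is hypothetical):

```python
from urllib.parse import urlencode

# Action URL and option values copied from the form above.
ACTION = "http://www.mysite.com/products.php"
OPTION_VALUES = ["", "1", "2", "3"]  # All Products, Car, Space rockets, Bikes


def submitted_urls(action, values):
    """Return the URL a GET submission would request for each select value.

    Hypothetical helper: a GET form appends its fields to the action URL
    as a query string, here approximated with urlencode.
    """
    return [action + "?" + urlencode({"filterID": v}) for v in values]


for url in submitted_urls(ACTION, OPTION_VALUES):
    print(url)
```

Note that even "All Products" yields products.php?filterID= (an empty value), a URL distinct from plain products.php.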
(a) Were there EVER plain HTML links to those pages?
(b) Could other people have placed HTML links to them on other sites? (Check Yahoo!; Google may not reveal linking sources, except in Webmaster Tools.)
[edited by: Quadrille at 11:08 am (utc) on Jan. 26, 2008]
Quadrille, I've completely rewritten the URL structure from long .php URIs to very short .htm URIs, and set up 301 redirects from the old website (which also strip all ?variables). So my answers are: (a) No, (b) No (or fewer than 5).
My problem is with the new URIs, not the old ones, and I don't redirect to the new ones with those ?variables, so yes, I think Googlebot does in fact follow JavaScript links.
If those pages have even one link from another site, the lack of HTML links from your own site would not stop Google from listing them.
I do not think Google follows the JavaScript links.
And I'd be very surprised (and very interested) if anyone can say otherwise with any authority.
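Place this in the <head> section of products.php:
<?php
// No filter: the normal page, allow indexing. Filtered variant: keep it out of the index but still follow its links.
if ( empty( $_GET['filterID'] ) ) echo "<meta name=\"robots\" content=\"index,follow,noarchive\">\n";
else echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
?>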
This doesn't stop Googlebot from loading your page with the filter parameter, but it tells Googlebot not to use that version for indexing. In my experience, existing versions with the filterID parameter drop out of the SERPs within a few weeks to a few months, depending on the crawl rate of your site.
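To make the PHP snippet's decision explicit, here is the same logic as a small Python function. It is a sketch, not a drop-in replacement: the name robots_meta_for is hypothetical, and it assumes the raw query string is passed in. It mirrors PHP's empty(), which treats a missing value, an empty string, and "0" alike, and PHP's habit of keeping the last value when a key repeats.

```python
from urllib.parse import parse_qs


def robots_meta_for(query_string):
    """Return the robots meta tag the PHP snippet would emit (hypothetical helper)."""
    # keep_blank_values=True so "filterID=" is seen as an empty value, not dropped.
    params = parse_qs(query_string, keep_blank_values=True)
    # PHP's $_GET keeps the last value of a repeated key, so take [-1].
    value = params.get("filterID", [""])[-1]
    # PHP's empty() is true for a missing key, "" and "0".
    if value in ("", "0"):
        content = "index,follow,noarchive"
    else:
        content = "noindex,follow"
    return '<meta name="robots" content="%s">' % content


print(robots_meta_for("filterID=2"))
```

One consequence worth checking: the "All Products" option submits filterID= with an empty value, which this logic treats as the unfiltered page and leaves indexable.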
Google's robots.txt help text has a section about blocking access to URLs that contain a question mark, which may fit your case. According to that help text, the following should work:
User-agent: Googlebot
Disallow: /*?
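The "*" in /*? is a Googlebot extension, not part of the original robots.txt standard. A minimal Python sketch of how such a pattern can be matched, assuming Google's rules that "*" matches any run of characters, a trailing "$" anchors the end, and matching starts at the beginning of the path. It ignores Allow lines and rule precedence, and the sample paths such as /cars.htm are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Googlebot-style Disallow value into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, pattern):
    # Rules apply from the start of the path, so use match() rather than search().
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


# Hypothetical sample paths: /*? blocks every URL that has a query string.
for url in ["/products.php", "/products.php?filterID=2", "/cars.htm", "/cars.htm?ref=mail"]:
    print(url, is_disallowed(url, "/*?"))
```

The last sample shows the catch: /*? also blocks query-string URLs on pages you may still want crawled.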
The following also works if you only want to block the filter variants of products.php and don't need to block other .php files, with or without parameters. It is standard robots.txt prefix syntax, so other search engines understand it too (use User-agent: * instead of Googlebot if you want the rule to apply to them as well).
User-agent: Googlebot
Disallow: /products.php?
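Standard robots.txt matching is a plain prefix test on the path plus query string, which is why this rule catches the filter URLs but not products.php itself. A minimal Python sketch, assuming no wildcard extensions; the function name and the /index.php?page=2 sample are hypothetical.

```python
def prefix_disallowed(path_and_query, disallow_value):
    """Standard robots.txt rule: blocked if the URL starts with the Disallow value."""
    if not disallow_value:
        return False  # an empty Disallow line allows everything
    return path_and_query.startswith(disallow_value)


RULE = "/products.php?"
# Hypothetical sample paths.
for url in ["/products.php", "/products.php?filterID=2", "/index.php?page=2"]:
    print(url, prefix_disallowed(url, RULE))
```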
Do some testing before you go live with it, as you might accidentally block access to more content than you intended.