Forum Moderators: phranque

rel="nofollow" and drop-down forms

Using rel="nofollow" to stop Googlebot indexing drop-down JavaScript form links?

RewriteEngine

2:19 pm on Jan 25, 2008 (gmt 0)

10+ Year Member



I'm using a drop-down product filter on my website that adds a filter id to the url, so mysite.com/products.php becomes mysite.com/products.php?filterID=1 etc.

I would like to use rel="nofollow" so Googlebot doesn't visit all these filterID variations of the same page. I can't find any information on the subject anywhere. Can this be done?


<form name="filter" action="http://www.mysite.com/products.php" method="get">
Product filter:&nbsp;
<select name="filterID" onchange="this.form.submit()">
<option value="" SELECTED>All Products</option>
<option value="1">Car</option>
<option value="2">Space rockets</option>
<option value="3">Bikes</option>
</select>
</form>

Quadrille

2:30 pm on Jan 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think Google will follow such links anyway, so I don't think you need to worry.

RewriteEngine

3:21 pm on Jan 25, 2008 (gmt 0)

10+ Year Member



I have a lot of hits on these pages from Googlebot. So my guess is that Google does in fact follow such links.

I would be surprised if it got these URLs only from normal users with the Google Toolbar, and not at least some of them because Googlebot follows such links.

Quadrille

3:44 pm on Jan 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are no other links to those pages?

RewriteEngine

8:31 am on Jan 26, 2008 (gmt 0)

10+ Year Member



No

Quadrille

10:44 am on Jan 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry to be a bore on this - but I've waited years for proof that Google could and would follow and index via javascript links.

(a) Were there EVER plain HTML links to those pages?

(b) Could other people have placed HTML links to them from other sites? (Check Yahoo! Google may not reveal its sources, except in Webmaster Tools.)

[edited by: Quadrille at 11:08 am (utc) on Jan. 26, 2008]

RewriteEngine

12:33 pm on Jan 27, 2008 (gmt 0)

10+ Year Member



I hope we can get back to my question ;-) Anyone?

Quadrille, I've completely rewritten the URL structure from long .php URIs to very short .htm URIs, and set up 301 redirects from the old website (which ALSO cut off all ?variables). So my answer would be: (a) No, (b) No (or fewer than 5).

My problem is with the new URIs, not the old ones, and I don't redirect to the new ones with those ?variables, so yes, I think Googlebot does in fact index javascript links.

Quadrille

3:27 pm on Jan 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We never left your question. :)

If those pages had even one link from another site, then the lack of html links from your site would not stand in the way of Google listing them.

I do not think the javascript links are followed by Google.

And I'd be very surprised (and very interested), if anyone can say different, with any authority.

lammert

1:17 am on Jan 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know if your solution with rel="nofollow" will work, but on my site I have a comparable problem that I solved with dynamic robot meta tags. I think the following works for you if you put it in the header of products.php where the <head> section of the HTML is generated:

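<?php
// Unfiltered page: allow indexing. Filtered variant (?filterID=...): keep it out of the index.
if ( empty( $_GET['filterID'] ) ) {
    echo "<meta name=\"robots\" content=\"index,follow,noarchive\">\n";
} else {
    echo "<meta name=\"robots\" content=\"noindex,follow\">\n";
}
?>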

This doesn't stop Googlebot from loading your page with the filter parameter, but it instructs it not to index that version. In my experience, existing versions with the filterID parameter will fall out of the SERPs within weeks or a few months, depending on the crawl rate on your site.

RewriteEngine

10:45 pm on Jan 30, 2008 (gmt 0)

10+ Year Member



Clever solution, lammert. My concern is all the pages that are loaded. I have about 500 pages with up to 15 filterIDs each, so it adds up to as many as 7,500 extra URLs. I would have preferred that Googlebot wasn't hammering my website with these requests in the first place, but I guess it can't be done with the nofollow attribute.

lammert

1:08 am on Feb 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You may also have a look at the wildcard extensions of robots.txt for Googlebot. More info about this can be found here [google.com] in Google's webmaster help center.

There is a section about disabling access to URLs with a question mark which may fit your case. The following should work according to the help text:

User-agent: Googlebot
Disallow: /*?

The following will also work if you only want to block the filter variants of products.php and are not interested in other .php files with or without parameters. This is plain prefix matching, which is standard robots.txt syntax, so the same Disallow line will also work for other search engines if you list it under their User-agent (or under User-agent: *).

User-agent: Googlebot
Disallow: /products.php?

Do some testing before you go live with it, as you might accidentally block access to more content than you intended.
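
One way to do part of that testing offline is to run the Disallow patterns against a list of your own URLs. Below is a minimal sketch in Python. It is not Google's parser; it only approximates the documented matching rules (a rule matches as a prefix, `*` matches any run of characters, a trailing `$` anchors the end) and ignores Allow lines and rule precedence. The function names and the sample URLs are made up for illustration.

```python
import re

def rule_to_regex(rule):
    """Translate a Disallow value into a regex (approximation, see above)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Everything except '*' is treated literally; '*' becomes '.*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule, url_path):
    """True if url_path (path plus query string) is matched by the rule."""
    if not rule:
        # An empty Disallow value blocks nothing.
        return False
    # re.match anchors at the start, which gives the prefix behaviour.
    return rule_to_regex(rule).match(url_path) is not None

# Hypothetical sample URLs, modelled on the ones in this thread.
samples = [
    "/products.php",
    "/products.php?filterID=1",
    "/contact.php?ref=home",
    "/cars.htm",
]

for rule in ("/*?", "/products.php?"):
    print("Disallow:", rule)
    for path in samples:
        print("  ", "blocked" if is_blocked(rule, path) else "allowed", path)
```

With these samples, the `/*?` rule also catches `/contact.php?ref=home`, while `/products.php?` only catches the filtered product URL. A script like this is only a sanity check; the final word is how the search engine itself interprets the live robots.txt.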