It goes something like this. The original URL is
example.com/example/ab.html
The URL generated for the euro version is
example.com/exampleproduct.asp?example=example&id=ab&AB1=EUR
I tried blocking in robots.txt with
Disallow: /exampleproduct.asp
But no luck. Google is still finding all of those currency pages and reporting them as duplicates. There are 20 currencies on a site with over 100,000 products, so do the math.
Any suggestions?
[edited by: Receptional_Andy at 3:42 pm (utc) on Oct. 30, 2008]
[edit reason] Exemplified URLs [/edit]
Additionally, you could use the robots.txt checker within Webmaster Tools to verify that Google has a current copy of your robots.txt, and that it is blocked from accessing those URLs.
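For reference, a minimal version of the block (using your placeholder path) would be:

User-agent: *
Disallow: /exampleproduct.asp

Robots.txt matching is by path prefix, so that single line should catch all of the query-string variations - just make sure the Disallow sits under a User-agent group that actually applies to Googlebot.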
I checked Webmaster Tools prior to making this post. Everything looks like it should be working. It says the URLs are blocked, yet Webmaster Tools still reports a zillion currency URLs as having duplicate meta descriptions. That tells me that Google is still seeing those pages.
I think what you're seeing is the result of different things happening at different times, especially as the content may have been spidered as a result of Google's more "creative" crawling processes [webmasterworld.com] - meaning there are no actual links to the URLs themselves.
The WMT reporting of duplicates operates on content within the index - regardless of whether that content is due to be excluded. It doesn't "know" that you've added lines to robots.txt. I suspect that because there are no links to the content, it will take Google much longer to get rid of the URLs, and even then, many of them will remain in the index as "URL only" listings, perhaps indefinitely. So, I think it's a waiting game.
The other consideration is that the WMT error doesn't necessarily matter - it doesn't indicate content that cannot perform in search results, nor does it imply that there will be a negative effect on site performance. I think it's safe to ignore the warning while waiting for the content to disappear. The only other possibility is that Google is not correctly interpreting the robots disallow, or is not correctly applying the rules to content discovered by methods other than standard links, but personally I don't believe that to be the case.
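If you want to rule that last possibility out independently of WMT, you can sanity-check the rule locally. This is just a rough sketch using Python's standard urllib.robotparser with the placeholder URLs from above - it only does simple prefix matching, so it won't reflect Google's wildcard extensions, but it will tell you whether that Disallow line matches the parameterized URLs:

import urllib.robotparser

# The rule as posted, under a catch-all User-agent group (placeholder path)
rules = """User-agent: *
Disallow: /exampleproduct.asp
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A generated currency URL and the original page (placeholders from the example above)
currency_url = "http://example.com/exampleproduct.asp?example=example&id=ab&AB1=EUR"
original_url = "http://example.com/example/ab.html"

print(rp.can_fetch("Googlebot", currency_url))  # expect False - blocked by the rule
print(rp.can_fetch("Googlebot", original_url))  # expect True - still crawlable

If that prints False for the currency URL and True for the original page, the rule itself is doing its job, and it really is just a waiting game.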
I've been watching these for several months, hoping that they would fall out, but so far, like visiting relatives, they don't know that it's time to leave!
And, yep, these issues started popping up about the time Google got "creative" in its crawling.
WMT has lots of issues - some because it's considered an authoritative resource for Google information, and perhaps expectations are too high. But the duplicate detection is a very simple stat, and nothing like the process that goes into evaluation of pages for relevance in SERPs. Not that it isn't useful and usually worth fixing, but it doesn't necessarily have any real world impact.
And yep, it can take a long time for certain types of content to drop out. The only way to speed that up would be URL removal, but I'm no fan of that, and I don't believe it would actually do anything useful other than tidying up site: search SERPs anyhow - I doubt you're getting any visitors landing on these pages.