

How does 'Let Googlebot decide' for query parameter work for SEO?


JVerstry

6:59 am on Sep 20, 2013 (gmt 0)

10+ Year Member



Google Webmaster Tools (GWT) has a page to configure URL parameters. One of these configuration elements is about crawling, which can be set to 'Let Google Decide'.

Let's imagine a website with a page A that accepts one parameter, p. The content of the page is displayed according to the parameter's value. Let's imagine the category pages of this website link to that page 10,000 times, but always with a different, valid value for p.

Let's imagine that page A displays genuinely different content for 5,000 values of p, but near-duplicate content for the remaining 5,000 values.

A sitemap is submitted via GWT; it contains the category pages, but it does not contain 10,000 entries for A, one per value of p. Google can find the 10,000 ways to access A by crawling the category pages.

My questions are:

i) How will Google index page A across the 10,000 p values (assuming no canonical URL is set)? Does it cherry-pick?

ii) Could one get hit by Panda because of the 5,000 near-duplicate parameter values? Is one protected by letting Googlebot decide (i.e., Google is not forced to index those 5,000 pages)?

Does anyone have real, live feedback or experience to share?

aakk9999

9:02 am on Sep 21, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



i) Assuming your crawl budget is big enough and assuming Google can see all these 10,000 different URLs, Google will crawl all of them. The near-duplicates may be indexed, or they may end up in the supplemental index (what shows when you click "Repeat the search with omitted results included" underneath some searches), or they may be filtered out of the index entirely. Hence it is called "Let Googlebot decide".

ii) Yes, you could get hit by Panda by having 5,000 near-duplicate pages as a result of the same/similar content being returned on 5,000 different URLs (since each different parameter value = a new URL). And you are not protected by "Let Googlebot decide".

lucy24

8:00 pm on Sep 21, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's imagine that page A displays genuinely different content for 5,000 values of p, but near-duplicate content for the remaining 5,000 values.

That's too sophisticated a distinction for a robot to make. Especially if the first 5,000 are genuinely different. Think of something like a CMS with rewriting disabled, so every single page comes through as "index.php?page=3qkjgfozdiu" or "index.php?page=atirujoxivj".

Safer to redirect the parameter values that don't produce completely different pages. This is most easily done within the process that builds the pages (PHP or similar) rather than with a RewriteRule. But that part's a "how to" rather than a "whether to".
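To make that "how to" concrete, here is a minimal sketch of the in-application redirect, written in Python for illustration (the function name, the example parameter values, and the canonical target are all hypothetical; the same logic translates directly to PHP):

```python
from typing import Optional

# Hypothetical sketch of the suggestion above: inside the page-building
# code, 301-redirect parameter values that do NOT produce genuinely
# different content onto a single canonical URL, and serve the page
# normally for values that do.

# Assumption: the app can tell which values of p yield distinct pages,
# e.g. via a lookup table or database query. These values are made up.
DISTINCT_P_VALUES = {"red", "green", "blue"}

CANONICAL_URL = "/page-a"  # hypothetical canonical target


def redirect_target(p: str) -> Optional[str]:
    """Return a URL to 301-redirect to, or None to serve the page as-is."""
    if p in DISTINCT_P_VALUES:
        return None  # genuinely different content: let it be crawled
    return CANONICAL_URL  # near-duplicate: collapse onto one URL


# In PHP, the equivalent inside the page script would be roughly:
#   if (!in_array($p, $distinct)) { header("Location: /page-a", true, 301); exit; }
```

This collapses the near-duplicate URLs onto one address at crawl time, so Googlebot never has to "decide" about them in the first place.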

JVerstry

6:45 am on Sep 24, 2013 (gmt 0)

10+ Year Member



For the record:

I can confirm that "Let Googlebot decide" on a parameter does not protect you from Panda. Apparently, Google wants you to deal with near-duplicate or thin content yourself in this case.