Google Webmaster Tools (GWT) has a page for configuring URL parameters. One of the crawling settings there can be set to 'Let Googlebot decide'.
Let's imagine a website with a page A that accepts one parameter, p. The content of the page changes according to the parameter's value. Let's also imagine that the category pages of this website link to page A 10,000 times, always with a different, valid value for p.
Let's further imagine that page A displays genuinely different content for 5,000 values of p, but near-duplicate content for the remaining 5,000 values.
A sitemap is submitted via GWT; it contains the category pages, but it does not contain 10,000 entries for A, one per value of p. Google can still discover the 10,000 ways to access A by crawling the category pages.
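For concreteness, here is a small sketch of the link structure described above (the domain, path, and parameter name are made up for the illustration; the split between distinct and near-duplicate values is the hypothetical one from the question):

```python
# Hypothetical URLs as they would appear in the category pages' links.
urls = [f"https://example.com/A?p={i}" for i in range(1, 10_001)]

# In this scenario, half the values produce genuinely distinct content
# and the other half produce near-duplicate content.
distinct_content = urls[:5_000]
near_duplicates = urls[5_000:]

print(len(urls))             # 10000 crawlable variants of a single page
print(len(near_duplicates))  # 5000 near-duplicate variants
```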
My questions are:
i) How will Google index page A across the 10,000 values of p (assuming no canonical URL is declared)? Will it cherry-pick which variants to index?
ii) Could one get hit by Panda because of the 5,000 near-duplicate parameter values? Does letting Googlebot decide offer protection (i.e., Google is not forced to index those 5,000 pages)?
Does anyone have real-world experience or feedback to share?