Indexing vs Canonical vs Self-Removal
I have a few questions as to which method should be used in certain scenarios for a new site. I'm confused because it seems that with this Panda update I should not have shallow content.
1. I have a status page which displays a short status for each event, much like Twitter. It doesn't display any information beyond what is already shown on the main events page. Should this status page have:
<meta name="robots" content="NOINDEX, FOLLOW" />
Or should I get rid of the individual status page completely? The reason it's there is if someone wants to link to this event specifically.
2. I have a main object page which displays summary information about the object, as well as the latest 3 events related to this object. I also have associated object activity pages which display all events related to this object, paginated when there are more than 10 events.
How should I treat these 'activity' pages?
a. for all activity pages: <meta name="robots" content="NOINDEX, FOLLOW" />
b. for all activity pages except page 1: <meta name="robots" content="NOINDEX, FOLLOW" />
c. allow indexing for all activity pages
3. I have an object table details page which can be sorted by column. For example:
object.php (default sort)
How should I treat the pages that have a sort variable?
a. <meta name="robots" content="NOINDEX, FOLLOW" />
b. canonical pointing back to object.php
c. allow indexing for all sort variables and add sort parameter to <title> and <desc>
1. I don't think Google is against content shorter than some X number of characters per se; it is concerned with what your visitors think of that content. Visitors generally don't like "thin" content, and they can tell Google they don't like it in various ways. But if it's short content that serves a really good purpose, and you are confident that your users like it, that's not necessarily a bad thing. For instance, I don't see Twitter disallowing or blocking status update pages. This is just my opinion; others may differ.
Nevertheless, it is best practice to noindex, follow those types of pages in much the same way that it is best practice to noindex, follow tag pages on blogs. But I wouldn't get rid of them completely if you know that users are linking to them. Are they? Every situation is different and without knowing more about your site that's all I can say about #1.
2. I generally noindex, follow paginated pages (except the first page, which I make sure has a rel canonical tag if it is reachable at both "/" and /Page1/). It's tough for me to grasp exactly what those pages represent on your site, but assuming they're like a blog category archive page or an ecommerce category page, I choose to noindex, follow them these days instead of leaving them all indexable. The exception that I've seen work sometimes is putting some static content on page 1 only and leaving it off the subsequent paginated pages. You'd also want page 1's title and description to be static, while you could automate the rest. But this would leave your paginated pages thin, with all-automated content and very similar titles and descriptions. In a post-Panda world I'd lean toward noindex, follow.
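As a rough sketch of that pagination rule (your option b), something like the following could pick the head tags for each activity page. The function name, the `basePath` argument and the example URL are my own placeholders, not anything from your site, and the sketch assumes `basePath` is already safe to put in an attribute.

```typescript
// Hypothetical helper: returns the <head> tags for one page of a paginated
// activity listing. `basePath` is assumed to be the clean URL of page 1.
function activityHeadTags(basePath: string, page: number): string[] {
  if (page <= 1) {
    // Page 1 stays indexable; the canonical collapses duplicate URLs for
    // the first page (e.g. with and without an explicit page number).
    return ['<link rel="canonical" href="' + basePath + '" />'];
  }
  // Pages 2 and up: links are still followed, but the page is kept out of the index.
  return ['<meta name="robots" content="noindex, follow" />'];
}

// Example with a made-up URL:
console.log(activityHeadTags("https://example.com/object/activity", 1).join("\n"));
console.log(activityHeadTags("https://example.com/object/activity", 2).join("\n"));
```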
3. I usually tell Google and Bing to ignore sort parameters, and use the rel canonical tag pointing to, in your case, object.php (your option B). Option A would be the next best thing, and option C would be a bad idea these days.
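To illustrate option B, here is a minimal sketch that strips the sort parameter from a URL and builds the canonical tag from what is left. I don't know what your sort variable is actually called, so the parameter name `sort` is purely an assumption, as are the function names; the sketch also assumes the URL has no #fragment.

```typescript
// Hypothetical helper: drops sort-related query parameters so that every
// sorted view points back to the same canonical URL.
// The default parameter name "sort" is an assumption.
function canonicalWithoutSort(url: string, sortParams: string[] = ["sort"]): string {
  const q = url.indexOf("?");
  if (q === -1) return url; // no query string, nothing to strip
  const path = url.slice(0, q);
  const kept = url
    .slice(q + 1)
    .split("&")
    .filter((pair) => {
      if (pair === "") return false;
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      return sortParams.indexOf(key) === -1; // keep everything that is not a sort parameter
    });
  return kept.length > 0 ? path + "?" + kept.join("&") : path;
}

// Builds the tag to place in the <head> of every sorted variant.
function canonicalLinkTag(url: string): string {
  return '<link rel="canonical" href="' + canonicalWithoutSort(url) + '" />';
}

console.log(canonicalLinkTag("https://example.com/object.php?sort=name"));
```

Any other query parameters are kept as they are, so only the sorted variants collapse onto the default view.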
Alternatively, I've seen sites handle sorting in creative ways lately, including AJAX, jQuery, and rearranging the divs with on-click events so that you can sort without changing the URL or, in some cases, while changing the URL only to a named-anchor link (e.g. page/#C) that jumps down to the C section.
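The client-side approach boils down to sorting the data in the browser and re-rendering it, so no new URL is ever created. A minimal sketch of the sorting part, assuming each table row is available as a plain object (the `Row` type, the function name and the sample data are all made up for illustration):

```typescript
// Hypothetical row shape: column name -> cell value.
type Row = Record<string, string | number>;

// Returns a sorted copy of the rows; the input array is left untouched.
// Numbers are compared numerically, everything else as text.
function sortRows(rows: Row[], column: string, descending: boolean = false): Row[] {
  const sorted = rows.slice().sort((a, b) => {
    const x = a[column];
    const y = b[column];
    if (typeof x === "number" && typeof y === "number") return x - y;
    return String(x).localeCompare(String(y));
  });
  return descending ? sorted.reverse() : sorted;
}

// Made-up sample data.
const rows: Row[] = [
  { name: "gamma", events: 4 },
  { name: "alpha", events: 12 },
  { name: "beta", events: 7 },
];
console.log(sortRows(rows, "name").map((r) => r.name).join(", "));
```

An on-click handler on each column header would call something like `sortRows` and then re-append the row elements in the new order; since the page URL never changes, there are no sorted variants for search engines to index.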
Others may have more / better advice but that's my thinking on the subject. I hope it helps.