|Paginated result sets with tags|
I have a site that was recently impacted by the Farmer update. It basically contains paginated result sets. Now, the entries on a page are tagged, which makes them appear on some other pages as well...
Page A has results 1, 2, 3, 4 ... and is paginated with 10 rows per page.
Now, result 1 can have a tag pointing to Page XY, which in turn has results like, e.g., 93, 94, 95, 1, 34, 56, etc...
I hope you understand what I am trying to say here. The tags are important, because each result is related to its tags and should be displayed under them.
Since the results appear on multiple pages, I believe Google treats it as duplicate content. I think it also considers the site low quality.
When I look up site:mysite.com in Google, all I get is the main-level pages Page A, B, C... and not Page A1, A2, etc. Pages A1, A2, A3 are in the index when I search for the complete URL. This suggests that Google doesn't think A1, A2, A3 are as important as A, since they have mostly the same description and keywords, except with the words "Page 2" embedded in them.
Now, my question is: should I use noindex on the "rest of the pages", so that Google ignores the paginated results and only takes the main page? Will this help reduce the "low quality" score Google has set for the site?
Sorry if the post is not clear, but I don't think I can explain it in simpler terms without any confusion.
In my experience, pagination is not worth it from either a usability or an SEO point of view. We have lots of data on our site that was at one point paginated. We also have functionality for users to search and filter the data.
Analysis showed that 1% of users used the pagination and Google sent 99% of traffic to page 1. Given that, it was an easy decision to remove pages 2 through N entirely. We made sure that we redirected the removed pages back to page 1. We also made sure that all the content listed on the removed pages was accessible to Googlebot through sitemaps.
We saw no impact on traffic when we did this, and that site was not hit by the farmer update.
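For anyone wanting to do the same, here is a minimal .htaccess sketch of that redirect step. It assumes a hypothetical `page` query parameter (e.g. `?page=2`); the parameter name and URL scheme are my assumptions, not from the post, so adjust to your own site:

```apache
# Sketch only: 301-redirect any URL carrying page=2 or higher back to the
# same URL with the query string stripped (i.e., page 1).
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)page=([2-9]|[1-9][0-9]+)(&|$)
RewriteRule ^(.*)$ /$1? [R=301,L]
```

The trailing `?` in the substitution is what drops the query string, so the redirect lands on page 1 of the listing.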
I'm on the anti-pagination side myself. On any site where I have to use pagination (like category pages with lots of items), I either start making subcats or else block all the page= references in robots.txt. More trouble than it's worth from a management point of view (and I also hate it from a user point of view).
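For the record, blocking the page= references might look like this (a sketch; note that the `*` wildcard is a Googlebot extension honored by the major engines, not part of the original robots.txt standard):

```
User-agent: *
# Block any URL whose path or query string contains "page="
Disallow: /*page=
```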
|(and I also hate it from a user point of view) |
I would agree with netmeg 100% on this.
From a user's point of view, I would rather see smaller, more concise categories.
But this is just one man's opinion and it is completely from a user's point of view. I have no idea what the SEO ramifications are.
@deadsea - What about your pages being accessible to users who are navigating your site, or getting adequate internal link juice? You can't do that with a sitemap.
@castor_t - It definitely sounds plausible that tags are giving you dupe content issues. As long as you've got a good information architecture otherwise, I'd just noindex all the tag pages, even the first ones. The problem is more likely the same entries showing up under multiple tags, and the first page causes that too.
@DanAbbamont - Filters (e.g., by brand, price, or feature) are far better for users than pagination. Filters powered by AJAX that don't create new SEO URLs work great at making this content accessible to users without cluttering up the site map with combinations that have no search volume.
As far as adequate link juice, it is far more efficient to directly link content together rather than rely on category pagination. Sites that I have worked on use "nearby", "similar", "same brand", "recent", "users who liked this also like", and just plain random.
|Filters powered by AJAX that don't create new SEO urls work great at making this content accessible to users without cluttering up the site map with combinations that have no search volume. |
Was going to suggest AJAX, and you have options: crawlable page state change (#!) ( [code.google.com...] ) or just include the info with JS and don't even let SEs know it's there. Just move content like 'sports' to separate panels and switch the content on a click, rather than dumping it all on one page, if that's what you currently do.
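A bare-bones sketch of the JS-switching idea (all markup, ids, and the category names are hypothetical): the content lives in hidden panels on one page and is swapped on click, so no new URLs are created for search engines to crawl.

```html
<a href="#" onclick="show('sports'); return false;">Sports</a>
<a href="#" onclick="show('news'); return false;">News</a>

<div id="sports" class="panel">...sports results...</div>
<div id="news" class="panel" style="display:none">...news results...</div>

<script>
// Hide every panel, then reveal only the one that was clicked
function show(id) {
  var panels = document.getElementsByClassName('panel');
  for (var i = 0; i < panels.length; i++) {
    panels[i].style.display = 'none';
  }
  document.getElementById(id).style.display = 'block';
}
</script>
```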
Ok, I get that all of you here are against pagination. Well, I can't remove pagination now because of the site architecture and some other reasons.
So, I am thinking of using a noindex tag for the rest of the pages, so that Google will ignore them. But I am not sure noindex will help, as I believe Google still reads those pages and will find the duplicate content that is spread across several tags.
The reason I am thinking of this tag thing is because I am hit by the recent farmer update and I believe it's because of the duplicate content across tags.
So, does adding "noindex" help my cause?
|I believe Google still reads those pages |
Google must crawl those pages or they'll never see the noindex meta tag. You can certainly try the experiment and find out if it helps in your case. The update is too new for any tested advice to be available.
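For anyone unsure of the mechanics, the tag itself goes in the `<head>` of each paginated page you want dropped. A sketch; the "follow" value keeps Google crawling the links on the page even though the page itself can no longer rank:

```html
<meta name="robots" content="noindex, follow">
```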
I am trying to understand the purpose of using a "noindex" tag. Is it useful only not to display the page in the search results?
Since Google reads those pages, I think they will analyze the content on those pages with their algorithms to rank the site. So I think noindex doesn't help with the duplicate tag issue, right?
I am experimenting using the noindex tag anyway, but I just want to understand the concept.
|Is it useful only not to display the page in the search results? |
Because the page is no longer even a candidate to rank, it can't be creating a duplicate content conflict in the SERP.
It's really an interesting question ... I've started to reply to this thread 3 times or so since my previous post and either just clicked back or deleted the post before I submitted it.
IMO noindex isn't the answer any more, but only time and testing will tell.
Please let us know your results.
I really think people are going to have to find a way to remove the content rather than simply taking it out of the results ... I'm not sure I had completely thought it through when I posted before, but my guess now is noindex will not do much for you.
The question I keep asking myself is: If your site wasn't high enough quality before, because of duplicate content, why would I want to send a visitor there to stumble through that duplicate content rather than sending them somewhere the content is 'higher quality' site-wide? Okay, so you took it out of the index and I won't send people directly to it, but how does removing it from the index increase the overall quality of your site?
TheMadScientist, that's interesting. I guess all it takes is one line of code, and the noindex could hurt us. Or blocking robots. So many things that made sense are now up in the air.
Google, comment on this please. Should we delete completely, or is noindex just as good? Mind you, some pages are necessary for the users but now cause problems with Google.
My thoughts have been wandering down those same channels - but as you say, it's all new and untested.
|Google, comment on this please. Should we delete completely, or is noindex just as good? |
I think they may already have:
(Unofficially, of course, but it's likely as good as we're going to get.)
|... For this reason, if you believe you've been impacted by this change you should evaluate all the content on your site and do your best to improve the overall quality of the pages on your domain. Removing low quality pages or moving them to a different domain could help your rankings for the higher quality content. |
[seoholic AKA wysz AKA Google Employee]
Page 14 @ 20 Posts Per Page
Though tags may create duplicate content issues, the content is applicable to all of the tags.
For example: a property of an apple applies to both "red apple" and "green apple". And it should be present on both the red apple and green apple pages, because it's important.
In this case, I don't think Google should put a penalty.
Please remember that I am talking about result sets here and not articles. The content is mixed with some other unique content on the tag page.
Red apple page:
== some unique content ==
== Apple is round in shape ==
== Apple is a fruit ==
== some unique content ==
Green apple page:
== some unique content ==
== Apple is round in shape ==
== Apple is a fruit ==
== some unique content ==
This may be a lame example, but it conveys the point I am trying to make.
Like I said, it's an interesting question, because changing whether the pages are indexed or not has no impact I can see on the overall quality of your site...
Impact on the quality of Google's index?
Sure it could have an impact there, but the more I think about it and the more I think about what I would do if I had to try to return 'quality' results, as far as your site goes, changing the indexed pages doesn't seem like it would have an impact on the quality to me, because I wouldn't change the score of your site just because I didn't send visitors directly to your 'low quality' pages.
I get your point, but I'll be surprised if noindex helps...
Think about a directory review for a minute (a big one, like Yahoo!) ... It doesn't really matter too much if all of your pages are indexed in a search engine, they still look at the pages and see what works and what doesn't and if your site is 'high enough quality' (there's that pesky word again) to be included ... IMO that's what Google is doing algorithmically.
It's not really a matter of whether or not it makes sense for the user, it's just the fact that when you use this kind of IA, you can end up indexing tons and tons of pages that aren't good for your SEO. So you need a solution, and noindex nofollow should work fine.
|Think about a directory review for a minute (a big one, like Yahoo!) ... It doesn't really matter too much if all of your pages are indexed in a search engine, they still look at the pages and see what works and what doesn't and if your site is 'high enough quality' (there's that pesky word again) to be included ... IMO that's what Google is doing algorithmically. |
Well, they really have to make them supplemental. To a bot, this looks just like a site full of pages of snippets randomly assembled to pass as content. Those were a real problem back when I was exploiting them. If you don't noindex, nofollow them, you've got your link juice flowing through a ton of pages that could never rank, and you're weakening your entire site.
I am with TheMadScientist on noindex.
I can, however, see how the "junk factor" of a site might be mitigated by using noindex. If Google has deemed a particular page to be low quality, it might not count as heavily against you if it is noindexed.
I have a lot of pages on my site which are only really of interest to the individual who created them and I have always noindexed them. I think I would be wise to move them to another domain.
Do you think moving these pages to a subdomain will suffice?
BTW, I am in the UK with mainly UK visitors, so I have not been impacted one way or the other by Farmer yet.
If I were unable to remove pagination, I would use the rel canonical tag to point pages 2 through N back to page 1 rather than try to noindex them.
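For clarity, that would mean putting something like this in the `<head>` of pages 2 through N (the URL is a made-up example); though note the pushback on this approach later in the thread:

```html
<!-- On a hypothetical /widgets?page=3, pointing back to page 1 -->
<link rel="canonical" href="http://www.example.com/widgets">
```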
If you don't put the page # in the titles, then you'd have duplicate title tags.
My websites are always paginated.
It seems Google is against using canonical tags for paginated results.
The following snippet is from [seroundtable.com...]
|Some webmasters may decide to use the canonical tag to communicate to Google to redirect pages 2, 3, 4, and 5 to page 1. But technically, as per Maile from Google in the panel last night, that is wrong and should not be done. |
|Maile explained that since the results on pages 2, 3, 4, and 5 are different from page 1, you should not use the canonical tag here. |
|Not only that, if you do, Google may ignore it because Google uses methods to determine if the canonical tag command is actually something valid for that case. So if you canonical page 2 to page 1 and page 2 is not similar enough to page 1, Google may ignore your canonical tag. |
Oh, this is an interesting discussion. Noindex is unavoidable on certain CMS platforms. Google cannot expect people to completely remove duplicate content: for example, category or tag pages. Likewise, pages where we paginate comments.
In the above situations, noindex was the only way to tell Google that a particular page is not to be considered for ranking.
In the case of pages with paginated comments we can use the canonical tag, but not in the case of tag or category pages. We just cannot remove them completely. For paginated comments, I preferred the noindex option. If Google is changing something here, they should inform webmasters.
Yeah, this is really interesting, and I keep coming back to 'if I was building a search engine...', so here's the new / next question:
I'm fairly sure I'll get a bunch of 'but they can't' or 'that's not fair' or 'but the little guy' comments for this line of thinking, but as tedster keeps saying, they're not in the business of keeping you in business or being fair ... They're in the business of staying in business themselves and retaining or growing market share, not kissing webmasters' a**es because it's the 'nice' thing to do, or even rewarding webmasters because they work really hard and tried their best.
People keep saying 'Google can't expect everyone to...' (or things to that effect) and I keep thinking a sign of quality to me might be: is your site 'custom' or 'off the shelf'? If it's custom, you don't "have to" do anything or leave anything the way it is ... IOW you can change anything ... but if it's 'off the shelf', then unless you go to the trouble of modifying it (making it custom), you can't do it.
Personally, I think that could say quite a bit to me about the 'starting point' quality of your site and I might well use it for scoring purposes ... Here's why: Is it more likely a scraper, spammer, spinner is going to go build a completely custom site to put their garbage on, or is it more likely they are going to use a 'shelf version' of some software so they can rinse and repeat with little or no expense if their site tanks?
Google can't expect everyone to (blah here) ... You're right, and I don't think they do expect everyone to ... In fact I think they expect most people won't, but the 'higher quality' sites will, which makes ranking results much easier.
NOTE: indyank I'm really not trying to pick on you at all, just using your comment in this thread as an example, because it's the latest in a series of them that keep pushing me down a road of thought that isn't very pretty for the 'little guy' who may well be in over their head when it comes to long-term business viability based on Google rankings.
Pagination has its benefits, if your internal link structure is designed to make the most of it while keeping it shallow.
Quote from Matt Cutts:
|If it's possible to keep things relatively shallow in terms of intermediate pages, that can be a good practice. If someone has to click through seven layers of faceted navigation to find a single product, they might lose their patience. It is also weird on the search engine side if we have to click through seven or eight layers of intermediate faceted navigation before we get to a product. In some sense, that's a lot of clicks, and a lot of PageRank that is used up on these intermediate pages with no specific products that people can buy. Each of those clicks is an opportunity for a small percentage of the PageRank to dissipate. |
In the aftermath of Panda, I've been taking a good look at my pagination the last couple of days. I get a fair amount of search engine landing page traffic (about 5%) to my paginated results beyond the first page, so there is some value to these pages, but in my effort to make my site leaner to get me out of Panda, I'm planning to do the following:
- Noindex, Follow all pages beyond page 1
- Add a sortable function (noindexed); by default, have the products sorted by newest post date
I'm hoping my first page will carry more relevance and make up for the lost traffic from noindexing the other pages.
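As a sketch of that first bullet (the function and names are mine, not from the post), the per-page decision is a one-liner that a template could use when emitting the robots meta tag:

```javascript
// Hypothetical helper: pick the robots meta value for a paginated listing.
// Page 1 stays indexable; pages 2..N get "noindex, follow" so their links
// are still crawled but the pages themselves can't rank.
function robotsMetaForPage(pageNumber) {
  return pageNumber > 1 ? 'noindex, follow' : 'index, follow';
}
```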