Canonical URL tags and INDEX on Pagination?


v01and

11:11 pm on Jun 1, 2011 (gmt 0)

Hey all,

On a local-information site, my page URLs have the format /city-state/, and the subsequent pages are /city-state/?page=2, etc.

Right now, those pages have a canonical URL pointing back to /city-state/ and a robots meta tag with content="INDEX".
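For concreteness, here is a minimal Python sketch of the head markup described above, assuming the page number is carried in a `page` query parameter. Every paginated URL gets a canonical pointing at the first page plus an explicit robots tag. The helper names `first_page_url` and `head_tags` are made up for illustration, and "index, follow" stands in for the INDEX value (it is also what search engines assume when no robots meta tag is present).

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def first_page_url(url: str) -> str:
    """Drop the (assumed) 'page' query parameter so every page maps to page 1."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "page"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def head_tags(url: str) -> str:
    """Canonical pointing at page 1 plus an explicit 'index' robots meta tag."""
    canonical = escape(first_page_url(url), quote=True)  # escape & and quotes for the attribute
    return (
        f'<link rel="canonical" href="{canonical}">\n'
        '<meta name="robots" content="index, follow">'
    )

print(head_tags("https://www.example.com/city-state/?page=2"))
```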

Is this correct? Or should I NOINDEX those pages?

I am seeing many variants of the same page indexed in Google's search results, and I worry about duplicate content (or is the canonical enough?).

Thank you very much

netmeg

12:03 am on Jun 2, 2011 (gmt 0)

I usually block pagination in robots.txt.

deadsea

5:49 pm on Jun 3, 2011 (gmt 0)

I would get rid of pagination entirely. Users don't use it. Search engines don't like it. Replace it with filters and search for users. Replace it with "related cities" links on each city for search engines.

v01and

6:13 pm on Jun 3, 2011 (gmt 0)

That's not really an option on my site. Think of killing pagination on a site like Yelp.

deadsea

6:27 pm on Jun 3, 2011 (gmt 0)

Think of killing pagination on a site like Yelp.

Been there. Done that. Pagination is not needed, even on a site the size of yelp.

tedster

12:35 am on Jun 4, 2011 (gmt 0)

From what I've heard, a canonical link element pointing to page 1 is enough. However, in practice I've done what netmeg does: blocked the deeper pagination with robots.txt. Why even put the bots through the exercise, I say.

Of course, this assumes that you do have another click path to products that tend to show up only on deep pages.
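A quick way to sanity-check a robots.txt rule that blocks deeper pagination, as netmeg and tedster describe, is to test it against sample URLs. Python's standard `urllib.robotparser` does plain prefix matching and, as far as I know, does not implement the `*` and `$` extensions that Googlebot understands, so this sketch hand-rolls a minimal matcher. The patterns in `RULES` are hypothetical, Allow lines and longest-match precedence are ignored, and note that `/*?page=` would also block an explicit `?page=1`.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal and must match from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path_and_query, disallow_rules):
    """True if any Disallow pattern matches (Allow rules are ignored in this sketch).

    An empty Disallow line blocks nothing, so empty rules are skipped.
    """
    return any(
        rule and robots_pattern_to_regex(rule).match(path_and_query)
        for rule in disallow_rules
    )

# Hypothetical rules: block any URL whose query string carries a page parameter.
RULES = ["/*?page=", "/*&page="]

for url in ["/city-state/", "/city-state/?page=2", "/city-state/?sort=name&page=3"]:
    print(url, "->", "blocked" if is_disallowed(url, RULES) else "allowed")
```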

aakk9999

12:51 am on Jun 4, 2011 (gmt 0)

Of course, this assumes that you do have another click path to products that tend to show up only on deep pages.

I think this is a very important point for both approaches: blocking the pagination with robots.txt and using the canonical link element.

If there is no other click path to products on deep pages, then I suggest noindexing the pagination pages.

<added>
However, if the pagination is for multi-faceted search results or sort results, then blocking these pages via robots.txt or setting up a canonical link element is fine.
</added>

suggy

10:53 am on Jun 4, 2011 (gmt 0)

You've surely got to noindex if you're telling Google with a canonical that "this isn't the best URL for this page". Otherwise, you're sending out mixed messages, i.e. "this isn't the best URL for this page, but index it anyway". What's the point of that?! Noindex, follow.

aakk9999

11:49 am on Jun 4, 2011 (gmt 0)

I am not sure whether the links on a page whose canonical link element points to another page would be followed. In my experience, Google treats the canonical link element as a "very strong hint", in fact almost like a "silent 301".

In fact, it even reports a "Redirect error" in WMT when a page has a canonical link element pointing to another page and that other page 301-redirects back to the first page.

For example, I have seen WMT report a redirect error in this case:

Page A has canonical --> Page B
Page B has 301 --> Page A
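The conflict above is a loop: following the canonical from A leads to B, and B's 301 leads straight back to A. A small Python sketch that follows such hops through a hypothetical URL-to-signal map (for example, gathered from your own crawl) and reports the cycle. It says nothing about how WMT actually detects this, and a page whose canonical points to itself is treated as a normal end of the chain.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical map: URL -> (signal type, target URL).
Signals = Dict[str, Tuple[str, str]]

def find_loop(signals: Signals, start: str) -> Optional[List[str]]:
    """Follow canonical/301 hops from `start` and return the cycle, if any."""
    seen: List[str] = []
    current = start
    while current in signals:
        if current in seen:
            # Back at a URL already on the path: report the cycle.
            return seen[seen.index(current):] + [current]
        seen.append(current)
        target = signals[current][1]
        if target == current:
            # A self-referencing canonical is normal and ends the chain.
            return None
        current = target
    return None  # the chain ended at a URL with no further signal

signals = {
    "/page-a": ("canonical", "/page-b"),
    "/page-b": ("301", "/page-a"),
}
print(find_loop(signals, "/page-a"))  # ['/page-a', '/page-b', '/page-a']
```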

My view is that if you set up the canonical link element to point to another page, you do not need to set "noindex" on the originating page.

But if you want the links on the page to be followed while keeping the page itself out of the index, then using "noindex,follow" would be safer, since I am not sure the links would be followed if a canonical link element is used instead of "noindex".

v01and

4:35 pm on Jun 6, 2011 (gmt 0)

However, if the pagination is for multi-faceted search results or sort results, then blocking these pages via robots.txt or setting up a canonical link element is fine.

That is what I have: a variety of pages created through pagination and filtering of results. I do have all the filter parameters set to "ignore" in the Webmaster Tools Parameter Handling options.

--

Thanks for all the replies. I think I will continue supplying the canonical tag; as was mentioned, it does seem to be a very strong signal for Google.

indyank

5:03 pm on Jun 6, 2011 (gmt 0)

That is what I have: a variety of pages created through pagination and filtering of results. I do have all the filter parameters set to "ignore" in the Webmaster Tools Parameter Handling options.

Parameter handling is not equivalent to a robots.txt block. Moreover, it is only a hint to google.

Since the content of the two pages is different, I wouldn't use a canonical tag pointing to the main page. The canonical tag is meant for handling duplicates, and in your case the pages are not duplicates.

I don't see a problem in using "noindex, follow" robots meta tag and it would be my way of dealing with pagination.

In theory, a robots.txt block doesn't let Google see the content of the page, and it can work equally well. But it will result in URL-only listings in the SERPs.

indyank

5:31 pm on Jun 6, 2011 (gmt 0)

I have to add one more thing here. Some recent statements from Googlers suggest doing nothing (neither blocking via robots.txt nor using noindex) and letting Google figure everything out.

I have heard this from both Matt Cutts and John Mu. But this might change when they get bored :)

deadsea

5:37 pm on Jun 6, 2011 (gmt 0)

Before we removed pagination, we asked how useful it was. In our case, page 1 had a next link to page 2, page 2 had a next link to page 3, and so on. We asked the following questions and collected the following data.

How many users interact with pagination?
page 1: 100% of users
page 2: 1% of users
page 3: 0.01% of users
Clearly pagination was not a feature that our users found useful.

How many search engine referrals come in to paginated pages?
page 1: 99.5% of referrals
page 2: 0.3% of referrals
page 3: 0.1% of referrals
Clearly pagination was not driving traffic directly.

How much pagerank is pagination passing? We looked at the Googlebot crawl rate to answer this question. Googlebot crawls pages with more pagerank more often.
page 1: 95% of googlebot crawling
page 2: 4% of googlebot crawling
page 3: 0.05% of googlebot crawling
Clearly pagerank was getting lost. Anything linked off of page two would be getting a very small amount of pagerank. Anything linked off page 3 or lower would be getting a minuscule amount of pagerank.

Based on this information, we killed off pagination. We 301 redirected page 2+ back to the first page. We saw no ill effects.
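A minimal sketch of the redirect deadsea describes (page 2+ answered with a 301 to the first page), written as a framework-neutral Python function. It assumes the page number is in a `page` query parameter; the name `pagination_redirect` is made up, and how the returned target becomes an actual `301 Moved Permanently` response depends on the server or framework.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def pagination_redirect(url: str):
    """Return the 301 target for page 2+ of a listing, or None to serve the page normally.

    Assumes the page number lives in a query parameter named 'page'.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    page = dict(params).get("page", "1")
    if not page.isdigit() or int(page) < 2:
        return None
    kept = [(k, v) for k, v in params if k != "page"]  # keep any other parameters
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

for url in ["/oakland-restaurants/", "/oakland-restaurants/?page=3"]:
    target = pagination_redirect(url)
    print(url, "->", f"301 {target}" if target else "200")
```

One design choice worth noting: other query parameters are kept on the redirect target, but `urlencode` normalizes them, so a bare `?parking` comes back as `?parking=`.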

v01and

5:40 pm on Jun 6, 2011 (gmt 0)

Unfortunately letting Google "figure everything out" dropped my organic traffic by 40% after Panda, a site with absolutely no intentional gray/black-hat stuff going on :)

indyank

5:41 pm on Jun 6, 2011 (gmt 0)

deadsea, I think the OP's context is different. He doesn't want the search engines to rank his paginated pages.

The example scenario is something equivalent to a category that can have several products falling under it. 301 redirecting page 2+ back to the first page isn't a solution at all.

deadsea

5:47 pm on Jun 6, 2011 (gmt 0)

We had pagination primarily to get pagerank into our entire product list as well. It turns out it doesn't work well like that. Anything on page 2+ of our category pages was getting only a very small amount of pagerank from the category page. Most of the pagerank to such products was coming from the "related products" section of other product pages.

I'm also trying to show that this is a low-risk move. You can make the same measurements that I made on your own site. If your site is different from mine, then don't remove the pagination. But if your site has similar measurements, then it won't hurt you to remove the pagination.
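The crawl-rate measurement from deadsea's earlier post can be approximated from server access logs. A rough Python sketch, assuming the common "combined" log format and a `page` query parameter: it counts requests whose user-agent contains "Googlebot" for listing URLs under one path prefix and reports each page number's share of those requests. The sample log lines are invented, and since user-agent strings can be spoofed, this does not verify that the requests really came from Google.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumes the "combined" log format; adjust the regex for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) HTTP/[\d.]+" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_page_share(lines, path_prefix):
    """Fraction of Googlebot requests per page number for listings under path_prefix.

    Matches on the user-agent string only; spoofed Googlebots are not filtered out.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or "Googlebot" not in m.group("agent"):
            continue
        parts = urlsplit(m.group("target"))
        if not parts.path.startswith(path_prefix):
            continue
        page = parse_qs(parts.query).get("page", ["1"])[0]  # no page param means page 1
        if page.isdigit():
            counts[int(page)] += 1
    total = sum(counts.values())
    return {page: n / total for page, n in sorted(counts.items())} if total else {}

# Invented sample lines for illustration only.
BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'192.0.2.1 - - [06/Jun/2011:17:00:00 +0000] "GET /oakland-restaurants/ HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'192.0.2.1 - - [06/Jun/2011:17:01:00 +0000] "GET /oakland-restaurants/ HTTP/1.1" 200 5120 "-" "{BOT}"',
    f'192.0.2.1 - - [06/Jun/2011:17:02:00 +0000] "GET /oakland-restaurants/?page=2 HTTP/1.1" 200 5120 "-" "{BOT}"',
    '198.51.100.7 - - [06/Jun/2011:17:03:00 +0000] "GET /oakland-restaurants/?page=3 HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_page_share(SAMPLE_LOG, "/oakland-restaurants/"))
```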

v01and

5:57 pm on Jun 6, 2011 (gmt 0)

Sorry to be doing this so far down the thread, but here are a few more details.

Imagine a site that has 100,000 restaurant listings in the US.
Each restaurant has its own page
Users can come to the site and search for "Oakland" and will get all restaurants in Oakland (say 45 of them)
Every search results page displays 15 restaurants (so 3 pages)
Users can also search for "Oakland Restaurants with Parking"
This returns 30 restaurants (a subset of the 45) on their own unique URIs across 2 search result pages. These pages have a canonical pointing to the plain Oakland page.

/oakland-restaurants/
/oakland-restaurants/?page=2 << Canonical to /oakland-restaurants/ and INDEX tag
/oakland-restaurants/?parking&page2 << Canonical to /oakland-restaurants/ and INDEX tag

I have observed that Google indexed all 3 pages. Is this a bad thing, especially since the last URL can have content identical to the canonical page?

Hope this clears the case up a bit.

aakk9999

6:28 pm on Jun 6, 2011 (gmt 0)

I would:

treat /oakland-restaurants (the first page of the pagination) as my "category page" for restaurants in Oakland.

make /oakland-restaurants/?page=2 (and onwards) noindex, which still lets Google reach and index the pages of the restaurants listed on page 2 onwards.

not expose /oakland-restaurants/?parking&page2 to search engines if at all possible (e.g. by creating the URL server side when the on-site search is submitted) OR, if it is exposed, block these pages via robots.txt

UNLESS

- you think it is a good idea to try to rank for "Oakland restaurants with parking". In that case, the FIRST page of this search would be allowed, with a URL along the lines of /oakland-restaurants-parking or similar. AND, if most Oakland restaurants have parking, you should ensure that the list of restaurants displayed there does not duplicate the list on the /oakland-restaurants page (perhaps by using a different sort criterion for restaurants with parking and adding some text above or below the list about how easy or difficult it is to find parking in Oakland).

That way, the list of restaurants on the first "with parking" page would differ from the list of "all restaurants in Oakland". In this case, I would block the subsequent listing pages of the "with parking" variant via robots.txt, since each restaurant page can be reached via the "all restaurants in Oakland" pagination, where page 2 onwards is noindex.
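The baseline plan above (leaving aside the UNLESS case) boils down to a per-URL decision. A minimal Python sketch of that mapping; `FILTER_PARAMS`, `crawl_policy` and the returned labels are hypothetical, and the site would still need to emit the matching robots meta tag or robots.txt rule for each case.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical: query parameters that narrow the result set (filters).
FILTER_PARAMS = {"parking"}

def crawl_policy(url: str) -> str:
    """Map a listing URL to the treatment suggested above (ignoring the UNLESS case)."""
    params = dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
    if any(name in params for name in FILTER_PARAMS):
        return "robots.txt block"  # filtered listings: keep out of the crawl
    page = params.get("page", "1")
    if page.isdigit() and int(page) >= 2:
        return "noindex, follow"  # deeper pages: crawlable, not indexed
    return "index, follow"  # first page: the category page

for url in ["/oakland-restaurants/",
            "/oakland-restaurants/?page=2",
            "/oakland-restaurants/?parking&page2"]:
    print(url, "->", crawl_policy(url))
```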

indyank

3:14 am on Jun 7, 2011 (gmt 0)

/oakland-restaurants/?parking&page2 - If you ask me, "Oakland restaurants with parking" should be a filter on the first page that doesn't get its own URL. If you don't want pages with URLs like "/oakland-restaurants/?parking&page2" indexed, you can specify parking as an ignored parameter in WMT.

/oakland-restaurants/?page=2 - You can use the "noindex" robots meta tag ("follow" is the default) for subpages like these. If you prefer to block them via robots.txt, you can do that.

/oakland-restaurants/ - This will be the one to index.