Forum Moderators: Robert Charlton & goodroi


404 or NoIndex Pagination Pages?

         

wsc102

7:38 pm on Sep 1, 2013 (gmt 0)

10+ Year Member



I have a website that has around 90k pages indexed, but after doing the math I realized that I only have around 20-30k pages that are actually high quality, the rest are paginated pages from search results within my website. Every time someone searches a term on my site, that term would get its own page, which would include all of the relevant posts that are associated with that search term/tag. My site had around 20k different search terms, all being indexed. I have paused new search terms from being indexed, and now I have added a view all to the main search page, so the pagination pages aren't visible to the front end user.

What I want to know is if the best route would be to 404 all of the useless paginated pages from the search term pages. And if so, how many should I remove at one time? There must be 40-50k paginated pages and I am curious to know what would be the best bet from an SEO standpoint.

If the best bet is noindex, follow, should I do it to all of the paginated pages all at once? Should I do canonical for these pages as well?

All feedback is greatly appreciated. Thanks.

lucy24

9:40 pm on Sep 1, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Easy part: Use 410 rather than 404. You'll get rid of the googlebot a lot faster.

Hard part:

phranque

10:33 pm on Sep 1, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



search results pages should be noindexed.

if you will never show a page again the url should get a 410 response.
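To make phranque's two rules concrete, here is a minimal sketch in Python using only the standard-library WSGI helpers. It assumes search result pages live under a hypothetical `/search/` prefix and that the permanently removed URLs are kept in a hypothetical `RETIRED_PATHS` set. Retired URLs get `410 Gone`. Search pages are still served but carry a robots `noindex` meta tag.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical examples of URLs that will never be shown again.
RETIRED_PATHS = {"/search/widgets/page/2", "/search/widgets/page/3"}

NOINDEX_META = '<meta name="robots" content="noindex">'


def app(environ, start_response):
    """Minimal WSGI app: 410 for retired URLs, noindex for search pages."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in RETIRED_PATHS:
        # 410 Gone: the page was removed on purpose and is not coming back.
        start_response("410 Gone", headers)
        return [b"<p>This page has been permanently removed.</p>"]
    # Assumption: everything under /search/ is an on-site search result page.
    head = NOINDEX_META if path.startswith("/search/") else ""
    start_response("200 OK", headers)
    html = f"<html><head>{head}</head><body>...</body></html>"
    return [html.encode("utf-8")]


def fetch(path):
    """Call the app directly (no server) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response)).decode("utf-8")
    return captured["status"], body
```

For example, `fetch("/search/widgets/page/2")` returns a `"410 Gone"` status, while `fetch("/search/widgets")` returns `"200 OK"` with the noindex tag in the head. The retired-URL check runs first, so a removed search page gets the 410 rather than a noindexed 200.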

JD_Toims

1:31 am on Sep 2, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What I want to know is if the best route would be to 404 all of the useless paginated pages from the search term pages.

What phranque said.

If the best bet is noindex, follow, should I do it to all of the paginated pages all at once?

Pagination I'd noindex and I can't see why not all at once. They won't likely all be crawled on the same day, so I'd make sure it was there on the next crawl cycle if I wanted them removed.

[Note: Follow is default, so no need to include it.]

Should I do canonical for these pages as well?

Absolutely.

brokaddr

5:42 am on Sep 2, 2013 (gmt 0)

10+ Year Member



Use 410 rather than 404. You'll get rid of the googlebot a lot faster.

What if there's a possibility that, a few months from now, a different article may end up on that URL? (Same subject, etc.) Does it matter? Do Google and other SEs care?

Pagination I'd noindex and I can't see why not all at once. They won't likely all be crawled on the same day, so I'd make sure it was there on the next crawl cycle if I wanted them removed.

Can you guys elaborate on what you mean by filtering 'pagination'? - index.php?page=2 right?

If so, what if the content on page 2 is a continuation of page 1, not identical content? Wouldn't you be eliminating a (potentially) useful page to a searcher?

lucy24

6:15 am on Sep 2, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If there's a new page using the old URL, then I have to assume there will be fresh links pointing to it. A 410 only tells the googlebot to "forget" an URL it has previously crawled; it doesn't tell it to ignore the URL forever afterward :)

If you do reuse or restore an old URL, make sure you change any code (RewriteRule, Redirect or similar) that's currently returning a 410. Ask me how I know this. Oops.
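One way to catch that "oops" before it ships is a small pre-deploy check. This is a sketch that assumes you can list the URLs the site currently serves (a hypothetical `live_paths`) and the URLs your rewrite rules answer with 410 (a hypothetical `retired_paths`). Any URL in both is a restored page that would still be reported as gone.

```python
from __future__ import annotations

from typing import Iterable


def restored_but_gone(live_paths: Iterable[str], retired_paths: Iterable[str]) -> list[str]:
    """Return URLs that are live again but still listed as 410 Gone."""
    # Set intersection finds the conflicts; sorting keeps the report stable.
    return sorted(set(live_paths) & set(retired_paths))


if __name__ == "__main__":
    conflicts = restored_but_gone(
        live_paths=["/articles/widgets", "/articles/gadgets"],
        retired_paths=["/articles/gadgets", "/search/old-term"],
    )
    for path in conflicts:
        print(f"Still returning 410 but live again: {path}")
```

Running it as a script prints one warning, for `/articles/gadgets`. An empty list means no restored URL is still being sent a 410.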

brokaddr

6:58 am on Sep 2, 2013 (gmt 0)

10+ Year Member



Interesting, appreciate the insight as always lucy24. :)

JD_Toims

7:16 pm on Sep 2, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you guys elaborate on what you mean by filtering 'pagination'? - index.php?page=2 right?

I may have misunderstood, sorry.

If the content on page 2 is unique, then I'd allow it to be indexed, but if it's, say, a list of paginated products that are likely to "compete" with page 1, I'd probably noindex those personally.

So, for an article that's paginated I'd allow all pages to be indexed, but for products 21 to 40 [or whatever numbers you decide to insert] that were just a continuation of the products listed on page 1, I'd personally be likely to noindex those to try and keep visitors landing on page 1 rather than page 42.
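The rule described above can be sketched as a small helper that picks the robots meta tag per page. The `"article"` and `"listing"` labels are hypothetical names for the two cases (one article split across pages vs. a list of products or posts). Since follow is the default, the tag only says `noindex`.

```python
def robots_meta(kind: str, page: int) -> str:
    """Return a robots meta tag for a paginated page, or "" to allow indexing.

    kind: "article" for a single article split across pages,
          "listing" for a paginated product/post list (hypothetical labels).
    page: 1-based page number.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    if kind == "listing" and page >= 2:
        # Listing pages 2+ would compete with page 1, so keep them out of the index.
        # "follow" is the default, so links on the page can still be followed.
        return '<meta name="robots" content="noindex">'
    # Article pages (and listing page 1) stay indexable.
    return ""
```

With this sketch, `robots_meta("listing", 1)` and `robots_meta("article", 3)` both return an empty string, and `robots_meta("listing", 2)` returns the noindex tag.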

turbocharged

7:27 pm on Sep 2, 2013 (gmt 0)



We use canonical and leave it open for search engines to crawl. No issues whatsoever. You may want to also use rel=next, etc. if applicable. But if they are really thin pages, noindex is the best bet.
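For reference, here is a sketch of generating the rel="prev"/rel="next" link tags for a paginated series. It assumes a hypothetical `?page=N` URL scheme where page 1 lives at the bare base URL. Google has since said (in 2019) that it no longer uses these hints for indexing.

```python
import html


def pagination_links(base_url: str, page: int, last_page: int) -> list:
    """Return rel="prev"/rel="next" <link> tags for one page of a series."""
    if not 1 <= page <= last_page:
        raise ValueError("page must be between 1 and last_page")

    def url_for(n: int) -> str:
        # Assumption: page 1 is the bare URL, later pages use ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    links = []
    if page > 1:
        links.append(f'<link rel="prev" href="{html.escape(url_for(page - 1))}">')
    if page < last_page:
        links.append(f'<link rel="next" href="{html.escape(url_for(page + 1))}">')
    return links
```

On page 2 of 3, for example, this yields a `prev` link to the bare base URL and a `next` link to `?page=3`. The first page has only a `next` link and the last page only a `prev` link.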

aakk9999

10:50 pm on Sep 2, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We use canonical and leave it open for search engines to crawl.

This is not the correct way to use rel=canonical for pagination. In fact, Google lists this as the #1 rel=canonical mistake in an article published a few months ago:

5 common mistakes with rel=canonical [googlewebmastercentral.blogspot.co.uk]
Specifying a rel=canonical from page 2 (or any later page) to page 1 is not correct use of rel=canonical, as these are not duplicate pages. Using rel=canonical in this instance would result in the content on pages 2 and beyond not being indexed at all.

Rel=canonical with pagination is only recommended if you have a "View all" version of the page, which lists all the entries that are paginated elsewhere. In that case you would have the "View all" page indexed, and the pagination pages (each of which is a subset of the "View all" page) should have rel=canonical pointing to the "View all" page.
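A sketch of the canonical choice described above. `view_all_url` is a hypothetical parameter that is `None` when the site has no "View all" page. With one, every paginated page (and the "View all" page itself) points at it. Without one, each page is its own canonical rather than pointing at page 1.

```python
import html
from typing import Optional


def canonical_for(page_url: str, view_all_url: Optional[str]) -> str:
    """Return the rel=canonical <link> tag for one page of a paginated series."""
    # With a "View all" page, every component page points at it.
    # Without one, a page is canonical to itself; pointing page 2+ at
    # page 1 is the mistake Google describes, since they are not duplicates.
    target = view_all_url if view_all_url is not None else page_url
    return f'<link rel="canonical" href="{html.escape(target)}">'
```

`html.escape` is there so a URL containing `&` is written correctly inside the HTML attribute.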

I normally do what JD_Toims says - if the pagination lists products, I noindex pagination pages from 2 onwards. If the pagination is for paginating an article that spans several pages, then I leave it all to be indexed.

lucy24

11:05 pm on Sep 2, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have paused new search terms from being indexed, and now I have added a view all to the main search page, so the pagination pages aren't visible to the front end user.

Is the "view all" option available to search engines before they land on page 1? If so, you can tell google to ignore all URLs that even contain the "page" parameter.
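Before telling Google to ignore a parameter, it can help to see how many of your URLs actually carry it. Here is a sketch using the standard library's URL parser. It assumes the parameter is literally named `page`, as in the `index.php?page=2` example earlier in the thread. It matches the parameter name exactly, so something like `?pages=2` is not counted.

```python
from urllib.parse import parse_qs, urlsplit


def has_page_param(url: str, name: str = "page") -> bool:
    """Return True if the URL's query string contains the given parameter."""
    query = urlsplit(url).query
    # keep_blank_values=True so "?page=" still counts as having the parameter.
    return name in parse_qs(query, keep_blank_values=True)


def count_paginated(urls) -> int:
    """Count how many URLs in a list (e.g. from a sitemap or log) use the parameter."""
    return sum(1 for u in urls if has_page_param(u))
```

Feeding it an exported URL list gives a rough idea of how much of the indexed footprint the parameter accounts for.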

Robert Charlton

12:03 am on Sep 3, 2013 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think this use of "paginated" in the thread title and in your question is confusing the issue. Your pages aren't what we think of when we think of "paginated" pages... i.e., as continuing pages of lengthy articles or product listings that get broken off to additional pages.

You posted... my emphasis added...
Every time someone searches a term on my site, that term would get its own page, which would include all of the relevant posts that are associated with that search term/tag

"Its own page" doesn't sound like these pages are paginated. They are machine generated tag pages, and I feel that they're a really bad idea, both for users and for Google.

In essence, they're search pages, but I think that, as used on the site (and I'm guessing here) they're much worse than search pages, because of how they affect onpage navigation. In unlimited amounts they give users way too many thin/shallow onpage nav choices, and they're a Panda disaster.

Check out this thread, which I suspect is about something very similar...

Keyword albums / tag clouds triggering Panda algo penalty?
http://www.webmasterworld.com/google/4600806.htm [webmasterworld.com]

See my comments on that thread about why I would only use noindex as a temporary solution at best. I suggest you remove the pages, turn off the "feature", and return 410s. Build proper direct navigation links on your site, in addition to well thought out category pages.

I'll try to get back to that thread and post some additional thoughts with regard to user friendliness. These tag pages, though, are not user friendly.