Forum Moderators: Robert Charlton & andy langton & goodroi

SEO "Loose Ends" Regarding Pagination

8:02 pm on Jan 20, 2013 (gmt 0)

New User

5+ Year Member

joined:May 15, 2012
posts: 25
votes: 0

I'm trying to tie up some loose ends on a new pagination structure and have been struggling to find the best way to handle a couple of cases without causing duplicate content issues or other unintended SEO consequences:

1.) How do I handle requests for pages which no longer exist? For example, a particular category has a total of 401 items, which translates into 21 pages (with 20 results max per page). Let's say one item is removed from the database for whatever reason, dropping the total page count back down to 20.

Currently, a mod_rewrite rule passes any one- or two-digit page variable through to my script, such as:

http://www.example.com/category/99/ (more relevant for #2 below)

My script (written in Perl) checks whether the page variable in the URL exceeds the actual page count and, if so, currently just prints "sorry, no results" (for the time being, while in development).
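
A minimal sketch of that check, assuming the script knows the total item count for the category and the page size. The names page_count and is_ghost_page are made up for illustration and are not from the actual script:

```perl
use strict;
use warnings;

# Number of pages needed to show $total_items at $per_page items per page.
# Integer arithmetic rounds up without needing POSIX::ceil.
sub page_count {
    my ($total_items, $per_page) = @_;
    return 0 if $total_items <= 0;
    return int(($total_items + $per_page - 1) / $per_page);
}

# True (1) when the requested page number is beyond the last real page.
sub is_ghost_page {
    my ($requested_page, $total_items, $per_page) = @_;
    return $requested_page > page_count($total_items, $per_page) ? 1 : 0;
}

# The example from the post: 401 items need 21 pages, 400 items need 20.
printf "%d items -> %d pages\n", $_, page_count($_, 20) for (401, 400);
```

With 400 items, a request for page 21 would be flagged as beyond the last page, which is the case the question is about.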

2.) Similarly, I imagine there will be some "random" requests for pages which are beyond the actual page count of a particular category. For example, there are not more than 20 actual pages, but somebody links to my site with the following:


Currently, my mod_rewrite rules ignore requests beyond the scope I've defined in my .htaccess file, triggering a 404. For example, all of the following result in a page not found:


This isn't necessarily a mod_rewrite question, so I don't want to get bogged down in the nuances of my mod_rewrite code, but rather simply explore the best general strategy for dealing with such "extraneous" requests. So far, I've come up with the following ideas (I'll refer to the above pages, which are accepted by the mod_rewrite conditions but offer no content, as "ghost pages" for lack of a better term):

A.) Have my script place a <link rel="canonical" href="http://www.example.com/category/" /> on all ghost pages, but I'm not sure how this may play out in terms of SEO?

B.) Have my script place a noindex tag on all ghost pages. This seems like a good idea, but what happens when a category grows and starts a new page? Of course it won't be a ghost page at that point, but I have heard that it can be difficult to get Google to reverse a noindex.

C.) My script is written in Perl, so I'm not sure if it's possible (I believe it is in PHP) to send a 301 header pointing back to the first page of the category (http://www.example.com/category/)?

D.) This might be the cleanest solution, but it makes me wary for a few reasons: since mod_rewrite doesn't "know" the current state of my database (and page counts) and will pass along variables to my script regardless of whether they reflect actual pages, perhaps I could write another script which would dynamically update the .htaccess file and the respective code to constrain what mod_rewrite accepts as page number variables?

Any thoughts or ideas on this issue would be appreciated, thank you.
9:12 pm on Jan 20, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
votes: 0

You raise a very important point.

For page numbers that do not exist, return the HTTP 404 Not Found header from your script.

Once a request is rewritten and pointed to an internal script, you're beyond the point that htaccess is handling the request and so the PHP or Perl script must return the right headers and the correct content or error message.

Failure to do so leads to the site being flagged for "soft 404 errors", and you don't want that.

For other content pages, such as products or posts, you'll want to return 404 Not Found if they don't yet exist. They'll return 200 OK when they do exist. When they no longer exist, they should return 410 Gone, the status delivered by your PHP or Perl script.

404 - The server can't find it, doesn't know why it can't find it, doesn't know if it ever existed, and doesn't know if it ever will exist. Google will check again from time to time to see if the status changes.

410 - The content is Gone and is probably never coming back (Google will still check occasionally just in case it does come back).
11:06 pm on Jan 20, 2013 (gmt 0)

New User

5+ Year Member

joined:May 15, 2012
posts: 25
votes: 0

Thank you g1, 404 (or 410) was my ideal solution, but for some reason I had thought it wasn't possible with Perl; however, I was wrong. I came across some code and implemented it as follows:

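if ($CurrentPage > $CategoryPageCount) {
    # Requested page is past the last real page: send a real 404 status
    print "Status: 404 Not Found\r\n";
    print "Content-Type: text/html\r\n\r\n";
    print "<h1>404 File not found!</h1>";
    # Stop here so no listing output follows the error message
    exit;
}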


Which generates the following entry in Apache's access log (which I believe is what I'm after): - - [20/Jan/2013:15:51:37 -0700] "GET /category/25/ HTTP/1.1" 404 28 "-" "Mozilla/5.0 (Windows NT 6.1; rv:18.0) Gecko/20100101 Firefox/18.0"

However, I was originally hoping to trigger my global custom 404 page, but no luck ... so opted to feed the "404 File not found!" message which appears in the block of code - which I believe is acceptable and better than the alternative.

To really fine-tune things, I suppose it might be beneficial to work in a 410 for those pages which existed but were removed. This might prove to be a bit more of a challenge. I suppose I could create another field in the database which tracks a historical maximum number of pages per category, then compare $CurrentPage to both $CategoryPageCount and the new $HistoricalMaxCategoryPageCount and issue a 410 where appropriate.
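
A rough sketch of that comparison, assuming both page counts are already available to the script. The function name status_for_page is made up for illustration, and whether 410 is the right signal for listing pages that may come back is a separate judgment call:

```perl
use strict;
use warnings;

# Pick a status line for a requested page number.
#   1 .. current page count              -> page exists
#   above current, within historical max -> page existed once, now removed
#   anything else                        -> page never existed
sub status_for_page {
    my ($page, $current_pages, $historical_max_pages) = @_;
    return '200 OK'   if $page >= 1 && $page <= $current_pages;
    return '410 Gone' if $page >= 1 && $page <= $historical_max_pages;
    return '404 Not Found';
}

# Category currently has 20 pages and once had 21.
print "page $_: ", status_for_page($_, 20, 21), "\n" for (20, 21, 25);
```

The returned string is meant to follow "Status: " in the CGI header the script already prints.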
12:05 am on Jan 21, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 7, 2003
posts: 750
votes: 0

I wouldn't use 410 for these pages. What if you get more products again and the pages come back? Seems like a very likely scenario.

I've used 302 (temporary) redirects to redirect back to page 1 in this type of situation. You certainly don't want to use 301 redirects, because like 410, that implies permanence.
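
For a Perl CGI script, a temporary redirect can be sketched like this. It assumes the script runs under CGI, where the web server turns a "Status:" line into the HTTP status; the helper name temporary_redirect_header is made up for illustration:

```perl
use strict;
use warnings;

# Build the CGI response header for a temporary (302) redirect.
# The blank line at the end terminates the header block.
sub temporary_redirect_header {
    my ($target_url) = @_;
    return "Status: 302 Found\r\n"
         . "Location: $target_url\r\n"
         . "\r\n";
}

# Send the visitor back to the first page of the category.
my $header = temporary_redirect_header('http://www.example.com/category/');
print $header;
```

Nothing else should be printed before this header, and the script should stop after sending it.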
12:16 am on Jan 21, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
votes: 0

For products that go away and are never coming back, returning 410 Gone is appropriate.

For paginated listings, e.g. categories, return 404 Not Found when higher-numbered pages are gone because they may well come back.

I believe that all scripting languages allow you to override the HTTP headers and send something other than 200 OK.

Returning 404 Not Found from within your PHP or Perl script won't invoke your global error page. Once your PHP or Perl script is dealing with the request, you're way past the parts of Apache that check whether the request will resolve to a file and invoke error messages if not.

The usual method in PHP is to send the 404 HTTP header and then "INCLUDE" the file that contains the human readable 404 error message. I assume that Perl has some equivalent.
1:03 am on Jan 21, 2013 (gmt 0)

New User

5+ Year Member

joined:May 15, 2012
posts: 25
votes: 0

Thank you for the clarification on 410. I thought it meant gone, but could come back ... with a higher emphasis (vs. 404) on could come back. Regardless, it's easier to keep all pages I'm referring to as 404.

Yeah, I kind of wondered whether the issue was where along the assembly line Apache still allows the custom error page to be called. In any case, thank you g1 for pointing out the PHP method. I'm not sure of the Perl equivalent, but it quickly got me thinking that it would be easy enough to simply open the actual custom error message file and print it. I'm not sure it's the wisest way to go about it, though: could there be unintended consequences for resource utilization if the page gets hit a lot for whatever reason?

if ($CurrentPage > $RoundedCategoryPageCount) {
    print "Status: 404 Not Found\r\n";
    print "Content-Type: text/html\r\n\r\n";

    # Print the site's custom error page as the body of the 404 response
    open(my $page_not_found, '<', "/absolute/path/to/custom/error/message/file.html")
        || die "couldn't open the file: $!";
    while (my $contents = <$page_not_found>) {
        print $contents;
    }
    close($page_not_found);
    exit;
}

1:15 am on Jan 21, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
votes: 0

In theory "410 Gone" is supposed to be "Gone for good, Never coming back".

In practice, many sites do have pages that come back having been 410 status at some time in the past, so Google does still look a couple of times per year to make sure the status is still 410.

Imagine you bought a domain name from someone, and root had been returning 410 Gone for the last three years. If 410 Gone literally meant Gone Forever, you could never get your new home page indexed.

In the real world there are many things that "reset" or "override" the previously recorded Gone status, but I would never want to rely on that behaviour. So, 404 for some things and 410 for others it is.

P.S. A Google search for "perl equivalent of php include" may be useful.
2:55 pm on Jan 21, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:June 24, 2005
posts: 446
votes: 0

I would assume 404s would be best.

Now naturally Google wants you to use rel="next" and rel="prev" for all paginated pages, otherwise the pages will be forced to compete with each other, which will dilute your content or create duplicate content. There was a webmaster video on this somewhere...

I would assume a page that was part of a prev/next chain would be grouped into a super page entity, and if it were removed Google would understand this and not care too much. In essence I suspect Google would think you "just made your page shorter". It is a good question, though...
8:22 pm on Jan 21, 2013 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
votes: 288

Google actually suggests three options for pagination... one of which is rel="next" and rel="prev". See some further discussion in this thread....

Changing count of posts per page on forum & its effect on rankings
http://www.webmasterworld.com/google/4535210.htm [webmasterworld.com]

In the thread I link to an excellent Google support page describing all three options, and also link to the video....

Pagination - Google Support
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663744 [support.google.com]

Which option you use, IMO, depends on the situation and how you implement it. There's a View All option, e.g., where all paginated pages are regarded by Google as one. View All is probably the best option for paginated forum threads, e.g., where, as I note in the thread, "the unpredictable shape of forum discussions" makes this a wise choice.

For product pages, if you prioritize the order of products (most important products first), then rel="next" and rel="prev" appears to be the best approach. As Google describes on the support page...

My emphasis added at the end...
Use rel="next" and rel="prev" links to indicate the relationship between component URLs. This markup provides a strong hint to Google that you would like us to treat these pages as a logical sequence, thus consolidating their linking properties and usually sending searchers to the first page.

And if you either do nothing or do something wrong, Google says...

Paginated content is very common, and Google does a good job returning the most relevant results to users, regardless of whether content is divided into multiple pages.

And later...
...if an expected rel="prev" or rel="next" designation is missing... we'll continue to index the page(s), and rely on our own heuristics to understand your content.
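
As a rough illustration of the rel="next" and rel="prev" markup for a category, here is a sketch in Perl. It assumes page 1 lives at the category URL itself and page N at that URL followed by "N/", as in the example URLs earlier in the thread; the function name pagination_links is made up for illustration:

```perl
use strict;
use warnings;

# Return the <link> tags for one page of a paginated category.
# $base_url is the category URL with a trailing slash.
sub pagination_links {
    my ($base_url, $page, $last_page) = @_;
    my @links;
    if ($page > 1) {
        # Page 2 points back at the bare category URL, not at ".../1/".
        my $prev = $page == 2 ? $base_url : $base_url . ($page - 1) . '/';
        push @links, qq{<link rel="prev" href="$prev" />};
    }
    if ($page < $last_page) {
        my $next = $base_url . ($page + 1) . '/';
        push @links, qq{<link rel="next" href="$next" />};
    }
    return @links;
}

# Tags for page 2 of a 20-page category.
print "$_\n" for pagination_links('http://www.example.com/category/', 2, 20);
```

The first page gets only a rel="next" tag and the last page only a rel="prev" tag, so the chain has clear ends.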
8:50 pm on Jan 21, 2013 (gmt 0)

New User

5+ Year Member

joined:May 15, 2012
posts: 25
votes: 0

I added pagination along with rel="next" and rel="prev" hoping it will allow Google to view and associate the full depth of any given category. This was the primary goal of restructuring this component. Up until now, I've used a "global script" to allow visitors to see the next page of results for any given category; however, there were at least a couple/few problems:

1.) Very cumbersome for the visitor to have to click "See next 20 listings" at the bottom of every page

2.) It's probably sending mixed/bad signals to Google insofar as a total of 120 categories all funnel into the same script (URL) for subsequent results.

I've seen a consistent, but somewhat "muted/dampened" (never too severe) decline from most Panda refreshes over the past year & I'm crossing my fingers (and toes) that pagination might help on this front by providing Google with easy access to the entire depth of directory structure - which may offer and tie in greater semantic continuity of content.

One issue which I'm a bit concerned about is the fact that I randomize results. Unfortunately, I can't escape this fact ... as it's an integral component of my business model. Essentially, all paid listings (as a group) appear above free listings & within each group all listings are randomized. Randomization used to occur on the fly; however, with the addition of pagination, I've created a script to essentially perform a randomization via a cron job once per day (overnight) and load the order of listings for any given category into a database. I then use this ordering for 24 hours until the deck gets shuffled again. I suppose I could increase the interval (i.e. weekly cron job), but not sure if that would help matters?
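
For anyone curious, the nightly shuffle boils down to something like the sketch below. The hash fields id and paid and the function name daily_order are simplified stand-ins for illustration, not the real schema, and storing the result in the database is left out:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return listing ids in display order: all paid listings first, then all
# free listings, with each group shuffled independently.
sub daily_order {
    my (@listings) = @_;
    my @paid = shuffle(grep {  $_->{paid} } @listings);
    my @free = shuffle(grep { !$_->{paid} } @listings);
    return map { $_->{id} } (@paid, @free);
}

my @listings = (
    { id => 1, paid => 0 },
    { id => 2, paid => 1 },
    { id => 3, paid => 0 },
    { id => 4, paid => 1 },
);

# The order varies from run to run, but ids 2 and 4 always come first.
print join(', ', daily_order(@listings)), "\n";
```

The cron job would run this once per category and save the resulting order, so every paginated page reads from the same fixed sequence until the next shuffle.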

I'm considering adding a "view all" in addition to pagination; however, I'm very concerned about resource utilization, but if it's necessary to mitigate the fact that I'm randomizing results, then it'd be worth it. If I do add a "view all" link, I'm not sure what the best way is to nudge Google into indexing the front page of the category? Would placing a noindex and also canonical (-> front page of category) on the view all page work to this end (or is it a no-no akin to placing it on paginated pages)?
3:43 am on Jan 22, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
votes: 462

I'm considering adding a "view all" in addition to pagination; however, I'm very concerned about resource utilization

I don't think you need to be. The users are thinking the same thing-- and it's a lot more noticeable from their end. How often do you select Show All on shopping sites? How many results-per-page does your search engine show? (G### goes up to 100, but I don't know anyone who uses this number by default.) Just make sure your default page says what the total is. "1-20 of 48" and the user might well go for Show All; "1-20 of 480" and they probably won't.

If you're already at the edge of your bandwidth or RAM limits, it's probably time for an upgrade anyway. "It's a great site but it takes forever to load" has got to be on the list of Top Ten Things you never want your users to say.
