I'm finishing a new pagination structure and have been struggling to find the best way to handle a couple of loose ends without causing duplicate content issues or other unintended SEO consequences:
1.) How do I handle requests for pages which no longer exist? For example, a particular category has 401 items in total, which translates into 21 pages (with a maximum of 20 results per page). Let's say one item is removed from the database for whatever reason, decreasing the total page count to 20, so page 21 no longer exists.
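To make the arithmetic concrete, here is a minimal sketch of the page count calculation. It is written in Python for brevity rather than the Perl my script uses, and `page_count` and `PER_PAGE` are illustrative names, not taken from my actual code.

```python
PER_PAGE = 20  # maximum results per page, as in the example above


def page_count(total_items: int, per_page: int = PER_PAGE) -> int:
    """Return how many pages are needed to list total_items."""
    # Integer ceiling division: round up whenever there is a partial page.
    return (total_items + per_page - 1) // per_page


print(page_count(401))  # 21
print(page_count(400))  # 20
```

Removing a single item is enough to make the last page disappear whenever that item was the only one on it.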
Currently, a mod_rewrite rule accepts any one- or two-digit page number, such as:
http://www.example.com/category/4/
http://www.example.com/category/20/
http://www.example.com/category/21/
http://www.example.com/category/99/ (more relevant for #2 below)
My script (written in Perl) checks whether the page variable in the URL exceeds the actual page count, and if so, it currently just prints "sorry, no results" (a placeholder while the site is in development).
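The check itself can be sketched roughly as follows. This is Python rather than my real Perl, `is_ghost_page` is a hypothetical name, and treating page numbers below 1 as invalid is an assumption I have added for the sketch.

```python
def is_ghost_page(requested_page: int, total_items: int, per_page: int = 20) -> bool:
    """Return True if the requested page number has no content to show."""
    # Last page that actually holds results (integer ceiling division).
    last_page = (total_items + per_page - 1) // per_page
    # Assumption: page 0 and negative numbers are also treated as invalid.
    return requested_page < 1 or requested_page > last_page


# With 400 items there are 20 real pages.
print(is_ghost_page(20, 400))  # False
print(is_ghost_page(21, 400))  # True
print(is_ghost_page(99, 400))  # True
```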
2.) Similarly, I imagine there will be some "random" requests for pages which are beyond the actual page count of a particular category. For example, there are no more than 20 actual pages, but somebody links to my site with the following:
http://www.example.com/category/99
Currently, my mod_rewrite rules ignore requests beyond the scope I've defined in my .htaccess file, triggering a 404. For example, all of the following result in a page not found:
http://www.example.com/category/4a
http://www.example.com/category/444
http://www.example.com/category/aaa
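For illustration only, the matching behavior described above can be approximated with a regular expression. The pattern below is an assumption that reproduces the examples in this post (one or two digits, optional trailing slash); it is not my actual rewrite rule, and `page_from_path` is a hypothetical helper written in Python.

```python
import re

# Assumed pattern: /category/ followed by one or two digits and an optional slash.
PAGE_PATH = re.compile(r"/category/([0-9]{1,2})/?")


def page_from_path(path: str):
    """Return the page number if the path is accepted, otherwise None (a 404)."""
    match = PAGE_PATH.fullmatch(path)
    return int(match.group(1)) if match else None


for path in ("/category/4/", "/category/99", "/category/4a",
             "/category/444", "/category/aaa"):
    print(path, page_from_path(path))
```

Paths that return `None` here correspond to the requests that never reach my script, while `/category/99` is accepted and becomes a ghost page.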
This isn't necessarily a mod_rewrite question, so I don't want to get bogged down in the nuances of my mod_rewrite code, but rather simply explore the best general strategy for dealing with such "extraneous" requests. So far I've come up with the following ideas (for lack of a better term, I'll use "ghost pages" for the above pages which are accepted by my mod_rewrite conditions but offer no content):
A.) Have my script place a <link rel="canonical" href="http://www.example.com/category/" /> on all ghost pages, but I'm not sure how this may play out in terms of SEO.
B.) Have my script place a noindex tag on all ghost pages. This seems like a good idea, but what happens when a category grows and starts a new page? Of course it won't be a ghost page at that point, but I have heard that it can be difficult to get Google to reverse a noindex.
C.) Have my script send a 301 header pointing back to the first page of the category (http://www.example.com/category/). My script is written in Perl, so I'm not sure whether this is possible (I believe it is in PHP).
D.) This might be the cleanest solution, but it makes me wary for a few reasons: since mod_rewrite doesn't "know" the current state of my database (and page counts) and will pass variables along to my script regardless of whether they reflect actual pages, perhaps I could write another script which would dynamically update the .htaccess file and the respective code to constrain what mod_rewrite accepts as page number variables?
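To show what options A through C would actually send back, here is a rough sketch of the output for a ghost page. It assumes the script runs as plain CGI, where response headers are just lines of text printed before the body; `ghost_page_response` is a hypothetical helper, and the sketch is in Python rather than my Perl.

```python
CATEGORY_URL = "http://www.example.com/category/"


def ghost_page_response(option: str) -> str:
    """Build the CGI output a ghost page would send under option A, B or C."""
    if option == "C":
        # 301 redirect back to the first page of the category; no body needed.
        return f"Status: 301 Moved Permanently\r\nLocation: {CATEGORY_URL}\r\n\r\n"
    if option == "A":
        head = f'<link rel="canonical" href="{CATEGORY_URL}" />'
    elif option == "B":
        head = '<meta name="robots" content="noindex" />'
    else:
        raise ValueError(f"unknown option: {option}")
    # Options A and B still return a normal page with an extra tag in <head>.
    return ("Content-Type: text/html\r\n\r\n"
            f"<html><head>{head}</head><body>Sorry, no results</body></html>")


print(ghost_page_response("C"))
```

Under that CGI assumption, options A and B still serve a normal page for every ghost URL, while option C never serves a page at that URL at all.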
Any thoughts or ideas on this issue would be appreciated. Thank you.