
Apache Web Server Forum

    
a little help rewriting canonical pagination urls
mike2010




msg:4642784
 8:09 pm on Feb 5, 2014 (gmt 0)

I've got a little PHP script I'm using for new content, but I did something a little wrong.

It does what it's supposed to, but it's also adding unnecessary, duplicate URLs.

In .htaccess -


RewriteRule ^my-silly-new-content-([^.]+).html$ my-silly-new-content.html?page=$1



Basically what it does is add the first page which is :

mysite.com/my-silly-new-content.html

and also the pages that increase whenever more content is added...such as :

mysite.com/my-silly-new-content-2.html
mysite.com/my-silly-new-content-3.html

and so on, so that's good. (Whenever there are more than 10 items on a page, the rest need to go on the next page.)

But I just noticed that the problem comes in whenever anything else is added after the

my-silly-new-content

part.

Just typing in anything after that part, such as:

mysite.com/my-silly-new-content-blahblahblah.html

or

mysite.com/my-silly-new-content-whateverwhatever.html

results in a new page as well, which displays the default content on

my-silly-new-content.html

Basically, I just want the numbering to keep going while blocking any other variations from being indexed, to prevent duplicate content.

Thought I had the rewrite down flawlessly, but I guess not. :(

 

lucy24




msg:4642819
 10:06 pm on Feb 5, 2014 (gmt 0)

In .htaccess -

What flags are attached to your rule? It isn't directly relevant to the question, but I sure hope there's an [L]. Incidentally, the target should start with / (slash = root).

whenever anything is added

Yes, that's one of the long list of Problems You Don't Have To Worry About Unless They Happen. Here the fix is simple: just replace the all-encompassing
[^.]+
with a narrower
\d+
or (if you don't trust mod_rewrite's RegEx engine)
[0-9]+
Also make sure that the php script itself returns a 404 for any non-numeric values of "page".
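
Putting those suggestions together, the rule might end up looking something like the sketch below. It assumes the .htaccess file sits in the document root and that the script is reachable at /my-silly-new-content.html, so adjust the target path if yours lives elsewhere. The literal dot before "html" is escaped as well, so that it only matches a real period.

```apache
RewriteEngine On

# Accept only digits after the last hyphen. Requests with letters or
# other characters no longer match and fall through to whatever the
# server would normally do (typically a 404 if no such file exists).
RewriteRule ^my-silly-new-content-(\d+)\.html$ /my-silly-new-content.html?page=$1 [L]
```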

mike2010




msg:4643043
 5:23 pm on Feb 6, 2014 (gmt 0)

good enough, you should really have a Top Contributor tag next to your name. (Mods listening out there ? )

Both instances worked whenever any letter / word was added into the mix. But numericals still passed through. For example:

my-silly-new-content-2222.html

still displayed a page, but I don't mind. Good enough. It will probably only have 1000 max in total before the cutoff in the database.

The general PHP code that handles that part:


include "pagination.class.php";

$p = new pagination;

// Items per page
$p->perPage = 120;

// Pagination left from current
$p->paginationLeft = 3;

// Pagination right from current
$p->paginationRight = 3;

// Link href
// $p->path = '?page=%d'; // or $p->path = 'example/%d/';
$p->path = '/feeds/my-silly-new-content-%d.html';

// Pagination appearance
$p->appearance = array(
    'nav_prev' => '<a href="%s" class="prev"><span>prev</span></a>',
    'nav_number_link' => '<a href="%s"><span>%d</span></a>',
    'nav_number' => '<a href="javascript:;" class="active"><span>%d</span></a>',
    'nav_more' => '<a href="javascript:;" class="more"><span>...</span></a>',
    'nav_next' => '<a href="%s" class="next"><span>next</span></a>'
);


$count = $db->get_one('SELECT count(*) as cnt FROM feeds');

// Items count
$p->setCount(1000);

// Current page
if (isset($_GET['page'])) {
    $p->setStart($_GET['page']);
}

lucy24




msg:4643070
 7:34 pm on Feb 6, 2014 (gmt 0)

both instances worked whenever any letter / word was added into the mix. But numericals still passed through.

This part is probably easier to do in PHP, especially if the cutoff is some arbitrary number: 2134 is valid, 2135 isn't.

You can certainly write the pattern in your RewriteRule so that it constrains the URL to some number of digits, like

^my-silly-new-content-(\d{1,4})\.html$

Anything with too many digits would then bypass the rule and meet an ordinary server-generated 404.
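
As a complete rule, that might look like the following sketch. The target path and the [L] flag are assumptions carried over from the earlier discussion, not something taken from the actual .htaccess file. Note that four digits still allows anything up to 9999, so an exact cutoff would still have to be checked in the PHP script.

```apache
# One to four digits only; five or more digits no longer match the rule.
RewriteRule ^my-silly-new-content-(\d{1,4})\.html$ /my-silly-new-content.html?page=$1 [L]
```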

You might also think about eliminating leading zeros if they don't have meaning. That's yet another of those Problems You Don't Have Until You... I've got one myself that says
RewriteRule ^dir/dir2/chap0+(\d+\.html) http://example.com/dir/dir2/chap$1 [R=301,L]
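
Adapted to the URLs in this thread, a comparable rule could be sketched as follows. The host example.com is a placeholder, and the rule assumes page numbers never legitimately start with a zero. As an external redirect, it would normally go above the rule that rewrites to the script.

```apache
# Strip leading zeros: my-silly-new-content-007.html redirects to
# my-silly-new-content-7.html. The captured number must start with 1-9.
RewriteRule ^my-silly-new-content-0+([1-9]\d*)\.html$ http://example.com/my-silly-new-content-$1.html [R=301,L]
```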

g1smd




msg:4643132
 10:54 pm on Feb 6, 2014 (gmt 0)

Your PHP script should be amended to return a 404 HEADER and to INCLUDE your 404 page when a URL request is invalid, such as a non-existent page number.

The RegEx pattern in the Rule can be constrained to ensure the URL ends with digits for the page numbers. I would use -1 for page one here for consistency.
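
One possible way to do that, sketched here with example.com as a placeholder host, is to redirect direct requests for the un-numbered URL to the -1 form. The THE_REQUEST condition is there so that only the URL the client actually asked for triggers the redirect, not the internal rewrite to the script; without it the two rules could loop. This assumes the .htaccess file is in the document root.

```apache
# Redirect only when the client itself requested the un-numbered URL.
# The trailing "?" on the target drops any query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/my-silly-new-content\.html[?\s]
RewriteRule ^my-silly-new-content\.html$ http://example.com/my-silly-new-content-1.html? [R=301,L]
```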

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved