Added a script; my rewrite won't let it display the 2nd page

         

HaloPlayer

5:42 am on Apr 11, 2015 (gmt 0)

10+ Year Member



I've started using Commentics (a PHP-based comments script that lets users post comments on your pages). The only problem is that when you click through to the 2nd page of comments it won't load them, because of my site's rewrite, which rewrites dynamic links to static ones, e.g. www.example.com/book.php?name=titanic becomes www.example.com/book/titanic

So when I click on the 2nd page comments this is added to the end of the URL ?cmtx_page=2#cmtx_comments so obviously the URL is not found because of the rewrite. So I tried fiddling around with the QSA and changing a condition that would allow query strings, obviously this didn't work. Is there an existing rule that is not allowing the change?

This is my htaccess:

Options +FollowSymlinks
RewriteEngine on
RewriteCond %{THE_REQUEST} name=([-a-z0-9_]+)
RewriteRule ^book\.php$ http://www.example.com/book/%1? [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-d

phranque

6:24 am on Apr 11, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteRule ^book\.php$ http://www.example.com/book/%1? [R=301,L]

that question mark in the RewriteRule Target removes the query string from the redirect, which is otherwise appended by default.

try this:
RewriteRule ^book\.php$ http://www.example.com/book/%1 [R=301,L]


you will still lose the fragment identifier (#cmtx_comments) but not the query string (?cmtx_page=2)
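
For reference, a side-by-side sketch of the two forms of the redirect discussed here. The request URLs in the comments are illustrative, and the second rule is commented out because only one of the two should be active at a time.

```apache
# Variant 1: trailing "?" in the target discards the original query string.
#   /book.php?name=titanic&cmtx_page=2  ->  /book/titanic
RewriteCond %{THE_REQUEST} name=([-a-z0-9_]+)
RewriteRule ^book\.php$ http://www.example.com/book/%1? [R=301,L]

# Variant 2: no "?" in the target, so the original query string is passed
# through unchanged on the redirect.
#   /book.php?name=titanic&cmtx_page=2  ->  /book/titanic?name=titanic&cmtx_page=2
# RewriteCond %{THE_REQUEST} name=([-a-z0-9_]+)
# RewriteRule ^book\.php$ http://www.example.com/book/%1 [R=301,L]
```

In both variants the fragment never reaches the server, so it cannot be copied into the redirect target from the request.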

lucy24

7:21 am on Apr 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



this is added to the end of the URL ?cmtx_page=2#cmtx_comments

But the #cmtx part isn't actually added to the URL, is it? It's sent to the browser, but the browser doesn't send it back to the server. (Fun fact: You can send a fragment as part of a redirect target. It just can't travel in the other direction.)

Seems like what you ought to do in this situation is bypass mod_rewrite entirely with a preliminary [L] rule that intercepts any request with "cmtx_page" in the query string. If there's no redirect, the browser will happily remember the fragment until it gets there.
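
As a sketch of the "fun fact" above: a redirect target can carry a fragment, provided mod_rewrite is told not to escape the "#". The source path old-comments below is hypothetical and only there for illustration; the [NE] (noescape) flag is what keeps "#" from being sent as "%23".

```apache
# Hypothetical source path. NE stops mod_rewrite from escaping "#",
# so the browser receives the fragment in the Location header.
RewriteRule ^old-comments$ /book/titanic#cmtx_comments [R=301,NE,L]
```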

HaloPlayer

8:37 am on Apr 11, 2015 (gmt 0)

10+ Year Member



Thank you for the replies.
phranque thanks for your suggestion, but this did not work.

lucy24 You are right, it's not added to the URL, that was just me and my simplistic knowledge trying to explain it LOL
If I do bypass the mod_rewrite won't I lose the ability to have the static links?

I also realised I didn't copy the whole htaccess, so here it is in full in case I missed something:

Options +FollowSymlinks
RewriteEngine on
RewriteCond %{THE_REQUEST} name=([-a-z0-9_]+)
RewriteRule ^book\.php$ http://www.example.com/book/%1? [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^book/([-a-z0-9_]+)$ /book.php?name=$1 [L]
<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from 37.98.81.188

deny from 5.35.208.53

RewriteCond %{HTTP_REFERER} !^http://example.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://example.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.example.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.example.com$ [NC]
RewriteRule .*\.(zip|ace|rar|7z)$ http://www.example.com/no_hotlinking.htm [R,NC]

deny from 64.124.0.0/15
deny from 74.217.0.0/16
Options -Indexes
deny from 113.0.0.0/8
deny from 195.0.0.0/8


BTW I am liking the new site design :-)

phranque

7:20 pm on Apr 11, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



phranque thanks for your suggestion, but this did not work.

what response did you get?

lucy24

7:22 pm on Apr 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I do bypass the mod_rewrite won't I lose the ability to have the static links?

Uh-oh, I may have misunderstood. I was envisioning something that would only occur with a human reading a comments thread, while search engines crawl a complete-thread version. Are paginated URLs also available for crawling (and, hence, bookmarking and other direct visiting)? How do you avoid duplication as a thread gets longer and old content gets pushed to a different page?

When the human user clicks the Next Page link, does a referer get sent? You could code it so URL rewriting is bypassed when there's a referer: "If user requests page 2, and the referer shows that they were on page 1, then leave the URL as is."

Let's backtrack a bit. What are the possible URLs for a multi-page comment thread, and what referers are sent each time? How, if at all, would a search engine view later pages? (The present forum is indexed, but I know plenty that aren't.)

HaloPlayer

11:19 am on Apr 15, 2015 (gmt 0)

10+ Year Member



I really suck at trying to explain this lol so I apologize for the confusion.

So with my site the links are dynamic, e.g. www.example.com/book.php?name=book_name, and the rewrite makes that www.example.com/book/book_name

Now with the Commentics script you add a PHP include to the page (book.php), which then displays the comments and the option to add comments. In the Commentics settings you can choose how many comments are displayed per page, such as 5.
So when a page has more than 5 comments you get the Page 1, Page 2 etc. links. When I click the Page 2 link the URL is www.example.com/book/book_name?cmtx_page=2#cmtx_comments, which just loads the current page. What I believe is happening is that, because there is no rewrite rule for "?cmtx_page=2#cmtx_comments", it just directs to the current page. So I guess what I am asking is: will I have to add a new rule to accept the www.example.com/book/book_name?cmtx_page=2#cmtx_comments link, because it is just looping?
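
One point worth spelling out here: when a RewriteRule target carries its own query string (such as ?name=$1 in the internal rewrite posted earlier), mod_rewrite by default replaces the request's query string with it, so cmtx_page=2 would be dropped before book.php ever sees it. The [QSA] flag merges the two instead. A minimal sketch, assuming the Page 2 link really does arrive as /book/book_name?cmtx_page=2 as described; this is a suggestion that has not been tested against this site:

```apache
# Internal rewrite from the posted .htaccess with QSA added.
# QSA appends the original query string to the one in the target:
#   /book/book_name?cmtx_page=2  ->  /book.php?name=book_name&cmtx_page=2
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^book/([-a-z0-9_]+)$ /book.php?name=$1 [QSA,L]
```

The external redirect rule above it is not triggered by this, since its condition tests %{THE_REQUEST} (the original request line), which in this case contains no "name=".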

lucy24

7:40 pm on Apr 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



www.example.com/book/book_name?cmtx_page=2#cmtx_comments

OK, now I'm getting a picture. I don't personally know commentics but this sounds like the same system as, say, Disqus, where there's a post and then there's discussion. (Let us try not to consider the possibility that it works like Discourse, with its mysterious fixation on the number 19, let alone like Int ... oh, never mind.)

What does the URL look like if the discussion is <= 5 comments? Does it say cmtx_page=1 or is there no query at all? And, when a human clicks the "next page" link, what does the request look like? If you're not sure, try it yourself, make a note of the time, and then see what's in your raw logs at that time. With an external redirect, each request will show up separately in logs. I'm particularly interested in the referer.

You may just want to exclude requests with "cmtx_page" in the query string. Exclude from rewriting, I mean, with a preliminary [L] rule. I think I said this before, but there are currently two somewhat similar threads in /apache/ so I've lost track. And then there's the question of whether you want the discussion to be indexed. If not, you don't need to worry about duplicate content. But just watch: If you've got a recurring pool of commenters, sooner or later someone will try to dredge up that utterly priceless thing John said back in September of 2013.

HaloPlayer

12:08 pm on Apr 17, 2015 (gmt 0)

10+ Year Member



lucy24
So I deleted the .htaccess file altogether so I could view the URL in its raw form. This is what the page 2 URL looks like:
www.example.com/book.php?cmtx_page=2&name=book_name#cmtx_comments

Page 1 comments are loaded by default and do not affect the URL, unless you are on the 2nd page of comments and click Page 1, then the URL is page1 etc.

I would like to have the comments indexed, it can be handy in Google search results when you have human interaction on your pages.
Any thoughts on this dilemma!?

P.S. Can you edify me on this utterly priceless thing John said back in September of 2013? You have me intrigued :-P

lucy24

7:30 pm on Apr 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



then the URL is page1

So page = 1 is the same content as {nothing}. Ugh.
:: detour to WMT ::
If "cmtx_page" ever shows up in your WMT parameters list, you'll want to tweak the settings to say "crawl this URL only if the value is > 1". Except, ahem, that doesn't seem to be an option. You can only set the value to some exact string, unless there's more stuff they're hiding from us. It would be nice to think that Google is already familiar with your comment software and knows this pagination quirk. But, after all, this is the same Google that claims I have over 30 links from a single blog post, and I believe it's their own blogging platform.
:: thinking ::
RewriteCond %{QUERY_STRING} cmtx_page=
RewriteRule ^blahblah - [L]
where "blahblah" represents the "path" part of any URL that can contain a discussion thread. (You need this part, instead of leaving it at . or ^ alone, so the server doesn't have to go look up conditions on every request ever.) This rule goes after any access-control RewriteRules, but before any rules that create an external redirect.

Now, if you wanted to, you could say something like
RewriteCond %{QUERY_STRING} ^cmtx_page=1(&(.*))?$
RewriteRule ^(blahblah) http://www.example.com/$1?%2 [R=301,L]
which forcibly redirects any request for page1 back to the unpaginated form. That's assuming "cmtx_page" is always at the beginning of the query string, and that "page1" has no meaning.

phranque

12:56 am on Apr 22, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



which forcibly redirects any request for page1 back to the unpaginated form

this is what i would suggest doing.

then submit a support request to Commentics asking them to stop generating (noncanonical) urls with "cmtx_page=1" in them.

HaloPlayer

9:26 am on May 2, 2015 (gmt 0)

10+ Year Member



I actually found out something interesting when I viewed the URLs in raw form (without the rewrite): for some reason Commentics makes the URL:

http://www.example.com/book.php?cmtx_page=2&name=book_name#cmtx_comments

I would have thought at the very least it would be:

http://www.example.com/book.php?name=book_name&cmtx_page=2#cmtx_comments

Any thoughts on this?

lucy24

5:02 pm on May 2, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, they can put their queries in any order they like. But if you did want to redirect page1 requests, this version makes the rule more efficient, because you really can say
RewriteCond %{QUERY_STRING} ^cmtx_page=1(&(.*))?$
RewriteRule ^(blahblah) http://www.example.com/$1?%2 [R=301,L]

(the form I gave above) instead of having to capture the "book_name" part separately. If they had used the form name=book_name&cmtx_page=2 then the rule I made up would not have worked. Thanks, Commentix! ;)

phranque

5:38 am on May 3, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Any thoughts on this?

the order of parameters in the query string is important as those are two unique urls.
while they may in fact generate identical content, there should only be one canonical url.
all requests for non-canonical urls should be redirected to the canonical url.

HaloPlayer

7:35 am on May 8, 2015 (gmt 0)

10+ Year Member



Thanks lucy24 & phranque for the replies, I really do appreciate the help :-)

So which direction should I go, what kind of rewrite should I go for and how do I avoid Google penalizing the site for duplicate content? Should I just be going for a rewrite that would make each "cmtx_page=2" to rewrite to /page/2 /page/3 etc. or something? What would you suggest?

lucy24

5:50 pm on May 8, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



each "cmtx_page=2" to rewrite to /page/2 /page/3 etc.

I hope you meant that the other way around: Each "cmtx_page=" in the query is redirected to a /page/ element in the path, and then you rewrite back to the original with-query form. But this is more of an SEO question than an apache question, so don't look at me ;)

What I can say is that there's no reason for forms like /page/2 with extra slash: just say /page2 /page3 etc. Think of it like this: for each slash-delimited section of the path, there should be more than one possible thing between the slashes. So /page/2/ /page/3/ /page/4/ makes no sense, because "page" is always the same and only the number changes. Now, you might have a separate URL element to distinguish /discussion/ from /post/ but that's a different matter.

Do you envision a lot of humans bookmarking or tweeting a particular page of the discussion thread? (Tweeting is probably a better question, because then the length of the URL really does matter.) If not, it seems like it would be more trouble than it's worth.

HaloPlayer

4:50 am on May 9, 2015 (gmt 0)

10+ Year Member



lucy24 sorry I didn't mean the extra slashes. /page2 /page3 /page4 was what I meant. Would this be an extra rule on top of my existing rewrite or do I modify the original rewrite?
I am concerned about duplicate content being detected by Google, should I be concerned?

lucy24

5:10 am on May 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Backtracking a bit: Does the with-and-without "page1" part of the URL apply only to the discussion thread? Or is this also the URL for the original post or article or, er, whatever it is people are discussing?

If the only duplicate content is one page of discussion per post, then we may already have spent more time worrying about the issue than it's worth. Unless, er, your commenters are the most scintillating bunch of literary artists that ever lived,* and you expect to gain a bundle from having their comments indexed.

Personally I wouldn't bother with wholesale rewriting and redirecting. Sure, redirect page1 to the no-page version if the googlebot has discovered the duplication. But other than that, naah.


* Looking at you, Clients From Hell.

HaloPlayer

8:25 am on May 9, 2015 (gmt 0)

10+ Year Member



Original page and page1 page2 etc. are all the original page with each respective page number comments on it.

I think I will leave it as is in regards to duplicate content and just see how Google handles it, then I might do a rewrite down the track.

So now I just have to work out what the best rewrite approach is to get the page2 page3 etc. comments to actually load. The rule that phranque suggested unfortunately did not work:

RewriteRule ^book\.php$ http://www.example.com/book/%1 [R=301,L]

not2easy

3:13 pm on May 9, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Maybe with a slight edit?
RewriteRule ^book\.php$ http://www.example.com/book/$1 [R=301,L]

lucy24

6:33 pm on May 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One approach:
RewriteCond %{QUERY_STRING} ^cmtx_page=1(&(.*))?$
RewriteRule ^(blahblah) http://www.example.com/$1?%2 [R=301,L]

RewriteCond %{QUERY_STRING} cmtx_page
RewriteRule ^(blahblah) - [L]
where, again, "blahblah" is the path that all these queries are attached to. If it's always identical, like "book\.php", then you don't need to capture it. Just repeat the literal text on both sides, pattern and target. This pair of rules goes at the beginning of your external redirects-- that is, before the rule that redirects everything with a parameter. (The issue that started this whole thread.)

Translation: If the query string starts with "page 1" then redirect to the equivalent version without this parameter. Otherwise, if the query string contains the "cmtx_page" parameter then leave it alone and don't redirect. Note the %2 in the first target. I don't think a query string will actually break if it starts with & -- but let's not take chances.

This is a major, established discussion-thread software package, right? You have to assume google is familiar with its URL structure and won't hold it against you that the same text will appear on every page of the thread. Well, after all, every page of a multi-page WebmasterWorld discussion starts with the original post over again, and I've never noticed Google having any objection to this site ;)

Putting an [L] rule before the external redirects does mean that requests for later pages of the discussion thread will never meet the domain-name-canonicalization redirect. But at this point, the only people asking for the wrong name will be the search engines intentionally putting in the wrong hostname-- and I'm sure you've already told them which form you want indexed.
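
For concreteness, here is the pair of rules above with "blahblah" filled in as book\.php. That choice is an assumption: it presumes the paginated requests arrive in the raw form /book.php?cmtx_page=N&name=book_name reported earlier in the thread. If they arrive as /book/book_name?cmtx_page=N instead, the pattern would have to change, and the second rule's [L] would also stop the internal rewrite to book.php from running.

```apache
# Page 1 duplicates the unpaginated page: redirect to the same URL minus
# "cmtx_page=1". %2 is whatever followed "cmtx_page=1&" in the query string;
# if nothing followed, the bare trailing "?" clears the query string.
RewriteCond %{QUERY_STRING} ^cmtx_page=1(&(.*))?$
RewriteRule ^book\.php$ http://www.example.com/book.php?%2 [R=301,L]

# Any other cmtx_page request: leave the URL alone and stop processing
# the remaining rules, so the external redirects never see it.
RewriteCond %{QUERY_STRING} cmtx_page
RewriteRule ^book\.php$ - [L]
```

The first pattern does not match cmtx_page=10 and up, because the "1" must be followed by "&" or the end of the query string.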