Removing unwanted specific characters from query parameter

Dealing with duplicate page problem

         

ej2747

9:12 pm on Oct 7, 2022 (gmt 0)



I have many pages that are being detected as duplicated in search, with the duplicate version reported as
"Duplicate without user-selected canonical"

Example pair:
example.com/cgi-bin/yabb/YaBB.pl?num=1408559953 wanted version
example.com/cgi-bin/yabb/YaBB.pl?num=1408559953/0 unwanted duplicate version

Suggestions please for the apache config file to remove the /0 and return the wanted version instead.
There are other variations of trailing characters, eg /200, which I do not want affected, so the requirement is specific to /0

phranque

7:26 am on Oct 10, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], ej2747!

Suggestions please for the apache config file to remove the /0 and return the wanted version instead.

i would suggest using a RewriteRule directive to redirect the cgi-bin/yabb/YaBB.pl path to the canonical url with a 301 status code. precede this directive with a RewriteCond directive that tests the query string (%{QUERY_STRING}) for the unwanted form and captures the part you want to keep, so that the capture can be used in the substitution string of the RewriteRule.

i would try a pattern similar to this for starters to match and capture according to your requirements:
^(num=[0-9]+)/0$

and then if necessary make the pattern as specific as possible or as general as required.

here is the documentation for mod_rewrite directives [httpd.apache.org]

ej2747

3:07 pm on Oct 10, 2022 (gmt 0)



Many thanks. I have made this below:

RewriteCond %{QUERY_STRING} ^(num=[0-9]{10})/0$ [C]
RewriteRule . http://www.example.com/cgi-bin/yabb/YaBB.pl?%1 [R=301,L]

Further comments welcome. Are the [C] and [R=301,L] flags correct?

lucy24

4:45 pm on Oct 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The flag [R=301,L] is the standard flag for any redirect, so that's definitely correct, unless there is a known and specific reason for omitting the [L].

The flag [C] otoh is definitely incorrect because that's a flag for Rules, not Conditions. In fact it is SO wrong
:: quick run to test site ::
that it will cause a 500 error. Ordinarily the only flags you see in Conditions are [NC] and [OR] -- in each case, only if there is a known and specific reason for the flag.
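
As a minimal sketch of that correction, here is the posted pair with only the [C] removed from the condition. Everything else, including the hostname, is copied from the post above and has not been tested beyond the flag check:

```apache
# [C] removed: it is a RewriteRule flag and is not accepted on a RewriteCond.
RewriteCond %{QUERY_STRING} ^(num=[0-9]{10})/0$
# %1 is the text captured by the RewriteCond above ("num=" plus ten digits).
RewriteRule . http://www.example.com/cgi-bin/yabb/YaBB.pl?%1 [R=301,L]
```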

The RewriteCond as written will only match if the query string consists entirely of "num=" followed by exactly ten digits and "/0", with no other parameters. If there might be other parameters, then replace both anchors with \b ("word boundary").
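
A sketch of the word-boundary variant, assuming the unwanted "/0" form can turn up alongside other parameters (nothing in the thread confirms that it does):

```apache
# Assumption: "num=<ten digits>/0" may appear next to other parameters.
# \b replaces both anchors, so the match may occur anywhere in the query string.
# "/0" followed by another digit (e.g. "/00") still does not match.
RewriteCond %{QUERY_STRING} \b(num=[0-9]{10})/0\b
```

Note that a substitution built from %1 alone would carry over only the num parameter, so any other parameters would be dropped by the redirect.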

There are other issues, though. As written, the RewriteRule will stop to check conditions on every request ever, including requests for things like images that wouldn't normally come with a query string. Try to constrain the Rule so it only matches requests that might actually have a query--typically requests ending in either / or whatever extension you use.

The second issue is: where are those /0 coming from? It doesn't seem likely that users are typing them in, so try to find where in the site code it's getting tacked-on, and see if you can prevent it from happening in the first place.

phranque

12:19 am on Oct 11, 2022 (gmt 0)




I have made this below

great start!

As written, the RewriteRule will stop to check conditions on every request ever, including requests for things like images that wouldn't normally come with a query string. Try to constrain the Rule so it only matches requests that might actually have a query--typically requests ending in either / or whatever extension you use.

that's what i meant by this:
using a RewriteRule directive to redirect the cgi-bin/yabb/YaBB.pl path
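
putting those points together, a hedged sketch rather than a tested config: the rule pattern is tied to the script path so that the condition is only evaluated for YaBB.pl requests. the optional leading slash is an assumption so that the same pattern works both in the server config and in a document-root .htaccess, and the scheme and hostname are simply the ones from ej2747's post.

```apache
RewriteEngine On

# Evaluated only when the RewriteRule pattern below has already matched.
# Captures "num=" plus ten digits and requires the unwanted trailing "/0".
RewriteCond %{QUERY_STRING} ^(num=[0-9]{10})/0$

# Matches the forum script only, so images and other requests skip the condition.
# "/?" is an assumption: the path has a leading slash in server/vhost context
# and none in a document-root .htaccess.
# The "?%1" in the substitution replaces the query string with the captured part.
RewriteRule ^/?cgi-bin/yabb/YaBB\.pl$ http://www.example.com/cgi-bin/yabb/YaBB.pl?%1 [R=301,L]
```

the redirected url has no "/0" in its query string, so the condition does not match a second time and the redirect does not loop. variations such as /200 are left alone because the pattern requires exactly "/0" at the end.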