Strip Query String, Redirect URLs to Self

ichthyous

7:42 pm on Aug 29, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello,

I am trying to strip the query string from the end of various URLs and redirect to the URL itself...for example:
example.com/photo/page-title-abc/?g2_navId=xff471fbb
redirects to:
example.com/photo/page-title-abc/

There are hundreds of pages, all with different titles, and the query string contains a random set of nine letters and numbers, so I am not sure how to handle this. Perhaps it's a lot simpler than I think. Would this be RedirectMatch?

I have no idea why Google is suddenly reporting these URLs with query strings appended to the end. This type of query string hasn't been used on my site since 2016!

Thanks

lucy24

9:18 pm on Aug 29, 2020 (gmt 0)



You can't look at query strings in mod_alias (Redirect and RedirectMatch). It can only be done in mod_rewrite. This, in turn, may mean that you have to change all your existing redirects to use mod_rewrite syntax--it's very easy--so that everything happens in the right order. Besides, you already use mod_rewrite for canonicalization. Er. Ahem. Don't you?
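
As a rough illustration of how small that conversion is, here is a sketch of one mod_alias redirect rewritten in mod_rewrite syntax. The paths old-page.html and new-page.html are hypothetical, and the rule assumes it lives in the site root's .htaccess, where the pattern is matched without a leading slash.

```apache
# mod_alias form (hypothetical paths, shown commented out for comparison):
# Redirect 301 /old-page.html https://example.com/new-page.html

# mod_rewrite form, assuming a root-level .htaccess
RewriteEngine On
RewriteRule ^old-page\.html$ https://example.com/new-page.html [R=301,L]
```

One difference to keep in mind: Redirect matches by path prefix, while the anchored RewriteRule pattern above matches only that exact path.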

I have a rule similar to what you describe. (Actually I've got two rules. One of them flat-out blocks any request with a query other than fbclid, as that's the only query used by legitimate humans. But we'll just talk about the redirect.)

RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} ^/(.*)
RewriteRule (^|/|\.html)$ https://example.com/%1? [R=301,L]
In the order the server evaluates it, this means:

-- pattern of RewriteRule: only evaluate the conditions if the request is for a page (URL ending in / or .html on this site).
-- first RewriteCond: see if there's any kind of query string
-- second RewriteCond, only deployed if the first one is met: capture the requested URI
-- target of RewriteRule: redirect to the URI alone, stripping away the query string

I do it this way--two steps forward, one back--because the vast majority of requests will not have a query string, so there's no point in doing the work of capturing. If and only if there turns out to be a query, then you capture the URI for re-use.
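
For comparison, here is a sketch of a variant that skips the capture step entirely. It assumes Apache 2.4 or later, where the QSD flag ("query string discard") is available, and it is not the rule from the post above, just another way to get a similar result.

```apache
RewriteEngine On
# Only act when a query string is present
RewriteCond %{QUERY_STRING} .
# Redirect page requests to the same path; QSD drops the query string (Apache 2.4+)
RewriteRule (^|/|\.html)$ https://example.com%{REQUEST_URI} [QSD,R=301,L]
```

On Apache 2.2 the QSD flag does not exist, so the trailing ? on the target, as in the original rule, is the way to strip the query.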

I've noticed recently that Google is dredging up some ancient URLs, in some cases going back as far as 2011. It's probably just routine housecleaning. But it does illustrate the point that once you've got a redirect in place, it needs to stay there forever.

ichthyous

11:38 pm on Aug 29, 2020 (gmt 0)



Thanks for the code Lucy, it works perfectly...however (and there's always a however!)...I do need some query strings to not redirect. This is a WordPress site, so the search stopped working: it uses ?s= for search queries, and those are now redirecting to the home page. It also uses ?p=19726, for example, to refer to a post ID. Is there a way of saying redirect all query strings, except for these two? Thanks again!

lucy24

12:17 am on Aug 30, 2020 (gmt 0)



You'll need an additional RewriteCond, placed between the two in my example. (That is: #1 check if there is a query, #2 check that it is not p or s, #3 capture the URI.)

RewriteCond %{QUERY_STRING} !\b[ps]=

meaning "the query string does NOT contain a parameter named p or s". The \b means "word boundary"; in combination with the = it ensures that you're only looking at the single letters p or s, not "lookup=" or "options=" or "size=" or, or, or et cetera.

Better yet, if p or s is always the first thing in the query string, replace \b with the anchor ^ as in

RewriteCond %{QUERY_STRING} !^[ps]=
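
Put together, the three conditions and the rule from earlier in the thread might look like the sketch below. It assumes p or s is always the first parameter in the query string and that the block sits in the root .htaccess above the standard WordPress rewrite block, so the redirect happens before WordPress's internal rewrites.

```apache
RewriteEngine On
# 1. Only act when a query string is present
RewriteCond %{QUERY_STRING} .
# 2. Leave WordPress post-ID (?p=) and search (?s=) requests alone
RewriteCond %{QUERY_STRING} !^[ps]=
# 3. Capture the requested path for re-use as %1
RewriteCond %{REQUEST_URI} ^/(.*)
# Redirect page requests to the bare URL; the trailing ? strips the query string
RewriteRule (^|/|\.html)$ https://example.com/%1? [R=301,L]
```

With this in place, a request such as example.com/photo/page-title-abc/?g2_navId=xff471fbb should redirect to the bare URL, while example.com/?s=something and example.com/?p=19726 pass through untouched.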

ichthyous

4:09 am on Aug 30, 2020 (gmt 0)



This worked just fine...thanks so much for your help!