Strip Query String - too many redirects for WordPress

         

2011R2d2

4:43 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



Hi All, I'm new here.

I have spent months reading all that is Apache, and regular expressions still go completely over my head where the three solutions (or a combination of them) that I need are concerned.

Two hard drive crashes later, and I have lost all my research over the months I've been struggling with this. I would have posted separate threads, but was unsure if I'd be able to put it all together afterwards.

As you can imagine, Feedburner tries to redirect urls (?utm_source=), any SEO plugin tries to direct urls, and WordPress does its own redirecting of old permalink structures and also in cases of omitted trailing slashes.

I have used the solution below with amazing effectiveness but it broke my WordPress Admin and any 'directory', file or post attachment (?attachment_id=12) that unfortunately get requested on a query string basis.

RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]


All I need is a single reference URL for all posts, but if I didn't end up with broken pieces, then I ended up with too many redirects. I'm at a loss.

This should be the end result: http://www.example.com/this-is-my-post-title/

Please be kind enough to assist - I'm trapped in Apache Purgatory :-(

These are my problems (in order of priority):

1. This is my major nightmare, left over from two mobile plugins that based their code on each other. Both are deleted and the database is cleared. I am left with what is more than quadruple content in certain cases, since some URLs have one of each of the cases stated!

This should be the end result: http://www.example.com/this-is-my-post-title/

http://www.example.com/this-is-my-post-title/?mobile_switch=mobile
http://www.example.com/this-is-my-post-title/?mobile_switch=desktop
http://www.example.com/this-is-my-post-title/?wpmp_switcher=mobile
http://www.example.com/this-is-my-post-title/?wpmp_switcher=desktop
http://www.example.com/this-is-my-post-title/?wpmp_switcher=mobile&mobile_switch=mobile


2. I have not used a date-based archive since 2009, but there seem to be numerous rogue URLs out there.

This should be the end result: http://www.example.com/this-is-my-post-title/

http://www.example.com/2009/
http://www.example.com/2009/02/
http://www.example.com/2009/02/25/this-is-my-post-title/


3. I have a particular challenge with Yandex insisting on using the format below although it has not been used since 2009. Yandex often crawls and completely ignores the /%postname%/ structure and Yandex is also the only entity that requests this format, to my great irritation.

This should be the end result:
http://www.example.com/this-is-my-post-title/ or at the very least,
http://www.example.com/ or perhaps even http://www.example.com/blog/, if that is more practical (which I doubt), given other redirects.

http://www.example.com/blog/?p=130


WordPress, for now, seems to be a chameleon on an M&M packet with all my fumbling :-(

I really, really, really tried my best on my own, which just got me a -21.3% on Google Analytics currently.

Please, please, please, ANY assistance that doesn't require more Apache reading of the same things elsewhere I've been reading and not understanding for a year, would be greatly appreciated.

g1smd

4:53 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Instead of checking QUERY_STRING, check THE_REQUEST. You will then redirect ONLY external incoming requests for URLs with appended query strings.

If you test QUERY_STRING then you will also be redirecting previously rewritten requests and exposing internal filepaths back on to the web as new URLs. This usually leads to an infinite redirect loop.

Additionally, add a preceding RewriteCond that simply excludes any requests that should never be redirected.
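
A minimal sketch of that advice, assuming the rules sit in the site root's .htaccess ahead of WordPress's own rewrite block, and assuming that /wp-admin/, wp-login.php and attachment_id requests are the ones that must keep their query strings (that exclusion list is a guess and would need adjusting for a real site):

```apache
RewriteEngine On
# Assumed exclusions: never redirect the admin area, the login page or attachment requests
RewriteCond %{REQUEST_URI} !^/(wp-admin/|wp-login\.php)
RewriteCond %{QUERY_STRING} !(^|&)attachment_id= [NC]
# Only act on the original client request line, and only if it carries a "?"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\?
# The trailing "?" in the target drops the query string from the redirect
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

As written this is deliberately blunt: every query string that is not excluded gets stripped, including ?p=130 and search requests, so anything else that needs its parameters would have to be added to the exclusions.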

2011R2d2

5:16 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



What I hear you saying is try this?:

RewriteCond %{THE_REQUEST} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

This would sort all the mobile-related strings,
not break file attachments (e.g.) http://www.example.com/?attachment_id=12,
not break WordPress Admin,
and will also sort http://www.example.com/blog/?p=130
to rewrite to http://www.example.com/blog/...with the trailing slash?

g1smd

5:24 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The RegEx pattern
%{THE_REQUEST} .
will always evaluate as TRUE.

THE_REQUEST is looking at the literal
GET /somepath/somefile?somequery HTTP/1.1
request sent by the browser.

You'll need a pattern that begins
^[A-Z]{3,9}\ /
and looks for the presence of a query string.

2011R2d2

5:43 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



Thank you so much for your help thus far! I really am humbled by your knowledge.

OK, so the literal browser request now makes sense.

The pattern-part is what is getting me stuck, and has to date, not for a lack of trying.

OK, so this is more like it yet won't get me what I want (I don't know):

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/

?

g1smd

5:49 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That pattern looks for GET (or POST or HEAD or any other method name between 3 and 9 uppercase letters long), followed by a space, followed by a slash, then any number of folders (including none), followed by index.php, then a space, then HTTP/ (the start of the protocol version, such as HTTP/1.1).

That RegEx pattern will never match any request with a query string as there is nowhere in the pattern to match a literal question mark followed by the characters in the appended query string.

There's no-one here with the available free time to teach you Regular Expressions. Some reading of the Apache manual and of a concise RegEx pattern matching tutorial appears to be in order. :)

2011R2d2

6:27 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



Thanks, g1smd, your help is really appreciated. It does seem sad though that I'd chosen to read many posts on this forum and on Google without much success, then seem to find a helpful person, just to be disappointed.

I understand the following:
^[A-Z]{3,9}\ matches from 3 to 9 occurrences of any uppercase letter (eg 'GET') followed by a backslash-escaped space.

/([^/]+/)* matches a forward slash followed by any quantity of [one or more characters that are not a forward slash, followed by a forward slash], eg '/subfolder1/subfolder2/'

C'mon...don't give up on me - I'm sure you also battled a bit when you started off : )

One question...

This has been suggested but can I trust it to work?

<IfModule mod_rewrite.c>
RewriteEngine on
## Strip Mobile Pack query strings ##
RewriteCond %{QUERY_STRING} (.+)wpmp_.*$ [NC, OR]
RewriteCond %{QUERY_STRING} (.+)mobile_.*$ [NC]
RewriteRule ^(.*)$ $1?%1 [R=301,L]
</IfModule>

I have not found it to work effectively, hence asking for specific help from mod_rewrite educated people that would be in the know, other than noobs like me.

lucy24

8:48 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That RegEx pattern will never match any request with a query string as there is nowhere in the pattern to match a literal question mark followed by the characters in the appended query string.

Did you mean that this specific pattern won't match (which I can see), or did you mean that {THE_REQUEST}, like most things in mod_rewrite-land, doesn't see the query string even if you tell it to expect one by working a \? into the pattern somewhere?

g1smd

9:08 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



THE_REQUEST contains the literal request sent by the browser. That includes the path, file, and query string. Use the Live HTTP Headers extension for Firefox to see the actual request.

The pattern you need will include \? for the literal question mark and provision for one or more parameter=value pairs each separated by a literal & symbol before the literal space and the HTTP/ version information.
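
As a rough illustration of that description (an untested sketch; the character classes and the example.com target are just one possible choice), the condition could take this shape:

```apache
RewriteEngine On
# method, space, path, "?", name=value pairs joined by "&", space, HTTP/ version
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\?[^=&\ ]+=[^&\ ]*(&[^=&\ ]+=[^&\ ]*)*\ HTTP/
# Redirect to the same path with the query string dropped (trailing "?")
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

A query string with no = in it (a bare ?foo) would not match this stricter form, so a looser pattern that stops right after \? may be the more practical choice.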

[edited by: g1smd at 9:13 pm (utc) on Sep 18, 2011]

g1smd

9:11 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<IfModule mod_rewrite.c>
RewriteEngine on
## Strip Mobile Pack query strings ##
RewriteCond %{QUERY_STRING} (.+)wpmp_.*$ [NC, OR]
RewriteCond %{QUERY_STRING} (.+)mobile_.*$ [NC]
RewriteRule ^(.*)$ $1?%1 [R=301,L]
</IfModule>

Whatever it does, it is badly written.

RewriteEngine on
## Strip Mobile Pack query strings ##
RewriteCond %{QUERY_STRING} ^(.+)(mobile|wpmp)_. [NC]
RewriteRule (.*) http://www.example.com/$1?%1 [R=301,L]


but the leading (.+) pattern is still not the best way to be doing things.

2011R2d2

9:26 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



Would this work? I'm really desperate!

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteCond %{query_string} ^wpmp_.*$ [NC,OR]
RewriteCond %{query_string} ^mobile_.*$ [NC]
RewriteRule ^(.*)$ [mysite.com...] [R=301,L]

since I don't have .html extensions the original below won't work

RewriteRule \.html$ [mysite.com...] [R=301,L]

2011R2d2

9:40 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



@g1smd, THANKS! YES! That 'feels' right!

The suggestion came from this very forum although I only have the say-so from the person who posted it on a blog.

Sorry, I posted the other suggestion before I saw your message.

The other one would kinda get the drift right but the RewriteRule 'feels' wonky.

2011R2d2

9:45 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



Errrm... that should read:

RewriteRule ^(.*)$ http://www.example.com/%1? [R=301,L]

and

RewriteRule \.html$ http://www.example.com/%1? [R=301,L]

g1smd

10:07 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The ^ tag means "begins with". Does the query string actually "begin" with the characters you have so tagged?

Notice the use of the "local or" method using the
(this|that)
format to condense two lines into one. It is recommended.

Notice the elimination of the trailing uncaptured unused .*$ part of the pattern. It increases the code efficiency and is recommended.

lucy24

10:09 pm on Sep 18, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



overlapping g1 as usual

RewriteCond %{query_string} ^wpmp_.*$ [NC,OR]
RewriteCond %{query_string} ^mobile_.*$ [NC]

You only need the anchors if you're capturing the part after what you're looking for. Since you're not, you can stop at the _. The opening anchor may or may not be necessary, depending on what's going on.

Are you really combining default [AND] with explicit [OR]? You're a braver man than I ;)

You could also collapse the two into

RewriteCond %{query_string} ^(wpmp|mobile)_ [NC]


g1, I actually came back to say "Cancel that question!" after finding that a carefully worked-out rewrite didn't work... leading to a "D'oh!" moment as I realized that the text I was looking for wasn't in my query string, it was in the referer's query string. And yes indeed, it worked fine. Since I was looking for two different pieces I said

RewriteCond %{HTTP_REFERER} {blahblah}
RewriteCond %{HTTP_REFERER} {otherblahblah}

rather than

RewriteCond %{HTTP_REFERER} {blahblah}.+{otherblahblah}

working on a vague idea that this would be just as fast if not faster.

We now return to our regularly scheduled thread.

2011R2d2

10:29 pm on Sep 18, 2011 (gmt 0)

10+ Year Member



@g1smd,

Yip, the strings start like so (in context):
http://www.example.com/this-is-my-post-title/?mobile_switch=mobile

Yes, I noticed the 'uncaptured unused .*$ part of the pattern' - that's why for the first time it felt so RIGHT! :))

@lucy24,

Yip, it's the anchor-thing that had me in a twist. It just looked so wrong but I didn't trust my noob intuition.

Many, many, many thanks to both of you. At 29 after midnight here where I am, it's a peaceful sleep at last.

2011R2d2

5:14 am on Sep 19, 2011 (gmt 0)

10+ Year Member



Morning!

Test Feedback:

The query string is not being stripped. Whether I type it into the browser or whether I click on an indexed link.

I know a ? removes the string but does the ? in '/$1?%1' not refer specifically to the trailing forward slash in '.com/' only? If I'm expressing myself clearly?

lucy24

6:27 am on Sep 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another clarification question: do you want to strip (get rid of) the entire query string, or just the bit about "mobile"?

I know a ? removes the string but does the ? in '/$1?%1' not refer specifically to the trailing forward slash in '.com/' only? If I'm expressing myself clearly?

Yes, you are, and no, it doesn't. The "substitution" deals in literal text. Any captured text from earlier is used whole and intact. You'll notice that the substitution also doesn't need to escape its periods. This is standard in Regular Expressions, not specific to Apache.

Your original Rewrite had:

RewriteCond %{QUERY_STRING} (.+)wpmp_.*$ [NC, OR]
RewriteCond %{QUERY_STRING} (.+)mobile_.*$ [NC]
RewriteRule ^(.*)$ $1?%1 [R=301,L]


This can't possibly be what you intended, because it means "keep the original non-query part of the request, and replace the existing query with the part that came before 'wpmp_' only." Each set of parentheses is separately numbered, whether or not it ends up capturing anything. And if nothing comes before 'wpmp_' or 'mobile_' the Condition will fail, because it requires at least one character: .+ rather than .*

g1's cleaned-up version had:

RewriteCond %{QUERY_STRING} ^(.+)(mobile|wpmp)_. [NC]
RewriteRule (.*) http://www.example.com/$1?%1 [R=301,L]


This does the same thing, but again only does it if the "mobile" part is not at the very beginning of the query.

Your most recent quoted version has:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteCond %{QUERY_STRING} ^(wpmp|mobile)_ [NC]

RewriteRule . http://www.example.com/%1? [R=301,L]


I collapsed the last two Conditions, as noted elsewhere in this thread, and reduced the "pattern" to . because you're not capturing it.

In English, this means:

IF your request comes in from outside containing a query string (in which case, capture the non-query part)
AND IF the query string begins with the elements "wpmp" or "mobile"

THEN redirect the user to a new, query-less page whose address is everything that came before the query.
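
One caveat on the two-condition version above, offered as a sketch rather than tested config: mod_rewrite fills %1 from the last RewriteCond that matched, so with the conditions in that order %1 would hold "wpmp" or "mobile" rather than the path. A possible way around it, assuming the rules live in the site root's .htaccess, is to test the query string first and capture from THE_REQUEST last:

```apache
RewriteEngine On
# Query string must begin with wpmp_ or mobile_ (non-capturing group, nothing to reuse)
RewriteCond %{QUERY_STRING} ^(?:wpmp|mobile)_ [NC]
# Last matching condition, so %1 = the requested path without leading slash or query
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?\ ]*)\?
# The pattern ^ matches every request, including the bare homepage
RewriteRule ^ http://www.example.com/%1? [R=301,L]
```

The pattern ^ is used instead of . because in .htaccess the homepage arrives as an empty path, which a single . would not match.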

Hm. I wonder if it would run cleaner if you said

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.
RewriteCond %{THE_REQUEST} \?
RewriteCond %{QUERY_STRING} ^(wpmp|mobile)_ [NC]

RewriteRule (.*) http://www.example.com/$1? [R=301,L]


You might think that all you need to look for is a request beginning [A-Z]{3,9}\ with a literal (escaped) space at the end. If you try it, you will soon find 500 reasons hahaha why you can't do this. Voice of experience. So tack on a slash, and then an "any old character" for good measure. And then, separately, see whether the request contains a query. (Advance caution: if g1 says I am delirious and this is a terrible idea, listen to him.)

Since you have to put something in the "pattern" part of the Rule, you may as well put the whole thing, and capture it for reuse.

Oh yes and... your server may or may not be case-sensitive. But it's good to get in the habit of assuming it is. Same principle as putting all your html in lower-case even though browsers can (by necessity) deal with every kind of mixed-case mess.