Removing querystrings from Wordpress

Using the .htaccess

   
11:59 am on Mar 28, 2013 (gmt 0)

10+ Year Member



I need to strip the query string from all incoming links that contain ?page= or ?p=
For example, www.example.com/page/2?page=13 should be redirected to www.example.com/page/2
or
www.example.com/furniture?p=2 should be redirected to www.example.com/furniture

Basically, the query should be stripped off and the user redirected with a 301 to the stripped URL.

This is my current .htaccess -
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

Redirect /atom.xml http://example.com/feed/atom/

RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
RewriteCond %{QUERY_STRING} ^(.*&)?page=
RewriteRule ^(.*)$ $1?%1 [R=301]


</IfModule>

The highlighted code (the final RewriteCond and RewriteRule pair) does the redirection, but it only works in the domain root.
www.example.com/?page=2 is redirected to www.example.com/
but www.example.com/page/2?page=2 is not redirected to www.example.com/page/2

I need to make the redirection work in all the directories. Also would it be possible to include a wildcard ahead of "?" which would strip all the queries and redirect them?

[edited by: engine at 12:58 pm (utc) on Mar 28, 2013]
[edit reason] please use example.com [/edit]

12:11 pm on Mar 28, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member)



Every RewriteRule needs the L flag.

RewriteRules invoking a redirect must be listed before RewriteRules that invoke a rewrite.

Redirects must include the protocol and canonical hostname in the redirect target.

You should test THE_REQUEST rather than QUERY_STRING in the RewriteCond otherwise in certain circumstances some requests may lead to an infinite loop.

Never mix Redirect and RewriteRule in the same site. Convert all Redirect directives to use RewriteRule with [R=301,L] flags.

Dump the IfModule container tags. They are not needed.

Never use (.*) at the beginning or in the middle of a RegEx pattern. (.*) means "match the rest of the string to the end", and can therefore only be used at the end of a RegEx pattern. Use a more specific match here.

Is the page= parameter always the ONLY parameter?

If other parameters are requested at the same time as the page= parameter, should they be stripped or retained?

Append a question mark to the rule target to prevent re-attachment of the originally requested parameters.

Use example.com in this forum to prevent URL auto-linking.
Tick the 'disable smilies in this post' option to make the code readable.
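
Taken together, the points above might translate into something like the following sketch. It assumes www.example.com is the canonical hostname, that the site is served over plain http, and that the whole query string should be dropped whenever page= or p= appears; none of that has been confirmed in the thread. The redirects sit above the WordPress rules so that they run before the internal rewrite, and because WordPress may regenerate whatever lies between its own BEGIN/END markers.

```apache
RewriteEngine On
RewriteBase /

# --- External redirects first ---

# Stands in for "Redirect /atom.xml ..." so that Redirect and RewriteRule
# are not mixed (note: exact match and 301 here).
# The hostname www.example.com is an assumption; use the site's canonical host.
RewriteRule ^atom\.xml$ http://www.example.com/feed/atom/ [R=301,L]

# Test the original request line (e.g. "GET /page/2?page=13 HTTP/1.1")
# for a parameter named exactly "page" or "p" anywhere in the query string.
# The trailing "?" on the rule target discards the original query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\?(?:[^&\s]*&)*p(?:age)?=
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]

# --- Internal rewrites last (standard WordPress rules) ---
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

With rules like these, a request for www.example.com/page/2?page=13 should receive a 301 to http://www.example.com/page/2, and the follow-up request then falls through to the WordPress rewrite as usual.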
12:24 pm on Mar 28, 2013 (gmt 0)

10+ Year Member



Two queries need to be stripped -
page=parameter and p=parameter.
Other queries can be retained.

Pardon my ignorance, but I am not at all acquainted with .htaccess.

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

</IfModule>
# END WordPress

This is the standard .htaccess that is generated by WordPress.

According to you what would be the most efficient way of stripping the query? A redirection or rewriting.

How can I add the new rules to the file while keeping the current rules intact?
1:10 pm on Mar 28, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member)



See the list of changes in the post above.


"According to you what would be the most efficient way of stripping the query? A redirection or rewriting."

A redirect is needed. This tells a user asking for one URL to make a new request for a different URL. URLs are used "out there" on the web.

Rewriting has no effect on URLs. A rewrite merely alters the internal location used to service a particular request.
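
As a rough illustration of the difference, the two rules below use hypothetical paths (old-page, new-page) and assume www.example.com is the canonical hostname. The first changes the URL the visitor ends up on; the second only changes which file answers the request.

```apache
# Redirect: the server answers 301 and the browser makes a new request
# for http://www.example.com/new-page. The address bar changes.
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]

# Rewrite: the browser still shows /new-page; internally the request
# is handed to /index.php. The browser makes no new request.
RewriteRule ^new-page$ /index.php [L]
```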
1:33 pm on Mar 28, 2013 (gmt 0)

10+ Year Member



You mentioned
Never use (.*) at the beginning or in the middle of a RegEx pattern. (.*) means "match the rest of the string to the end", and can therefore only be used at the end of a RegEx pattern. Use a more specific match here.


But without using a .* how can I convey that it is a query.

Can you help me with suggesting the conditions I need to pass in the .htaccess file?
10:49 pm on Mar 28, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



"But without using a .* how can I convey that it is a query."

?
There is no relationship between the random string .* and the literal character "?". I assume g1smd's comment was aimed at this specific line:

RewriteCond %{QUERY_STRING} ^(.*&)?page= 


I think what you are trying to say here is: there might be other queries before the "page=" or "p=" query that you are trying to get rid of. But there will never be any further queries after it.

What you need is something closely analogous to the format you use when capturing nested directory names. Here it would look something like
^((?:[^&]+&)*)p(?:age)?=

The non-capture elements ?: are not strictly necessary, but it's a good habit when you are using multiple parentheses in a single line.

Do any of your other queries begin with p? If not, you can shave a further bit of time by expressing the innermost package as
[^p][^&]+&
Replace + with * if some of your other queries are only one letter. I don't know whether you need to code for malformed query strings that contain consecutive &&. There is almost no limit to the forms a bad URL can have, but you don't always need to code for all of them.

Will any given query string ever contain both "page" and "p" or are they mutually exclusive?

"www.example.com/?page=2 is redirected to www.example.com/"

Are you sure? Does the redirect still take place if you explicitly type in an URL containing "/index.php"?
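
For reference, here is a sketch of how that pattern could slot into a condition on QUERY_STRING, mirroring the line quoted above. The canonical hostname www.example.com is an assumption, as is the decision to keep the parameters in front of p= or page= and drop everything from that point on.

```apache
# %1 captures zero or more "name=value&" pairs in front of p= or page=.
# The group is greedy, so when both p= and page= are present it runs up
# to the last of them, and the earlier one ends up inside %1.
# %1 is re-attached as the new query string; an empty %1 leaves a bare "?",
# which removes the query string altogether.
RewriteCond %{QUERY_STRING} ^((?:[^&]+&)*)p(?:age)?=
RewriteRule ^ http://www.example.com%{REQUEST_URI}?%1 [R=301,L]
```

Note that %1 keeps its trailing ampersand, so ?a=1&page=2 would be redirected to ?a=1&. Swapping in the [^p][^&]+& variant stops the group from swallowing a leading p= pair, but the condition then fails to match at all when some other parameter starting with p comes first.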
12:28 pm on Mar 29, 2013 (gmt 0)

10+ Year Member



The redirect does take place in the case of a filename as well. But it does not work for any directory outside the root. For example, it will work on www.example.com/games/sony-psp.php?page=2 but it won't work on www.example.com/games?page=2

The queries can also be mixed: some would include both, as in index.php?p=10&page=142, in which case I would need to strip both of them.
1:25 pm on Mar 29, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member)



When a request includes both page= and p=, do you want to strip both of those AND all others, or will you need to retain the others?
2:35 pm on Mar 29, 2013 (gmt 0)

10+ Year Member



I would want to strip both of them. No query should be retained.

This should work in tandem with the rules set by WordPress as well.
8:49 pm on Mar 29, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



"I would want to strip both of them. No query should be retained."

Now you're saying two different things. Apart from the p(age) element: what about the other queries, if any?

example.com/directory/filename.php?a=123&page=456&b=789


There are potentially three separate captures:

{stuff before first p(age)=\d+}
p(age)=\d+
{stuff in between}
p(age)=\d+
{stuff after p(age)=\d+}

Let us not consider further malformed query strings that have duplicate occurrences of the same thing. Cue Tolstoy paraphrase here.

From first post:
RewriteRule ^index\.php$ - [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

RewriteCond %{QUERY_STRING} ^(.*&)?page=
RewriteRule ^(.*)$ $1?%1 [R=301]

#1 any request for "index.php" is left unchanged, regardless of whether it was an external request (hence my earlier question about deliberately typing in "index.php") or an internal request resulting from a rewrite

#2 any request for a nonexistent file regardless of format* is rewritten to index.php, and then Apache cycles through all the mods again from the top

#3 any request with "page=" in the query string is redirected to the same request minus the part of the query string beginning with the last occurrence of "page=", unless the requested page is called "index.php" so it would never reach this rule.

Within each module, rules execute in order unless you do fancy footwork involving skips and repeats.


* This is standard CMS behavior and I can't for the life of me understand why it is supposed to be a good idea. But then, I don't speak Apache.
8:16 am on Mar 30, 2013 (gmt 0)

10+ Year Member



OK, to simplify it, how about a rule which strips any and all queries?

lucy24, is it fine if I share the working URL with you?
7:06 pm on Mar 30, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



You can share anything you like so long as you express the domain name as example.com. Or .org or .uk or dot anything else if you're talking about multiple domains. Unfortunately it doesn't work with subdomains.

If you want to dump any query strings that contain the parameter
p(age)?
the whole thing definitely gets easier. It may be useful to backtrack a little and say where the queries are coming from. If it's some limited number of outdated links or bookmarks, you may be able to target the rule more narrowly.
9:51 am on Apr 1, 2013 (gmt 0)

10+ Year Member



The queries are coming from cached Google pages from our old CMS, which used to paginate in the fashion of www.example.com/index.php?page=2 and so forth. I have tried hard to remove the pages from Google's cache through Webmaster Tools and with explicit noindex rules in robots.txt, yet they remain. So I think the best way would be to do a 301 using .htaccess.

I have pm'd you my website url so you can get a better idea.
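
If the stale links really are limited to the old CMS's pagination on the front page, a more narrowly targeted rule along these lines might be enough. This is only a sketch: it assumes the old URLs had the form /index.php?page=N or /?page=N, that sending them all to the home page is acceptable, and that www.example.com is the canonical hostname.

```apache
# Match only original requests for "/" or "/index.php" that carry page=.
# Send them to the home page; the trailing "?" drops the query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/(?:index\.php)?\?(?:[^&\s]*&)*page=
RewriteRule ^(?:index\.php)?$ http://www.example.com/? [R=301,L]
```

It would need to go above the WordPress rules, since the WordPress rule ^index\.php$ - [L] would otherwise end processing for /index.php requests before the redirect is reached.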