
Removing querystrings from Wordpress

Using the .htaccess

     
11:59 am on Mar 28, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


I need to strip the query strings ?page= and ?p= from all incoming links.
For example, www.example.com/page/2?page=13 should be redirected to www.example.com/page/2
or
www.example.com/furniture?p=2 should be redirected to www.example.com/furniture

Basically, the query should be stripped off and the user redirected with a 301 to the stripped URL.

This is my current .htaccess -
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

Redirect /atom.xml http://example.com/feed/atom/

RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
RewriteCond %{QUERY_STRING} ^(.*&)?page=
RewriteRule ^(.*)$ $1?%1 [R=301]


</IfModule>

The highlighted code (the last RewriteCond/RewriteRule pair) does the redirection, but it only works in the domain's root directory.
www.example.com/?page=2 is redirected to www.example.com/
but www.example.com/page/2?page=2 is not redirected to www.example.com/page/2

I need to make the redirection work in all the directories. Also would it be possible to include a wildcard ahead of "?" which would strip all the queries and redirect them?

[edited by: engine at 12:58 pm (utc) on Mar 28, 2013]
[edit reason] please use example.com [/edit]

12:11 pm on Mar 28, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Every RewriteRule needs the L flag.

RewriteRules invoking a redirect must be listed before RewriteRules that invoke a rewrite.

Redirects must include the protocol and canonical hostname in the redirect target.

You should test THE_REQUEST rather than QUERY_STRING in the RewriteCond otherwise in certain circumstances some requests may lead to an infinite loop.

Never mix Redirect and RewriteRule in the same site. Convert all Redirect directives to use RewriteRule with [R=301,L] flags.

Dump the IfModule container tags. They are not needed.

Never use (.*) at the beginning or in the middle of a RegEx pattern. (.*) means "match the rest of the string to the end", and can therefore only be used at the end of a RegEx pattern. Use a more specific match here.

Is the page= parameter always the ONLY parameter?

If other parameters are requested at the same time as the page= parameter, should they be stripped or retained?

Append a question mark to the rule target to prevent re-attachment of the originally requested parameters.

Use example.com in this forum to prevent URL auto-linking.
Tick the 'disable smilies in this post' option to make the code readable.
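
A minimal sketch of how the checklist above might translate into a complete .htaccess. It is not a tested drop-in. It assumes the canonical site is http://www.example.com, that the file sits in the document root, and that dropping the whole query string is acceptable whenever p= or page= is present. The wp-admin exclusion and the GET/HEAD restriction are additions not discussed in the thread: WordPress admin screens use admin.php?page=..., and redirecting a POST would discard its body.

```apache
# Sketch only. Assumes canonical host http://www.example.com and
# that this file lives in the document root.
RewriteEngine On
RewriteBase /

# --- External redirects first ---

# The old mod_alias "Redirect /atom.xml" line, expressed with mod_rewrite.
RewriteRule ^atom\.xml$ http://www.example.com/feed/atom/ [R=301,L]

# Strip the query string when the ORIGINAL request carried a p= or
# page= parameter. THE_REQUEST holds the raw request line, e.g.
#   GET /page/2?page=13 HTTP/1.1
# and is not altered by internal rewrites.
# Assumption: leave the WordPress admin area alone.
RewriteCond %{REQUEST_URI} !^/wp-admin/
RewriteCond %{THE_REQUEST} ^(?:GET|HEAD)\s[^?\s]*\?(?:[^&\s]*&)*p(?:age)?=
# The trailing "?" on the target prevents the old query string
# from being re-attached.
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

# --- Internal rewrites last ---

# BEGIN WordPress
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress
```

The custom rules sit above the "# BEGIN WordPress" marker because WordPress may regenerate whatever lies between its markers. Note that WordPress itself uses ?p= for shortlinks and previews, so a blanket strip like this is only reasonable if those URLs are not needed on the site.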
12:24 pm on Mar 28, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


Two queries need to be stripped -
page=parameter and p=parameter.
Other queries can be retained.

Pardon my ignorance but I am not at all acquainted with .htaccess.

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

</IfModule>
# END WordPress

This is the standard .htaccess that is generated by Wordpress.

According to you what would be the most efficient way of stripping the query? A redirection or rewriting.

How can I pass the rules in the file keeping the current rules intact?
1:10 pm on Mar 28, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


See the list of changes in the post above.


According to you what would be the most efficient way of stripping the query? A redirection or rewriting.

A redirect is needed. This tells a user asking for one URL to make a new request for a different URL. URLs are used "out there" on the web.

Rewriting has no effect on URLs. A rewrite merely alters the internal location used to service a particular request.
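
To make the distinction concrete, here is a small sketch showing the same directive used both ways. The paths, the script name, and the host are placeholders invented for illustration.

```apache
RewriteEngine On

# Redirect: the client receives a 301 and makes a NEW request,
# so browsers and search engines see /new-page from then on.
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]

# Rewrite: the visitor still sees /virtual-page in the address bar,
# while the server internally serves /real-script.php for it.
RewriteRule ^virtual-page$ /real-script.php [L]
```

Only the first form changes the URL that is used "out there", which is why stripping an unwanted query string calls for a redirect.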
1:33 pm on Mar 28, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


You mentioned
Never use (.*) at the beginning or in the middle of a RegEx pattern. (.*) means "match the rest of the string to the end", and can therefore only be used at the end of a RegEx pattern. Use a more specific match here.


But without using a .* how can I convey that it is a query.

Can you help me with suggesting the conditions I need to pass in the .htaccess file?
10:49 pm on Mar 28, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12994
votes: 287


But without using a .* how can I convey that it is a query.

?
There is no relationship between the random string .* and the literal character "?". I assume g1's comment was aimed at this specific line:

RewriteCond %{QUERY_STRING} ^(.*&)?page= 


I think what you are trying to say here is: there might be other queries before the "page=" or "p=" query that you are trying to get rid of. But there will never be any further queries after it.

What you need is something closely analogous to the format you use when capturing nested directory names. Here it would look something like
^((?:[^&]+&)*)p(?:age)?=

The non-capture elements ?: are not strictly necessary, but it's a good habit when you are using multiple parentheses in a single line.

Do any of your other queries begin with p? If not, you can shave a further bit of time by expressing the inmost package as
[^p][^&]+&
Replace + with * if some of your other queries are only one letter. I don't know whether you need to code for malformed query strings that contain consecutive &&. There is almost no limit to the forms a bad URL can have, but you don't always need to code for all of them.
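
As a rough sketch, a variation on the pattern above could be dropped into a condition like this. It assumes the canonical host http://www.example.com, that anything from the first p= or page= onward can be discarded, and that the rule is placed before the WordPress rewrite block. The lazy quantifier *? is a small change from the pattern as posted, so that the match stops at the first p= or page= rather than the last.

```apache
RewriteEngine On

# Keep the parameters that come BEFORE the first p= or page=,
# and drop that parameter plus everything after it.
#   ?a=1&page=2      -> %1 is "a=1&"
#   ?p=10&page=142   -> %1 is empty
RewriteCond %{QUERY_STRING} ^((?:[^&]+&)*?)p(?:age)?=
RewriteRule ^(.*)$ http://www.example.com/$1?%1 [R=301,L]
```

When %1 is empty the target ends in a bare "?", which removes the query string entirely. When it is not empty, the retained part keeps its trailing "&" (for example ?a=1&), which is untidy but does not match the condition again, so the redirect does not repeat.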

Will any given query string ever contain both "page" and "p" or are they mutually exclusive?

www.example.com/?page=2 is redirected to www.example.com/

Are you sure? Does the redirect still take place if you explicitly type in an URL containing "/index.php"?
12:28 pm on Mar 29, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


The redirect does take place in the case of a filename as well. But it does not work for any directory outside the root. For example, it will work on www.example.com/games/sony-psp.php?page=2 but it won't work on www.example.com/games?page=2

The queries can be mixed as well. Some would include both, for example index.php?p=10&page=142, in which case I would need to strip both of them.
1:25 pm on Mar 29, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


When a request includes both page= and p=, do you want to strip both of those AND all others, or will you need to retain the others?
2:35 pm on Mar 29, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


I would want to strip both of them. No query should be retained.

This should work in tandem with the rules set by WordPress as well.
8:49 pm on Mar 29, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12994
votes: 287


I would want to strip both of them. No query should be retained.

Now you're saying two different things. Apart from the p(age) element: what about the other queries, if any?

example.com/directory/filename.php?a=123&page=456&b=789


There are potentially three separate captures:

{stuff before first p(age)=\d+}
p(age)=\d+
{stuff in between}
p(age)=\d+
{stuff after p(age)=\d+}

Let us not consider further malformed query strings that have duplicate occurrences of the same thing. Cue Tolstoy paraphrase here.

From first post:
RewriteRule ^index\.php$ - [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

RewriteCond %{QUERY_STRING} ^(.*&)?page=
RewriteRule ^(.*)$ $1?%1 [R=301]

#1 any request for "index.php" is left unchanged, regardless of whether it was an external request (hence my earlier question about deliberately typing in "index.php") or an internal request resulting from a rewrite

#2 any request for a nonexistent file regardless of format* is rewritten to index.php, and then Apache cycles through all the mods again from the top

#3 any request with "page=" in the query string is redirected to the same request minus the part of the query string beginning with the last occurrence of "page=", unless the requested page is called "index.php" so it would never reach this rule.

Within each module, rules execute in order unless you do fancy footwork involving skips and repeats.


* This is standard CMS behavior and I can't for the life of me understand why it is supposed to be a good idea. But then, I don't speak Apache.
8:16 am on Mar 30, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


OK, to simplify it, how about a rule which strips any and all queries?

Lucy24 is it fine if I share the working url with you?
7:06 pm on Mar 30, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12994
votes: 287


You can share anything you like so long as you express the domain name as example.com. Or .org or .uk or dot anything else if you're talking about multiple domains. Unfortunately it doesn't work with subdomains.

If you want to dump any query strings that contain the parameter
p(age)?
the whole thing definitely gets easier. It may be useful to backtrack a little and say where the queries are coming from. If it's some limited number of outdated links or bookmarks, you may be able to target the rule more narrowly.
9:51 am on Apr 1, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Oct 8, 2003
posts: 512
votes: 4


The queries are coming from cached Google pages from our old CMS, which used to paginate in the fashion of www.example.com/index.php?page=2 and so forth. I have tried hard to remove the pages from Google's cache through Webmaster Tools and with explicit noindex rules in robots.txt, yet they remain. So I think the best way would be to do a 301 using .htaccess.

I have pm'd you my website url so you can get a better idea.
 
