
Apache Web Server Forum

    
htaccess - use of ? to dupe rewrite rule
hexed




msg:4523140
 9:50 am on Nov 28, 2012 (gmt 0)

Hi,
I wonder if someone can help me with some rewrite rules. To stop the rewrite rules from interfering with my 301 redirects, I've been adding a ? to the end of the 301 targets (see below). This seems to work well, but the SEO working on the site has asked me to remove it. If I remove the ?, the resulting URL is as follows:

http://www.domain.co.uk/blog/123/?q=blog/blog/123

Which results in a 404.

So I need to:

1. Remove ?
2. Make redirects from urls with slash at the end to urls without slash

Thanks for any help.

Al


Current .htaccess file:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.domain\.co.uk [NC]
RewriteRule (.*) http://www.domain.co.uk/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

DirectoryIndex index.php

#301 redirects
redirect 301 /blog/abc http://www.domain.co.uk/blog/123/?
redirect 301 /blog/def http://www.domain.co.uk/456/?

 

g1smd




msg:4523142
 10:04 am on Nov 28, 2012 (gmt 0)

Never mix Redirect (mod_alias) and RewriteRule (mod_rewrite) in the same site.

Use RewriteRule for all of your rules. You'll need to convert Redirect to RewriteRule.

List all your single page and single folder redirects before the non-www/www redirect.

Escape literal periods in patterns. Replace:
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.domain\.co.uk [NC]

with
RewriteCond %{HTTP_HOST} !^(www\.domain\.co\.uk)?$

When used with a RewriteRule that redirects, an appended question mark removes the query string data from the target URL.
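
As a rough sketch of the conversion described above, the two Redirect lines from the original post might become the RewriteRule lines below, listed before the non-www/www redirect. This is untested and makes two assumptions: the trailing slash is dropped from the targets because the stated goal is slash-less URLs, and the patterns are anchored to match only the exact path, whereas Redirect also matches anything beneath that path.

```apache
# Hypothetical mod_rewrite equivalents of the two mod_alias Redirect lines.
# In .htaccess the pattern is matched without the leading slash.
# The trailing ? on the target clears any query string from the redirect.
RewriteRule ^blog/abc$ http://www.domain.co.uk/blog/123? [R=301,L]
RewriteRule ^blog/def$ http://www.domain.co.uk/456? [R=301,L]

# Non-www to www redirect, placed after the specific redirects.
RewriteCond %{HTTP_HOST} !^(www\.domain\.co\.uk)?$
RewriteRule (.*) http://www.domain.co.uk/$1 [R=301,L]
```

On Apache 2.4 and later, the QSD flag (for example [R=301,L,QSD]) discards the query string without needing the trailing question mark.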

lucy24




msg:4523173
 12:23 pm on Nov 28, 2012 (gmt 0)

Within mod_rewrite a question mark ? can mean about half a dozen different things. It is better if you explain in English what you are trying to do. And then we* can figure out the wording of the rule(s) that will get you there.


* That's, ahem, nos, not nosotros.** Very important principle in this forum.
** Depending on dialect, unfortunately, so this may not make sense in some countries.

hexed




msg:4523180
 12:35 pm on Nov 28, 2012 (gmt 0)

Thanks both.

Lucy - the use of the ? on the 301 rules is to prevent this rule:

RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

...from altering the 301 redirected address. If I remove the ? from the 301 statements then Apache produces a URL like this:

http://www.domain.co.uk/blog/123/?q=blog/blog/123 instead of the intended http://www.domain.co.uk/blog/123

This gives a 404 error. The use of the ? is simply a workaround to prevent the rewrite rule from appending ?q=blog/blog/123.

g1smd




msg:4523249
 3:34 pm on Nov 28, 2012 (gmt 0)

That's a kludge and not guaranteed to stay working.

That rule is an internal rewrite. If there is a redirect happening after that, one that exposes the previously rewritten path, then it is likely that you have rules in the wrong order, or rules that interfere with each other because they come from two different modules. The rules in the htaccess file are processed in "per module" order, so you MUST convert all of your mod_alias rules to use mod_rewrite instead and you MUST fix the rule order as described above.

lucy24




msg:4523361
 9:36 pm on Nov 28, 2012 (gmt 0)

See, that's what I meant about explaining in English. So what you want to do is redirect to a URL without a query, and then rewrite to something that uses this URL as the new query? If so you don't need QSA at all: its job is to reappend any pre-existing query if-and-only-if you've added a new one. But at this point there should be no pre-existing query to (re)append. Unless you're going into an infinite loop: blahblah.php?q=something&q=something&q=something ... or worse.

Search this subforum for some boilerplate about the redirect-to-rewrite two-step. It's one of the most commonly asked questions.

When it is not your own server, you have no control over which mod executes first. So the moment you have one rule requiring mod_rewrite, you need to change all your existing mod_alias rules (Redirect by that name) to use mod_rewrite instead. It is the only way to be certain things execute in the order intended.
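
To illustrate what QSA does to the rewrite rule from the thread, here is a small sketch. The two rules are alternatives, not meant to be used together, and the request /blog/123?page=2 is a made-up example.

```apache
# Without QSA: the original query string is replaced by the new one.
#   /blog/123?page=2  ->  index.php?q=blog/123
RewriteRule ^(.*)$ index.php?q=$1 [L]

# With QSA: the original query string is appended after the new one.
#   /blog/123?page=2  ->  index.php?q=blog/123&page=2
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
```

If the CMS never expects extra parameters on its extension-less URLs, the first form is enough; keeping QSA only matters when requests legitimately carry their own query strings.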

hexed




msg:4523393
 11:59 pm on Nov 28, 2012 (gmt 0)

Thanks for all your help and guidance. This is my working .htaccess file:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /

RewriteRule ^blog/abc$ http://www.domain.co.uk/blog/def? [R=301,L]

# non www to www redirect
RewriteCond %{HTTP_HOST} !^(www\.domain\.co\.uk)?$
RewriteRule (.*) http://www.domain.co.uk/$1 [R=301,L]

#remove trailing slash
RewriteRule ^(.+)/$ /$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

DirectoryIndex index.php

g1smd




msg:4523402
 12:54 am on Nov 29, 2012 (gmt 0)

Note that non-www requests with a trailing slash go through a double redirect: first to change to www, and then to remove the slash.

The non-www/www canonicalisation redirect must be the last redirect before the rewrites.

Every rule that redirects must include the canonical hostname in the rule target.

Those two things will ensure that non-canonical requests do not generate an unwanted redirection chain.

lucy24




msg:4523413
 2:06 am on Nov 29, 2012 (gmt 0)

RewriteRule ^(.+)/$ /$1 [R=301,L]

What if the request is for a bona fide directory?

Your current ruleset is pretty much upside-down.

hexed




msg:4523493
 9:08 am on Nov 29, 2012 (gmt 0)

@g1smd - thanks, sorry I didn't follow your meaning here, would you be able to provide an example?

@lucy24 - I have one additional directory on this CMS powered site and this directory has a .htaccess file that switches the rewrite engine off. What would you recommend as a more robust way to remove trailing slashes? (The CMS creates extension-less file names)

g1smd




msg:4523501
 9:40 am on Nov 29, 2012 (gmt 0)

Rules with [R=301,L] are redirects. Rules with [L] are internal rewrites. The non-www/www canonicalisation redirect must be the very last redirect before the rewrites.

Every rule that redirects must include the canonical hostname in the rule target. So /$1 [R=301,L] should be http://www.example.com/$1 [R=301,L].

Add a preceding RewriteCond to the "remove slash" redirect rule that checks that the requested path is NOT a folder, using exactly the same !-d test as the one in the rewrite.
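
Putting those three points together, the ruleset posted earlier might be reordered roughly as follows. This is an untested sketch that reuses the hostname and paths from the thread; treat the exact order of the specific redirects as an assumption to verify on your own server.

```apache
Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# 1. Specific single-page redirects first, with the canonical hostname in the target.
RewriteRule ^blog/abc$ http://www.domain.co.uk/blog/def? [R=301,L]

# 2. Remove a trailing slash, unless the request maps to a real directory.
#    The full hostname in the target avoids a second hop for non-www requests.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://www.domain.co.uk/$1 [R=301,L]

# 3. Non-www to www: the last redirect before any internal rewrite.
RewriteCond %{HTTP_HOST} !^(www\.domain\.co\.uk)?$
RewriteRule (.*) http://www.domain.co.uk/$1 [R=301,L]

# 4. Internal rewrite to the CMS front controller.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

DirectoryIndex index.php
```

With this order, a request for http://domain.co.uk/blog/123/ should be answered by rule 2 with a single 301 straight to http://www.domain.co.uk/blog/123, instead of one redirect to add www and a second to drop the slash.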

What example are you asking about?

lucy24




msg:4523503
 9:43 am on Nov 29, 2012 (gmt 0)

this directory has a .htaccess file that switches the rewrite engine off

?
The RewriteEngine is different from most things in Apache because it isn't inherited. Each individual htaccess file that uses mod_rewrite has to turn it on explicitly. Otherwise it's off.

Your with/without www redirect should be the very last redirect. On a vanilla html site, the second-to-last would be the rule to get rid of "index.html". The idea is to go from most specific to most general, so each rule only works on those requests that haven't already been handled in an earlier rule. Otherwise you could get multiple redirects.

If your whole site is built around a CMS so there are no "real" user-accessible directories, the trailing slash redirect is OK, but make sure you haven't locked yourself out of any physical directories that you might need to visit! But the redirect shouldn't be necessary in any case, since you don't have to redirect links that arrive in the wrong form.

How many URLs are involved? It may be safer to list them individually for the trailing-slash redirect. Use a pipe-separated list like

RewriteRule ^(preliminaryblahblah/(oneURL|notherURL|thirdURL))/$ http://www.example.com/$1 [R=301,L]

Unless there are potentially hundreds of 'em. You want to avoid redirecting to a 404 if possible.
