mod rewrite combining rules

         

Mariuz

2:11 pm on May 3, 2011 (gmt 0)

10+ Year Member



Hi, I've been working on some rules for better SEO navigation. Here's what I've done:

example: www.example.com/tree/branch -> www.example.com/index.php?page=tree-branch

# friendly SEO names
RewriteRule ^([a-z]+)/([a-z]+)$ http://www.example.com/index.php?page=$1-$2
RewriteRule ^([a-z]+)/([a-z]+)/$ http://www.example.com/index.php?page=$1-$2
RewriteRule ^([a-z]+)$ http://www.example.com/index.php?page=$1
RewriteRule ^([a-z]+)/$ http://www.example.com/index.php?page=$1

This works fine.

The problem comes when I want to 301 redirect the old nav
(www.example.com/index.php?page=tree-branch).
So when I add [R=301,L]:

RewriteRule ^([a-z]+)/([a-z]+)$ http://www.example.com/index.php?page=$1-$2 [R=301,L]

The URLs don't get rewritten in the address bar. Why not?

Does this make any sense?
Is there something I forgot to use, or is there another/better approach?

Regards,
Mariuz

g1smd

8:15 pm on May 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have the rules exactly backwards.

You should redirect requests for URLs with parameters to the new extensionless URLs.

You should link to the extensionless URLs that you want your users to see and use. It is links that define URLs.

You then need an internal rewrite to map requests for extensionless URLs, after the link is clicked, to the internal server filepath.

Note that mod_rewrite cannot "change" your URLs.

The redirect will contain a URL with protocol and domain name in the target and use the [R=301,L] flags. It will also use a RewriteCond that tests THE_REQUEST so that only direct client requests containing parameters will be redirected.

The rewrite will NOT contain a domain name in the target, and will use only the [L] flag. It will map SEF URLs to internal filepaths with attached parameters.

If you include the domain name or the [R] flag in any rule, then the rule will be a redirect. You don't want a redirect. You need a rewrite.
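A minimal sketch of the two rule types described above. It assumes a root .htaccess, single-word page names, and the www.example.com host used in this thread; the patterns are illustrative rather than a tested configuration.

```apache
RewriteEngine On

# External redirect: fires only when the client itself asked for index.php?page=...
# THE_REQUEST holds the original request line, e.g. "GET /index.php?page=tree HTTP/1.1".
# The trailing "?" in the target drops the old query string from the new URL.
RewriteCond %{THE_REQUEST} /index\.php\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%1? [R=301,L]

# Internal rewrite: no protocol or domain name in the target and no [R] flag,
# so the address bar keeps showing the extensionless URL.
RewriteRule ^([a-z]+)$ index.php?page=$1 [L]
```

The first rule tells the browser to make a new request; the second only changes which file the server uses to answer the request it already has.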

Mariuz

9:30 am on May 4, 2011 (gmt 0)

10+ Year Member



Thanks, but I'm quite puzzled by what you mean exactly. Here's what I have now:

RewriteRule ^([a-z]+)/([a-z]+)$ index.php?page=$1-$2
RewriteRule ^([a-z]+)/([a-z]+)/$ index.php?page=$1-$2
RewriteRule ^([a-z]+)$ index.php?page=$1
RewriteRule ^([a-z]+)/$ index.php?page=$1

So far so good: all index.php?page=whatever URLs will be rewritten to example.com/whatever, and I can use 'whatever/' as links instead of index.php?...

But I need to 301 redirect the already indexed index.php? URLs too. So when I add the following after the rules above:

RewriteCond %{QUERY_STRING} ^page=mypage$
RewriteRule ^index\.php$ http://www.example.com/mypage? [R=301]

I get a 310 error: ERR_TOO_MANY_REDIRECTS


Any clues?

Mariuz

11:17 am on May 4, 2011 (gmt 0)

10+ Year Member



The above redirect causes a 310 error, but I noticed the THE_REQUEST variable you mentioned.

Now I've got:
RewriteCond %{THE_REQUEST} /index\.php\?page=mypage\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/mypage? [R=301,L]

And it seems to work.
Will this 301 redirect work all right for search engines? I ask because I use the trailing ? to strip the parameters.

All I need now is to create a regexp to match all the parameters.
RewriteCond %{THE_REQUEST} /index\.php\?page=([a-z]+)-([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/%1/%2? [R=301,L]
RewriteCond %{THE_REQUEST} /index\.php\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/%1? [R=301,L]

Is this correct?

g1smd

11:22 am on May 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, you must test THE_REQUEST so that you can be sure the parameters were in the requested URL and are not there as a result of a previously rewritten internal pointer value.

If you fail to test THE_REQUEST then the code does the following:
Request arrives for example.com/12345
and is internally rewritten to /index.php?page=12345
and is externally redirected to example.com/12345
and is internally rewritten to /index.php?page=12345
and is externally redirected to example.com/12345
and is internally rewritten to /index.php?page=12345
and is externally redirected to example.com/12345
forever, because mod_rewrite has no way to know whether the parameters now in the pointer are there because of a previous internal rewrite or were there in the original request, UNLESS you test THE_REQUEST instead of QUERY_STRING.

There's one thing to add to the pattern for THE_REQUEST. Begin the pattern with
^[A-Z]{3,9}\ /index

to better match the literal
GET /index.php?page=345 HTTP/1.1
incoming request.
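The loop and its fix can be sketched side by side. This assumes a root .htaccess and single-word page names; the commented-out pair is the variant that loops and is shown only for comparison.

```apache
# Loops: QUERY_STRING also reflects the query string added by the internal
# rewrite at the bottom, so this redirect fires again on the rewritten request.
# RewriteCond %{QUERY_STRING} ^page=([a-z]+)$
# RewriteRule ^index\.php$ http://www.example.com/%1? [R=301,L]

# Does not loop: THE_REQUEST is the request line as sent by the client
# (method, path with query string, protocol) and is not altered by rewrites.
# ^[A-Z]{3,9}\  anchors the match at the method (GET, POST, ...).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/%1? [R=301,L]

# Internal rewrite from the extensionless URL to the real script
RewriteRule ^([a-z]+)$ index.php?page=$1 [L]
```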

Mariuz

11:54 am on May 4, 2011 (gmt 0)

10+ Year Member



I needed to add a .* to the condition to make it work for me:
^[A-Z]{3,9}\ /.*index
because the site is in a directory called demo/.

Thanks again for your explanation!

g1smd

12:09 pm on May 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Never use .* at the beginning or in the middle of a pattern.

If it is only ever /demo/ at this point, then replace .* with /demo/ here.

Otherwise the more general
^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
should be used.
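Applied to the index.php?page= URLs from this thread, the general form might look like the sketch below. It assumes the rule lives in /demo/.htaccess (hence the hard-coded /demo/ in the target) and uses a non-capturing group, (?: ), so that %1 still refers to the page name.

```apache
# (?:[^/]+/)* matches zero or more "segment/" prefixes, so the condition accepts
# /index.php, /demo/index.php, /a/b/index.php and so on. Unlike .*, each
# repetition stops at the next slash, so the regex engine has far less to retry.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(?:[^/]+/)*index\.php\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/demo/%1? [R=301,L]
```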

Mariuz

12:39 pm on May 4, 2011 (gmt 0)

10+ Year Member



I see what you mean about the .*

About that THE_REQUEST testing, would it be better to not use a regex here but use exact page names? So only existing pages would match. I have about 15 pages that need to be 301 redirected.
For example:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/index\.php\?page=information\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/demo/information/? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/index\.php\?page=store\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/demo/store/? [R=301,L]
et cetera

If I use a regex, like
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/*index\.php\?page=([a-z]+)-([a-z]+)\ HTTP/
a non-existent page that matches, like index.php?page=yes-no, would also be 301 redirected.

Any thoughts on that?

g1smd

12:55 pm on May 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would redirect "everything", then you don't need to keep the list up to date every time you add a new page.

If you want to list the pages, then there is no need to have multiple rules. One rule using a local OR is sufficient.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/(index\.php)?\?page=(information|contact|store|help|about|privacy|thispage|thatpage|products)\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/demo/%1/? [R=301,L]


Note the additional ( ) ? around index.php as this allows both
example.com/index.php?something
and
example.com/?something
to be correctly redirected.

I assume this rule is located inside the example.com/demo/.htaccess and that can be dangerous in some situations (especially where the root .htaccess file contains some internal rewrite functions). Consider altering the RewriteRule pattern slightly and moving the rule to the root .htaccess file.
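A sketch of what the rule could look like once moved to the root .htaccess. It assumes /demo/ has no .htaccess rewrite rules of its own (per-directory rules in a subfolder would otherwise take over), and the page list is shortened for readability.

```apache
# In the root .htaccess the RewriteRule pattern sees the path relative to the
# site root, so the "demo/" prefix becomes part of the pattern.
# (?:index\.php)? is non-capturing, which keeps %1 pointing at the page name.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/(?:index\.php)?\?page=(information|contact|store)\ HTTP/
RewriteRule ^demo/(?:index\.php)?$ http://www.example.com/demo/%1/? [R=301,L]
```

If plain capturing parentheses are used around index\.php in the RewriteCond instead, the page name becomes the second group and the target would need %2.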

Mariuz

1:44 pm on May 4, 2011 (gmt 0)

10+ Year Member



You're right, this .htaccess is in demo for now, but it will move to the root once it works.

I agree, keeping a list up to date every time I add a new page would need too much attention.
But how do I test THE_REQUEST properly and safely?

# Friendly SEO navigation
RewriteRule ^([a-z]+)/([a-z]+)$ index.php?page=$1-$2 [L]
RewriteRule ^([a-z]+)/([a-z]+)/$ index.php?page=$1-$2 [L]
RewriteRule ^([a-z]+)$ index.php?page=$1 [L]
RewriteRule ^([a-z]+)/$ index.php?page=$1 [L]

# Redirects
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/index\.php\?page=([a-z]+)-([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/demo/%1/%2/? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /demo/index\.php?\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/demo/%1/? [R=301,L]

I think I need two conditions because the parameters can be: information, information-route, store, store-customers, etc. As you can see, I use a slash (/) to separate %1 and %2, to get 'information/route/' for example. All parameters consist of lowercase letters only. Is this matching sufficient and safe enough?

The ()? around the index.php did not work for me somehow.

g1smd

2:01 pm on May 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your friendly navigation promotes Duplicate Content because it treats requests with or without a trailing slash all as valid. You should choose one as valid and redirect the other one to it.

The URL for a folder or the index page in a folder should end in a trailing slash. The URL for a page should NOT end in a trailing slash.

You should list the redirects before the rewrites.

The parameters to SEF redirect(s) should be first, and should force www at the same time for those requests.

The fix slashes rule should be second, and should force www at the same time for those requests.

The non-www to www canonicalisation rule should be third.

The internal rewrites should be last.

The pattern matching seems sufficient. It allows for zero or one hyphen.
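A minimal sketch of a "fix slashes" redirect for pages, assuming a root .htaccess and that real folders should keep their trailing slash. It belongs before the internal rewrites.

```apache
# Redirect /information/ to /information and /information/route/ to
# /information/route. The -d test leaves real directories alone.
# [^/.]+ requires the last segment to contain no dot, so file-like names
# such as /style.css/ are not touched by this rule.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^/.]+)/$ http://www.example.com/$1 [R=301,L]
```

Because the target contains the full www hostname, a request for example.com/information/ is corrected for both the slash and the hostname in a single redirect.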

Mariuz

2:53 pm on May 4, 2011 (gmt 0)

10+ Year Member



Ok, you lost me :)

>Your friendly navigation promotes Duplicate Content because it treats requests with or without a trailing slash all as valid. You should choose one as valid and redirect the other one to it.
# Friendly SEO navigation
RewriteRule ^([a-z]+)/([a-z]+)$ index.php?page=$1-$2 [L]
RewriteRule ^([a-z]+)$ index.php?page=$1 [L]

www.example.com/information is OK, but now www.example.com/information/ gives a 404.
Do you mean that I can redirect the trailing-slash URL to the non-trailing-slash URL and so avoid Duplicate Content?

>The URL for a folder or the index page in a folder should end in a trailing slash. The URL for a page should NOT end in a trailing slash.
You mean the 'information' page should not have a trailing slash, but a folder called for instance 'blah' with an index.php|index.html|index.whatever should have a trailing slash? Why is that? I saw pages with trailing slashes before.

>You should list the redirects before the rewrites.
Ok, the Friendly SEO navigation rules would come below the redirects.

>The parameters to SEF redirect(s) should be first, and should force www at the same time for those requests.
SEF = Search Engine Friendly? Do you mean these rules?
RewriteRule ^([a-z]+)/([a-z]+)/$ index.php?page=$1-$2 [L]
How can I set the parameters to go first? And how can I force www?

>The fix slashes rule should be second, and should force www at the same time for those requests.
?

>The non-www to www canonicalisation rule should be third.
?

>The internal rewrites should be last.
The rules in my '# Friendly SEO navigation' section are internal rewrites, am I right? So they come last.

g1smd

3:07 pm on May 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, for "pages" you should redirect "with slash" to "without slash", and serve content only when "without slash" is requested.

You may have seen some sites break the "rules", as not everyone reads the HTTP specs. Yes, folder URLs end with a slash and for those Apache will automatically look for the index file if you use the
DirectoryIndex index.php index.html
directive.

The final comments list which order your rules should be in.

The "parameters to SEF" redirect(s) should be the first rules, and should force www at the same time for those requests (by having the domain name in the rule target).

The "fix slashes" rule (that you are just about to write) should be second, and should force www at the same time for those requests (by having the domain name in the rule target).

The non-www to www canonicalisation rule should be third (so far you haven't got one of those, but you need to add one: you do not want example.com/somepage and www.example.com/somepage as Duplicate Content).

The internal rewrites should be last (those are the rules with the [L] flag, which you have commented as "Friendly SEO navigation").
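The non-www to www canonicalisation rule mentioned above could be sketched like this, assuming www.example.com is the only hostname the site should answer on:

```apache
# Redirect any hostname other than www.example.com (for example example.com)
# to the canonical hostname, keeping the requested path and query string.
# The "?" after the group also exempts requests with an empty Host header,
# which would otherwise be redirected as well.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Placed third, after the parameter and slash redirects, it only handles requests that were otherwise already in their final form.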

Mariuz

2:18 pm on May 11, 2011 (gmt 0)

10+ Year Member



It took some time but here's what I've got now. Can you look at it again to see if this is correct?

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /

# 1
# SEF redirects
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=([a-z]+)-([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/%1/%2? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/%1? [R=301,L]

# 2
# 'Fix slashes rule'
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{HTTP_HOST} !^\.localhost$ [NC]
RewriteRule ^(.+[^/])/$ [%{HTTP_HOST}...] [R=301,L]

# 3
# Set the canonical url
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 4
# Internal rewrites
RewriteRule ^([a-z]+)/([a-z]+)$ http://www.example.com/index.php?page=$1-$2 [L]
RewriteRule ^([a-z]+)$ http://www.example.com/index.php?page=$1 [L]
</IfModule>

g1smd

7:02 pm on May 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You don't need the <ifModule> tags. Delete them.

In the first rule, change
index\.php
to
(index\.php)?
as mentioned in my fourth post. There are four to change.

In the second rule
^(.+[^/])/$
may be more efficiently coded as
^(([^/]+/)*[^/.]+)/$
and you might not need the
RewriteCond %{REQUEST_FILENAME} !-f
line at all.

In the second rule hard code the actual domain name in the redirect target, not HTTP_HOST.

In the third rule change the pattern from
^example\.com$
to
!^(www\.example\.com)?$


In the fourth rule, remove
http://www.example.com
from both lines as mentioned in my first post. You need an internal rewrite here, not an external 302 redirect.

Between rules 2 and 3 you will also need:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

for index canonicalisation.

Mariuz

10:36 am on May 12, 2011 (gmt 0)

10+ Year Member



Thanks for your input g1smd, I have changed it as you stated.

There seems to be a problem with my first rule when I add the ( )?. Without it the rule redirects perfectly; with it I get a status 200.

# this doesn't redirect:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?page=([a-z]+)-([a-z]+)\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/%1/%2? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?page=([a-z]+)\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/%1? [R=301,L]

# this does redirect:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=([a-z]+)-([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/%1/%2? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?page=([a-z]+)\ HTTP/
RewriteRule ^index\.php?$ http://www.example.com/%1? [R=301,L]

Any clues?

I noticed one other thing. When an existing folder is requested, for example
http://www.example.com/img, the response is a 301 to http://www.example.com/img/?page=img, and the body reads: Forbidden
You don't have permission to access /demo/img/ on this server.
When I add a trailing slash ('/img/') it returns a 403, which is fine.
How can I prevent that redirect?

Mariuz

9:20 am on May 13, 2011 (gmt 0)

10+ Year Member



I found a solution for that last 'problem' (the rewriting of an existing folder) by putting the condition:
RewriteCond %{REQUEST_FILENAME} !-d
before RewriteRule ^([a-z]+)$ index.php?page=$1 [L]

I still have not found a solution for the ()? addition, which is needed so that ?page=whatever is also matched.
Do you have any ideas where it might go wrong?
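One possible explanation, offered as an assumption rather than a confirmed diagnosis: wrapping index\.php in capturing parentheses makes it capture group 1 of the RewriteCond, so the page parts move to %2 and %3 while the redirect target still uses %1 and %2. A sketch with the backreferences renumbered, assuming a root .htaccess:

```apache
# (index\.php)? is capture group 1 here, so the page parts are %2 and %3.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?page=([a-z]+)-([a-z]+)\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/%2/%3? [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(index\.php)?\?page=([a-z]+)\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/%2? [R=301,L]

# Internal rewrite that skips real directories such as /img.
# A RewriteCond applies only to the RewriteRule that directly follows it.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z]+)$ index.php?page=$1 [L]
```

With the original numbering, a request for /index.php?page=tree-branch would be sent to /index.php/tree, which may explain why the final response was a 200 rather than the expected page URL.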