merging complex rewrite rules to document root

not quite working right

         

amznVibe

9:11 am on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been struggling to add some rewrite rules to a site I inherited and it's been driving me crazy getting them to all behave together. Then I remembered some of the brilliant rewrite people around here like jdMorgan so maybe someone can help.

Basically it's a site with static pages in the webroot, and a blog under a subdirectory. But I noticed that rewrite rules in the subdirectory were either conflicting or not being obeyed from the document root.

Let's start with this in /blog/.htaccess, which is necessary for a WordPress blog:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>

Now the problem is that the site has multiple old domains, which all need to be consolidated with a 301 redirect to a single domain. I'm also taking advantage of that to trim off the www.

/.htaccess
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule ^(.*)$ http://example.com/$1 [QSA,L,R=301]

Last but not least we would like to remove all instances of index.php or index.html from any URL request that doesn't need it.

/.htaccess
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(htm(l)?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(htm(l)?|php)$ [%{HTTP_HOST}...] [R=301,L]

The problem is that these last two rules in the document root .htaccess don't affect the blog. If I try to move the blog's rules into the document root, there are other conflicts: rules in other .htaccess files are no longer obeyed, e.g.

/images/.htaccess
ErrorDocument 404 /images/404.jpg

To make it even more complicated, some old URLs on the blog had their dates changed, so they need to be redirected to the URLs with the new dates. This works only some of the time, depending on which of the rules above are present, removed, or moved to /blog/.htaccess or /.htaccess.

RewriteCond %{REQUEST_URI} ^/blog/2005/03/02/old-entry-name(.*) [NC]
RewriteRule ^(.*)$ http://example.com/blog/2005/03/13/new-entry-name%1 [R=301,L,QSA]

As a huge bonus, I would like to make sure any URL under /blog/ ends with a trailing slash (/). The slash is optional on WordPress blogs, so the same page is reachable at two URLs, which causes duplicate content in search engines. This feature is the least of my worries compared to the above requirements.

I definitely need some expert help trying to merge all these, figure out the proper order for the rules, and try to make them behave all together. I've done quite a bit of trial and error but the problem is it's a live, active site and I really don't want to mess with the visitors too much via my mistakes.

Thanks for any assistance!

[edited by: amznVibe at 9:19 am (utc) on May 16, 2007]

amznVibe

10:04 am on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think I have found a trailing slash fix for virtual URL paths
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !..+$
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1/ [R=301,L,QSA]

So I guess I can try to add this to the mix as well.

jdMorgan

4:25 pm on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Missed this because the thread had two posts...

Several factors may be important here:

First, if subdirectories are to be subject to higher-level directories' mod_rewrite rules, then RewriteOptions inherit must be set -- See mod_rewrite documentation.
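As a rough sketch of that first point, the blog's own file might look something like the following. This reuses the WordPress rules posted above; the behavior of inherited rules (as I read the documentation, they are applied after the subdirectory's own rules, as if copied into that file) should be verified on a test server before relying on it.

```apache
# /blog/.htaccess (sketch only)
RewriteEngine On
# Pull in the RewriteCond/RewriteRule directives from the parent directory's configuration
RewriteOptions Inherit

# The subdirectory's own rules, as posted earlier in this thread
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
```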

Rule order is important: Do blanket access restrictions (i.e. block IP and user-agents) first, then do per-page (or more properly, per-URL) access restrictions, per-URL external redirects, domain (canonicalization) redirects, per-URL internal rewrites, and finally, any default (or 'catch-all') internal rewrites.
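To illustrate that ordering, here is a minimal skeleton for a document-root .htaccess. The user-agent, IP address, /private/ path, and "about" page are hypothetical placeholders; the redirect and catch-all rules are adapted from the ones posted in this thread. Treat it as a sketch of the sequence, not a drop-in file.

```apache
RewriteEngine On

# 1. Blanket access restrictions (hypothetical user-agent and IP address)
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC,OR]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
RewriteRule .* - [F]

# 2. Per-URL access restrictions (hypothetical path)
RewriteRule ^private/ - [F]

# 3. Per-URL external redirects
RewriteRule ^blog/2005/03/02/old-entry-name(.*)$ http://example.com/blog/2005/03/13/new-entry-name$1 [R=301,L]

# 4. Domain canonicalization redirect
RewriteCond %{HTTP_HOST} !^example\.com$
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# 5. Per-URL internal rewrites (hypothetical page)
RewriteRule ^about$ /about.html [L]

# 6. Default (catch-all) internal rewrite, with the filesystem checks last
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^blog/ /blog/index.php [L]
```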

Do not mix the use of mod_alias redirects and mod_rewrite rules unless you have tested the execution order of these modules and are sure that it won't break the 'order recommendations' stated above. To be clear, each Apache module parses ("scans") your .htaccess file looking for directives that it understands, executing those and ignoring the rest -- leaving them for subsequently-invoked modules to handle. So on some servers, mod_alias Redirect directives will be executed first, while on others, the mod_rewrite directives will be executed first. This is determined by the reverse LoadModule list order on Apache 1.x, and by an internal priority scheme on Apache 2.x.

This is of particular concern if you use multiple servers --for example: development, test, and production servers-- or if you contemplate changing hosts at any time in the future.
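One way to sidestep the module-order question is to express every redirect with mod_rewrite alone. A small sketch, where old-page.html and new-page.html are hypothetical names used only for illustration:

```apache
# mod_alias version; whether it runs before or after the RewriteRules depends on the server:
# Redirect 301 /old-page.html http://example.com/new-page.html

# mod_rewrite equivalent for a document-root .htaccess, placed with the other
# external redirects so that its position in the rule order is explicit
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]
```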

A general comment: Don't add functions (code) to an existing problematic situation. Get what you already have working first, then add new code and re-test. "Divide and conquer" is a good approach, so don't exacerbate a problem by adding new, unknown factors.

Moving code from .htaccess to httpd.conf and/or conf.d requires changes to the code. URL-paths in .htaccess are relative to the directory in which the code resides -- In other words, the path to the current directory is stripped before RewriteRule directives in that current directory can examine the URL-path. In contrast, URL-paths in httpd.conf and conf.d must be fully specified, relative to the hostname.

So, for example:

 RewriteRule ^images/logo\.gif$ http://www.example.com/dir2/images/logo.gif [R=301,L] 

when located in /dir/.htaccess, is equivalent to
 RewriteRule ^/dir/images/logo\.gif$ http://www.example.com/dir2/images/logo.gif [R=301,L] 

located in httpd.conf or conf.d

Additionally, in httpd.conf or conf.d, you can take advantage of <Directory> and <Location> containers (and others) to limit the scope of groups of rules to improve performance.
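A sketch of the container approach follows. The filesystem path /var/www/example/dir is an assumption for illustration. Note that, as I understand the documentation, rules inside a <Directory> container are matched in per-directory context, so the pattern is written the same way as in that directory's .htaccess file rather than with the full URL-path.

```apache
# httpd.conf (sketch); /var/www/example/dir is a hypothetical filesystem path
<Directory "/var/www/example/dir">
    # Per-directory rewriting needs symbolic-link following enabled
    Options +FollowSymLinks
    RewriteEngine On
    # Per-directory context: the pattern sees "images/logo.gif", not "/dir/images/logo.gif"
    RewriteRule ^images/logo\.gif$ http://www.example.com/dir2/images/logo.gif [R=301,L]
</Directory>
```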

The choice of whether to use <IfModule> should be an informed one: If you use "<IfModule mod_rewrite.c>" and the server does not have mod_rewrite enabled, then the rules will be skipped. Therefore, no error messages will be generated or logged; the code will fail silently. Although <IfModule> appears in many mod_rewrite examples found on the Web, be sure that is what you want.
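The two choices can be compared side by side. This is only a sketch; old-page.html and new-page.html are hypothetical names.

```apache
# Wrapped: if mod_rewrite is not loaded, these lines are skipped without any error
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]
</IfModule>

# Unwrapped alternative (use one form or the other, not both):
# if mod_rewrite is not loaded, the unknown directive causes a server error that gets logged
# RewriteEngine On
# RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]
```

If the site cannot function correctly without the rules, the unwrapped form at least makes the missing module obvious.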

Jim

[edited by: jdMorgan at 4:26 pm (utc) on May 16, 2007]

amznVibe

9:19 am on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well the good news is I got them all working together except one.

Trying to make sure any virtual url (not physical file, mapped to wordpress) ends in a trailing slash.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !..+$
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1/ [R=301,L,QSA]

Why doesn't that work?
Tried it before and after the wordpress rewrite.
Any other suggested techniques?

[edited by: amznVibe at 9:20 am (utc) on May 17, 2007]

amznVibe

10:00 am on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah it's something about the filename .ext check
RewriteCond %{REQUEST_URI} !..+$
as soon as I remove that it works.

Not sure what the original author's intent was, but I think I can live without it, since anything that fails the physical file test should end with a trailing slash. I already remove index.php.

jdMorgan

2:52 pm on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That line contains a regular-expressions coding error. It says "match if NOT (any character, followed by one or more of any character)." The error is that literal periods in regular-expressions patterns must be escaped, so what was likely intended is "!\..+" -- "Match a period followed by one or more characters."
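Side by side, the difference looks like this. The second line is only my guess at the original author's intent.

```apache
# As posted: each "." matches any character, so "..+$" matches every URL-path
# of two or more characters, and the negated condition fails for all of them
RewriteCond %{REQUEST_URI} !..+$

# Probably intended: "\." matches only a literal period, so the condition fails
# only for URL-paths that contain a period followed by at least one character
# RewriteCond %{REQUEST_URI} !\..+$
```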

If the code is intended to do what I believe it does, I'd write it like this:

# If requested URL does not contain a literal period in the final path-part or end with a slash
RewriteCond %{REQUEST_URI} !(\.[^/]+|/)$
# and if requested URL does not exist as an actual file
RewriteCond %{REQUEST_FILENAME} !-f
# externally redirect to add a slash
RewriteRule (.*) http://www.example.com/$1/ [R=301,L]

[QSA] is not needed, as the original query string will be retained by default.
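For contrast, here is a sketch of the one case where [QSA] does matter: when the substitution carries a query string of its own. The page names and the source=old parameter are hypothetical.

```apache
# No "?" in the substitution: the original query string is passed through unchanged
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]

# The substitution adds its own query string: without QSA it would replace the
# original one; with QSA the original query string is appended to it
RewriteRule ^old-search$ http://www.example.com/search?source=old [QSA,R=301,L]
```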

To prevent problems with conflicts between your configured ServerName and your actual preferred "canonical" domain name, always specify a full URL when doing external redirects as shown.

Always put filesystem and reverse-DNS check RewriteConds last -- No use wasting the (considerable) time and energy to perform them if the other conditions are not true.

Posting on this forum modifies pipe characters; if a pattern copied from here contains a broken pipe "¦", replace it with a solid pipe "|" before use.

Jim