Forum Moderators: phranque


Need a plain-English rewrite translation

         

wwarren

1:39 am on May 29, 2009 (gmt 0)

10+ Year Member



I'm trying to resolve a problem where I'm getting a "Request exceeded the limit of 10 internal redirects due to probable configuration error" and my incomplete understanding of the rewrite rules and directives is getting in the way. I'm hopeful somebody can offer a translation.

My WordPress blog is in my /var/www/wordpress folder. The links are "pretty" permalinks of the form www.example.com/wordpress/yyyymm/post_title.html, so WordPress generated a .htaccess file that says this:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /wordpress/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /wordpress/index.php [L]
</IfModule>
# END WordPress

Everything works fine in that configuration. However, when I add a VirtualHost to Apache that sets the DocumentRoot to /var/www/wordpress, so that I can have URLs like "www.example.com/yyyymm/post_title.html" without the "/wordpress" path, I get the redirects error.

My interpretation of the .htaccess statement is:
The first RewriteCond is TRUE if the URL does not exist or is not a file. This would be TRUE in my case.
The second RewriteCond is TRUE if the URL does not exist or is not a directory. Again this would be TRUE in my case.

First, is my translation correct, and second, what does the rewriterule do? The single "." has me confused, and why would it always load the same file, regardless of the requested filename?

The debug of the error looks like:
[date] [debug] core.c(3046): [client IP] r->uri = /wordpress/index.php
[date] [debug] core.c(3052): [client IP] redirected from r->uri = /wordpress/index.php
Repeat 10 times.

Suggestions for fixing are welcome.

jdMorgan

3:17 am on May 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My interpretation of the .htaccess statement is:
The first RewriteCond is TRUE if the URL does not exist or is not a file. This would be TRUE in my case.
The second RewriteCond is TRUE if the URL does not exist or is not a directory. Again this would be TRUE in my case.

I'm not sure where that 'magic' OR is coming from...

The first RewriteCond is true if the requested URL-path does not resolve to an existing file.
The second RewriteCond is true if the requested URL-path does not resolve to an existing directory.

While both *were* true in the initial configuration, neither will be true in the new configuration, because both of these directives are testing the *filepath* derived from the requested URL.
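The implicit AND between the two RewriteConds, and the "exists AND is a regular file" meaning of -f, can be tried out with a small Python sketch. It uses os.path.isfile and os.path.isdir as stand-ins for -f and -d on a throwaway temporary directory; the file and directory names are made up for illustration.

```python
import os
import tempfile


def rewrite_conditions_pass(filepath: str) -> bool:
    """Model of the two WordPress RewriteConds, which are ANDed by default.

    RewriteCond ... !-f  -> the path is NOT an existing regular file
    RewriteCond ... !-d  -> the path is NOT an existing directory
    The rule fires only when both are true.
    """
    not_a_file = not os.path.isfile(filepath)  # -f: exists AND is a regular file
    not_a_dir = not os.path.isdir(filepath)    # -d: exists AND is a directory
    return not_a_file and not_a_dir


with tempfile.TemporaryDirectory() as docroot:
    # Hypothetical stand-in for /var/www/wordpress: one real file, one real directory.
    open(os.path.join(docroot, "index.php"), "w").close()
    os.mkdir(os.path.join(docroot, "wp-content"))

    for name in ("index.php", "wp-content", "200905/post_title.html"):
        path = os.path.join(docroot, name)
        print(name, rewrite_conditions_pass(path))
```

Only the permalink path, which is neither a file nor a directory, passes both conditions; there is no OR anywhere in the chain.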

If you "take /wordpress out of the URL," the tested filepaths will be incorrect: no files or directories may be found where they were expected to be, so the rule will be invoked unexpectedly and, as your log file shows, will even rewrite /wordpress/index.php to itself, recursively.
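The recursion in the log can be reproduced with a simplified model. This is a hypothetical sketch, not Apache's code: it assumes a URL-path maps to DocumentRoot + URL-path, uses made-up sets as the filesystem, and keeps the rule target /wordpress/index.php from the thread. The limit of 10 matches Apache's default LimitInternalRecursion.

```python
# Hypothetical model of the WordPress rule set; not Apache's actual code.
LIMIT = 10  # Apache's default LimitInternalRecursion value

# Made-up filesystem for illustration.
EXISTING_FILES = {"/var/www/wordpress/index.php"}
EXISTING_DIRS = {"/var/www", "/var/www/wordpress"}


def serve(docroot: str, url_path: str, target: str = "/wordpress/index.php") -> str:
    """Apply the rule repeatedly, since each internal redirect restarts processing."""
    for redirects in range(LIMIT + 1):
        filepath = docroot.rstrip("/") + url_path  # URL-path -> filepath
        if filepath in EXISTING_FILES or filepath in EXISTING_DIRS:
            # The RewriteConds fail, so the rule stops and the request is handled normally.
            return f"served {filepath} after {redirects} internal redirect(s)"
        url_path = target  # !-f and !-d both true: rewrite, then reprocess
    return f"error: exceeded {LIMIT} internal redirects"


print(serve("/var/www", "/wordpress/200905/post_title.html"))
# served /var/www/wordpress/index.php after 1 internal redirect(s)
print(serve("/var/www/wordpress", "/200905/post_title.html"))
# error: exceeded 10 internal redirects
```

With DocumentRoot /var/www the target resolves to a real file and the chain stops. With DocumentRoot /var/www/wordpress the target /wordpress/index.php resolves to /var/www/wordpress/wordpress/index.php, which never exists, so the rule keeps rewriting the target to itself. This matches the repeated r->uri = /wordpress/index.php lines in the debug log.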

Since I'm not looking at your entire server configuration and I don't know where you've placed this .htaccess file, I'll suggest trying these changes initially, by way of demonstrating at least part of the solution.

In /wordpress/.htaccess:


# BEGIN WordPress
RewriteEngine on
RewriteBase /
RewriteCond %{DOCUMENT_ROOT}/wordpress%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}/wordpress%{REQUEST_URI} !-d
RewriteRule . /wordpress/index.php [L]
# END WordPress

First, we get rid of the useless <IfModule> container, whose purpose is to guarantee a silent failure if mod_rewrite isn't installed. If you're WordPress Support, you might want that, but if you're a Webmaster, you almost certainly want to know if mod_rewrite goes missing...

We also set the RewriteBase back to "/" to simplify matters.

Then, we "construct" the correct test filepaths by 'injecting' /wordpress in between the DocumentRoot and the requested URI, since it won't be present in the URL, but it must be present in the filepaths we actually want to check.

One thing to bear in mind that's quite helpful is that a URL (or a URI) is not a filepath, or vice-versa. They are not equivalent, nor even "sort of the same thing." They are entirely different things which are only "associated" by the action of a server; in fact, the fundamental job of a server is to translate Web URLs to server operating-system filesystem filepaths. Once you've worked extensively with mod_rewrite, all of this becomes obvious and intuitive, for the simple reason that you can use mod_rewrite to 'map' almost any URL-path to almost any filepath, but it is a leading point of confusion at first. And if you don't understand that URL-paths and filepaths are not at all the same thing, then spotting the problem in the code above is nearly impossible...
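To make the URL/filepath distinction concrete, here is a minimal sketch of the server's default translation (no mod_rewrite, no Alias): the filepath is simply the virtual host's DocumentRoot with the URL-path appended. The two DocumentRoot values are the ones from wwarren's httpd.conf excerpt later in the thread; everything else is illustrative.

```python
# Default URL-path -> filepath translation, assuming no Alias or rewrite is involved.
VHOST_DOCROOTS = {
    "www.example.com": "/var/www",
    "blog.example.com": "/var/www/wordpress",
}


def url_to_filepath(host: str, url_path: str) -> str:
    """Prefix the URL-path with the virtual host's DocumentRoot."""
    return VHOST_DOCROOTS[host].rstrip("/") + url_path


for host in VHOST_DOCROOTS:
    print(host, url_to_filepath(host, "/wordpress/index.php"))
# www.example.com /var/www/wordpress/index.php
# blog.example.com /var/www/wordpress/wordpress/index.php
```

The same URL-path lands on two different filepaths, and under the second host it points at a file that does not exist.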

The "." in the RewriteRule pattern means "match any single character." Since this pattern is un-anchored, it means "match any requested URL-path that contains any (at least one) single character anywhere within it." So, the pattern effectively means "match any non-blank requested URL-path."

Since the only blank URL-path would be a request for "/", and since "/" is likely already mapped to index.php by the action of mod_dir, the RewriteRule essentially means, "Rewrite all requested URL-paths to '/wordpress/index.php' -- qualified by the preceding RewriteConds only."
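The behaviour of the lone "." can be checked with Python's re module, whose "." has the same default meaning as in Apache's PCRE patterns (any single character except a newline). The sketch assumes .htaccess (per-directory) context, where the directory prefix is stripped before matching, so a request for the directory itself is tested as the empty string.

```python
import re


def rule_pattern_matches(per_dir_path: str) -> bool:
    """Unanchored "." pattern: true if the path contains at least one character."""
    return re.search(r".", per_dir_path) is not None


for url_path in ["", "robots.txt", "200905/post_title.html"]:
    print(repr(url_path), rule_pattern_matches(url_path))
# '' False
# 'robots.txt' True
# '200905/post_title.html' True
```

Only the empty path escapes the pattern, which is why the RewriteConds do all the real filtering.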

The resources cited in our Forum Charter may prove useful to you if you want to dig into this some more.

Jim

wwarren

3:57 am on May 29, 2009 (gmt 0)

10+ Year Member



Thanks for that very detailed, and helpful explanation. If your exact suggestion doesn't work, I think you've at least set me on the path to figure it out.

The "or" in my translation attempt was an incorrect paraphrase of "Treats the TestString as a pathname and tests if it exists AND is a regular file." from the apache site.

g1smd

9:08 am on May 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Rewriting all URL-paths to the WordPress index.php file would normally also see requests for /robots.txt being rewritten (that's a bad thing, as your script cannot return anything useful for that). The "exists as file/folder" checks stop that happening. Those checks are inefficient, though, so I prefer to select by URL pattern instead. This is easy for extensionless URLs: for example, in a recent project the page URLs are generally either a four-digit or a seven-digit number, so the pattern matches only those two forms and is a lot more efficient.
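Selecting by URL pattern is a pure string test, with no filesystem lookup. A minimal sketch of the four-or-seven-digit scheme from the example above, written in Python; the equivalent anchored mod_rewrite pattern would be ^([0-9]{4}|[0-9]{7})$. The parentheses matter there: without them the ^ and $ would each bind to only one alternative.

```python
import re

# Hypothetical URL scheme from the example: page URLs are four- or seven-digit numbers.
PAGE_PATTERN = re.compile(r"[0-9]{4}|[0-9]{7}")


def goes_to_script(url_path: str) -> bool:
    """fullmatch anchors both ends, so only the two page forms reach the script."""
    return PAGE_PATTERN.fullmatch(url_path) is not None


for path in ["1234", "1234567", "12345", "robots.txt"]:
    print(path, goes_to_script(path))
```

Requests such as robots.txt never match, so they are served (or 404'd) by Apache without touching the script.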

jdMorgan

1:35 pm on May 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In this case, since we're putting WordPress into a subdirectory with its own .htaccess, part of that problem is already taken care of. However, it might still be useful to exclude image and media files from the 'exists' checks if performance is a concern.
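One way to read "exclude image and media files from the 'exists' checks" is to put a cheap extension test ahead of the -f/-d tests, for example a RewriteCond %{REQUEST_URI} !\.(jpe?g|gif|png)$ placed first. As we understand mod_rewrite, ANDed conditions are evaluated in order and evaluation stops at the first one that fails, so static files skip the stat() calls. The Python sketch below models that ordering and counts the simulated lookups; the extension list is a hypothetical example.

```python
import re

# Hypothetical list of static extensions; adjust to the media types a site really serves.
STATIC_EXT = re.compile(r"\.(?:jpe?g|gif|png|css|js|ico)$", re.IGNORECASE)


def route(url_path, existing_files, lookups):
    """Classify a request; append to `lookups` for each simulated filesystem check."""
    if STATIC_EXT.search(url_path):
        # The cheap string test fails the chain early: no -f/-d lookup is made,
        # and a missing image becomes a plain 404 rather than a WordPress page.
        return "left alone"
    lookups.append(url_path)  # stands in for the -f / -d stat() calls
    if url_path in existing_files:
        return "left alone"
    return "rewritten to index.php"


lookups = []
for p in ["/logo.png", "/style.css", "/200905/post_title.html"]:
    print(p, route(p, set(), lookups))
print(len(lookups), "lookup(s)")
```

Only the permalink request pays for a filesystem check. As Jim says, this is a tweak to add once the basic setup works.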

But first get it working, then add tweaks.

Jim

wwarren

1:14 am on May 30, 2009 (gmt 0)

10+ Year Member



jdMorgan,
No luck with the suggested solution. The error log shows exactly the same problem, and now even the home page won't load. I've tried a few variations but haven't hit on the right thing yet. You were correct in assuming the .htaccess file is in /var/www/wordpress.

So a couple of questions. If virtualhost says for www.example.com that documentroot="/var/www/wordpress" and a request comes in for "www.example.com", how does %{DOCUMENT_ROOT}/wordpress%{REQUEST_URI} translate? What is it if the request is for www.example.com/yyyymm/post_title.html? And just to be clear, "yyyymm/post_title.html" is dynamically created by wordpress. There's no such physical directory or file under /var/www/wordpress.

If I'm reading the definitions of the variables correctly, it seems to me that maybe your suggested RewriteCond is giving me something like /var/www/wordpress/wordpress/post_title.html, which would never be a valid path.
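mod_rewrite substitutes %{...} server variables into a RewriteCond test string literally, so both questions above can be answered by plain string concatenation. A minimal sketch, assuming DOCUMENT_ROOT is /var/www/wordpress (the blog.example.com vhost in the httpd.conf excerpt) and using 200905 as a hypothetical yyyymm value:

```python
def injected_test_path(document_root: str, request_uri: str) -> str:
    """Literal expansion of %{DOCUMENT_ROOT}/wordpress%{REQUEST_URI}."""
    return f"{document_root}/wordpress{request_uri}"


for uri in ("/", "/200905/post_title.html"):
    print(injected_test_path("/var/www/wordpress", uri))
# /var/www/wordpress/wordpress/
# /var/www/wordpress/wordpress/200905/post_title.html
```

Under that DocumentRoot the injected /wordpress segment is doubled, as suspected, so !-f and !-d are true for every request, including the rewrite target itself. The injection only produces real paths when DOCUMENT_ROOT is the parent directory /var/www.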

In case it helps, here's the content of /etc/apache2/httpd.conf.


<Directory /var/www/wordpress>
Options FollowSymLinks
AllowOverride FileInfo
</Directory>

<VirtualHost *:80>
DocumentRoot /var/www
ServerName www.example.com
</VirtualHost>

<VirtualHost *:80>
DocumentRoot /var/www/wordpress
ServerName blog.example.com
</VirtualHost>

[edited by: jdMorgan at 3:12 am (utc) on May 30, 2009]
[edit reason] example.com [/edit]

jdMorgan

3:12 am on May 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



documentroot="/var/www/wordpress" and a request comes in for "www.example.com", how does %{DOCUMENT_ROOT}/wordpress%{REQUEST_URI} translate?

It translates to /var/www/wordpress/
And this is subsequently mapped to the index.php file in the /wordpress directory by mod_dir.

What is it if the request is for www.example.com/yyyymm/post_title.html?

That translates to
/var/www/wordpress/yyyymm/post_title.html

> And just to be clear, "yyyymm/post_title.html" is dynamically created by wordpress. There's no such physical directory or file under /var/www/wordpress.

And since it does not exist as a physical file and it does not exist as a physical directory, it gets rewritten to /var/www/wordpress/index.php by the RewriteRule, and WordPress examines REQUEST_URI or PATH_INFO to get the "filename" /yyyymm/post_title.html and do its normal thing.
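After the internal rewrite, the script still sees the original URL-path in REQUEST_URI, which is how a front controller knows which post was asked for. The sketch below is a hypothetical parser for the /yyyymm/post_title.html permalink shape used in this thread; it is not WordPress's own request parsing, which is considerably more general.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern for the /yyyymm/post_title.html permalinks described above.
PERMALINK = re.compile(r"/(?P<year>[0-9]{4})(?P<month>[0-9]{2})/(?P<slug>[^/]+)\.html")


def parse_permalink(request_uri: str):
    """Return (year, month, slug), or None if the URI is not a permalink."""
    path = urlsplit(request_uri).path  # drop any ?query string
    m = PERMALINK.fullmatch(path)
    if m is None:
        return None
    return int(m["year"]), int(m["month"]), m["slug"]


print(parse_permalink("/200905/post_title.html"))  # (2009, 5, 'post_title')
print(parse_permalink("/robots.txt"))              # None
```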

This is how it should be working, anyway. I suggest you look closely at your error logs (both Apache's and WordPress's) to identify the specific fault.

[added] Since you're using /wordpress as document_root, you may need to add the RewriteBase /wordpress bit back in. [/added]

Jim

[edited by: jdMorgan at 3:15 am (utc) on May 30, 2009]