
Apache Web Server Forum

    
RewriteRule results in 404
rubenski




msg:4566489
 10:40 pm on Apr 19, 2013 (gmt 0)

Hello,

I have been trying to fix an issue where Apache sends a 404, but I don't know why. Or actually, I think I DO know why, but I don't know how to fix it.

The actual cause of the 404 seems to be that Apache is searching the file system for a particular path, while it shouldn't do that, because the path is only virtual. It doesn't actually exist on the file system.

This is the URL that returns a 404:


http://www.example.com/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje

This is the content of .htaccess:


RewriteEngine On

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteRule sitemap\.xml /bin/sitemap.php [nocase]
RewriteCond %{REQUEST_URI} (/|\.htm|\.php|\.html)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(/?[^/]+) /index.php [L]
Deny from all
Allow from all
Options -Indexes

php_value magic_quotes_gpc Off

Order deny, allow


The Apache error log shows this when requesting the previously mentioned URL:


[Sat Apr 20 00:26:35 2013] [error] [client 192.168.0.1] File does not exist: /sites/example.com/public/thema-s, referer: http://www.example.com/thema-s/bedrijfsuitje/


Apparently, Apache is looking for a "thema-s" directory on the file system, which it of course won't find.

I have set up rewrite logging. I think these lines are related to the problem:


192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] add path info postfix: /sites/example.com/public/thema-s -> /sites/example.com/public/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] strip per-dir prefix: /sites/example.com/public/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje -> thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] applying pattern '(.*)' to uri 'thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje'
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] add path info postfix: /sites/example.com/public/thema-s -> /sites/example.com/public/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] strip per-dir prefix: /sites/example.com/public/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje -> thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] applying pattern 'sitemap\.xml' to uri 'thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje'
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] add path info postfix: /sites/example.com/public/thema-s -> /sites/example.com/public/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] strip per-dir prefix: /sites/example.com/public/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje -> thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (3) [perdir /sites/example.com/public/] applying pattern '^(/?[^/]+)' to uri 'thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje'
192.168.0.1 - - [20/Apr/2013:00:26:35 +0200] [www.example.com/sid#b8f85f40][rid#b915bec0/initial] (1) [perdir /sites/example.com/public/] pass through /sites/example.com/public/thema-s


I think the last line means Apache has decided that "/sites/example.com/public/thema-s" is the result of the rewriting process and it goes looking for this uri on the file system.

 

Dideved




msg:4566494
 11:34 pm on Apr 19, 2013 (gmt 0)

You should increase the log level. I think trace8 is as high as it goes. Currently, your logs don't show the rewrite conditions.

In fact, I suspect that the problem is in one of the conditions. This one in particular:

RewriteCond %{REQUEST_URI} (/|\.htm|\.php|\.html)$

I suspect that this condition is supposed to be negated. Like so:

RewriteCond %{REQUEST_URI} !(/|\.htm|\.php|\.html)$

The idea is, you're assuming that any request that ends in .html, .php, etc., is not supposed to be rewritten to your index.php. If that assumption is true, then you can gain a teensy bit of performance by skipping the "is a file" check.
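To make the effect of the negation concrete, here is a minimal sketch that tries the condition's pattern outside Apache. It assumes Python's re module treats this simple pattern the same way Apache's PCRE engine does; the variable names and the last two test URLs are made up for illustration.

```python
import re

# Pattern of the first RewriteCond, copied from the posted .htaccess.
cond = re.compile(r"(/|\.htm|\.php|\.html)$")

# REQUEST_URI values to try (the last two are made-up variations).
uris = [
    "/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje",
    "/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje/",
    "/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje.html",
]

# as_written: the condition is met when the pattern matches.
# negated: the condition is met when the pattern does not match (the "!" form).
as_written = {uri: cond.search(uri) is not None for uri in uris}
negated = {uri: not met for uri, met in as_written.items()}

for uri in uris:
    print(uri, as_written[uri], negated[uri])
```

As written, the condition is only met for the trailing-slash and .html forms; negated, it is only met for the extensionless URL.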

Dideved




msg:4566497
 11:45 pm on Apr 19, 2013 (gmt 0)

Actually, there's a second problem also:

RewriteRule ^(/?[^/]+) /index.php [L]

You're matching on [^/], which means any non-slash character, so it won't capture the whole request path. The rewrite rule should probably be:

RewriteRule (.*) index.php [L]
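A small sketch of the difference between the two patterns, run outside Apache. It assumes Python's re module behaves like Apache's PCRE for patterns this simple, and the variable names are only for illustration. Both patterns match the path from the first post; what differs is how much ends up in the capture group ($1).

```python
import re

# Per-directory (.htaccess) URI as shown in the rewrite log: no leading slash.
uri = "thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje"

original = re.search(r"^(/?[^/]+)", uri)
suggested = re.search(r"(.*)", uri)

# Both patterns match, so either rule would go on to evaluate its conditions.
print(original is not None, suggested is not None)

# Only the contents of the capture group differ.
first_capture = original.group(1)
whole_capture = suggested.group(1)
print(first_capture)
print(whole_capture)
```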

lucy24




msg:4566514
 2:04 am on Apr 20, 2013 (gmt 0)

Short version: There is nothing in your existing htaccess that allows for extensionless URLs. So mod_rewrite is doing exactly what it's been told to do.


Long version:

I have set up rewrite logging.


Ooh, you're on 2.4, I'm envious. Pull apart the log and you can see that for each separate RewriteRule, the server does two preliminary things: first it adds
/bedrijfsuitje/schadelijk-bedrijfsuitje
(uhh... I don't understand this, but never mind)
and then it sets aside
/sites/example.com/public/
leaving only the actual request. Then it tests the request against each rule in order-- about which more in a minute, heh heh.

applying pattern '(.*)' to uri
The pattern obviously fits, so it stops to evaluate the Condition. Apparently the Condition is not met, so it continues on to the next RewriteRule.

applying pattern 'sitemap\.xml' to uri
The pattern doesn't fit the request, so the server skips to the next rule without bothering to check for Conditions. (There don't happen to be any; they would only have been evaluated if the Rule itself potentially fit.)

applying pattern '^(/?[^/]+)' to uri
Since the pattern has no closing anchor, it will match all requests except for the root. So the server stops to evaluate the Conditions. The first condition asks whether the request ends in / or html? or php -- in other words, it's a request for a page. Since your request doesn't meet this Condition, the server stops cold and looks for the next Rule. There isn't one, so the request emerges from mod_rewrite unchanged.
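The order of evaluation described above (rule pattern first, conditions only if the pattern fits) can be sketched as a toy model. This is not how mod_rewrite is implemented: it ignores flags such as [L] and [R], the request dictionary and function name are invented for the example, the file and directory checks are hard-coded assumptions, and Python's re module stands in for PCRE.

```python
import re

# Rules from the posted .htaccess as (pattern, regex flags, conditions).
# Each condition is a function of the request that returns True when it is met.
RULES = [
    (r"(.*)", 0, [
        lambda req: re.search(r"^www\.example\.com$", req["host"]) is None,
    ]),
    (r"sitemap\.xml", re.IGNORECASE, []),
    (r"^(/?[^/]+)", 0, [
        lambda req: re.search(r"(/|\.htm|\.php|\.html)$", req["request_uri"]) is not None,
        lambda req: not req["is_file"],
        lambda req: not req["is_dir"],
    ]),
]

def first_rule_that_fires(req):
    """Return the pattern of the first rule that would rewrite, or None."""
    for pattern, flags, conditions in RULES:
        # Step 1: the rule pattern is tested against the per-directory URI.
        if re.search(pattern, req["uri"], flags) is None:
            continue  # the conditions are not even looked at
        # Step 2: only now are the conditions evaluated, in order.
        if all(cond(req) for cond in conditions):
            return pattern
    return None

request = {
    "host": "www.example.com",
    "uri": "thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje",
    "request_uri": "/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje",
    "is_file": False,  # assumed: nothing with this name exists on disk
    "is_dir": False,   # assumed
}
result = first_rule_that_fires(request)
print(result)  # None: no rule fires, the request passes through unchanged
```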

Gosh, that was fun. I've never seen a rewrite_log before. Now, er, what was the question again?

Apache is searching the file system for a particular path, while it shouldn't do that, because the path is only virtual

But the server has no way to know that. The request passed through mod_rewrite, failed to trigger any rules, and emerged unchanged. So let's have a closer look at that htaccess.

RewriteEngine On

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteRule sitemap\.xml /bin/sitemap.php [nocase]
RewriteCond %{REQUEST_URI} (/|\.htm|\.php|\.html)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(/?[^/]+) /index.php [L]
Deny from all
Allow from all
Options -Indexes

php_value magic_quotes_gpc Off

Order deny, allow

Ouch, ouch, ouch. I do believe every single line is in the wrong place.

:: shuffling papers ::

Options -Indexes

Order deny, allow
Deny from all
Allow from all


This group doesn't even make sense, but never mind that.

php_value magic_quotes_gpc Off

RewriteEngine On

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

I initially thought this rule was in the wrong place. But since it's the only one that generates a redirect rather than a rewrite-alone, it is in the correct place.

RewriteRule sitemap\.xml /bin/sitemap.php [nocase]

RewriteCond %{REQUEST_URI} (/|\.htm|\.php|\.html)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(/?[^/]+) /index.php [L]

Ugh, ugh, I smell a CMS. For starters, never put something in a condition that can go in the body of the Rule. Are you in htaccess or a config file? If config, the request begins with a directory slash. If htaccess, it doesn't. The RewriteLog looks like htaccess.

At a minimum, dump the first Condition and replace it with a rule whose pattern says something like

RewriteRule ^(([^/]+/)*([^/.]+\.(html?|php))?)$ et cetera

and you only need that much if your site uses all three extensions for pages: .php, .htm and .html. This rule will, of course, come AFTER all redirects-- not only the domain-name one which you've already got, but also the "index.html" redirect. That one, again, will be expressed as "index\.(html?|php)".
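A quick way to check which requests that pattern covers is to try it on a few paths. The sketch below assumes Python's re module agrees with Apache's PCRE on this pattern; every path except the first is a made-up example. Note that, as written, the pattern covers directory-style and page-style URLs but not the extensionless URL from the first post.

```python
import re

# Pattern quoted in the post above, copied verbatim.
page_pattern = re.compile(r"^(([^/]+/)*([^/.]+\.(html?|php))?)$")

# Per-directory paths (no leading slash); all but the first are made-up examples.
paths = [
    "thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje",
    "thema-s/bedrijfsuitje/",
    "thema-s/page.html",
    "index.php",
    "images/logo.png",
    "",
]

matches = {path: page_pattern.search(path) is not None for path in paths}
for path, matched in matches.items():
    print(repr(path), matched)
```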

phranque




msg:4566557
 8:46 am on Apr 20, 2013 (gmt 0)

I have set up rewrite logging.


Ooh, you're on 2.4, I'm envious.


if you are on 2.2 you can use the RewriteLog Directive:
http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html#rewritelog

rubenski




msg:4566605
 1:26 pm on Apr 20, 2013 (gmt 0)

Dideved, Lucy, thanks for your suggestions and explanations. I now understand that Apache is indeed just doing its job: it decides none of the rules apply, so it doesn't rewrite anything and starts looking for the path on the file system. I have solved the issue by adding a new rule, which acts as a 'catch all' for anything that was not matched by a previous rule.

I had tried Dideved's suggestion of changing the ^(/?[^/]+) part to (.*), but (strangely, since I thought it should match anything) that doesn't solve the problem.


RewriteEngine On

RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteRule sitemap\.xml /bin/sitemap.php [nocase]
RewriteCond %{REQUEST_URI} (/|\.htm|\.php|\.html)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(/?[^/]+) /index.php [L]

# start of new rule: catch-all for anything not previously matched
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) /index.php [L]
# end of new rule

Deny from all
Allow from all
Options -Indexes

php_value magic_quotes_gpc Off

ErrorDocument 404 /404.php


I also noticed that the initial version of my htaccess (first post) DOES match the URL http://www.example.com/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje if I stick a / to the end. I am not so good at regexes, but does this sound logical to you? The logging tells me it is actually rewriting the URL when I use the trailing slash:


192.168.0.1 - - [20/Apr/2013:15:34:05 +0200] [www.example.com/sid#b8b66f40][rid#b8d58338/initial] (3) [perdir /sites/example.com/public/] applying pattern '^(/?[^/]+)' to uri 'thema-s/bedrijfsuitje/origineel-bedrijfsuitje/'
192.168.0.1 - - [20/Apr/2013:15:34:05 +0200] [www.example.com/sid#b8b66f40][rid#b8d58338/initial] (2) [perdir /sites/example.com/public/] rewrite 'thema-s/bedrijfsuitje/origineel-bedrijfsuitje/' -> '/index.php'
192.168.0.1 - - [20/Apr/2013:15:34:05 +0200] [www.example.com/sid#b8b66f40][rid#b8d58338/initial] (1) [perdir /sites/example.com/public/] internal redirect with /index.php [INTERNAL REDIRECT]


I may be able to apply a better fix than the one I have now by changing ^(/?[^/]+) to not require a trailing slash (which it apparently does right now).


Should anyone be interested: I put the following in the vhost config to enable rewrite logging (don't forget to restart apache after making these changes):

RewriteLog /sites/example.com/logs/rewrite.log
RewriteLogLevel 3

3 is the 'normal' log level. 8 and 9 provide more info I am told.

lucy24




msg:4566663
 7:28 pm on Apr 20, 2013 (gmt 0)

if you are on 2.2 you can use the RewriteLog Directive:

Context: server config, virtual host

I knew there had to be a catch :(

I can try it on MAMP for my own edification, but not on the live site.

I also noticed that the initial version of my htaccess (first post) DOES match the URL http://www.example.com/thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje if I stick a / to the end. I am not so good at regexes, but does this sound logical to you?

Yes, perfectly logical. The rule says "ends in directory slash"; you feed in a request for filename ending in directory slash; rule executes.

The problem is... In real life, an URL that ends in a slash is a directory. An URL that doesn't is a page. In your case it is a little bit academic since none of the pages physically exist. But you need to pick a form and stick with it. If all your URLs end in a slash, then you're pretending that each page is its own directory. If all your URLs don't end in a slash, then you need to tweak your code to allow for extensionless URLs.

Now, personally I don't approve of extensionless URLs, but this is purely an individual preference. Nothing to do with either Apache or SEO. Going extensionless is definitely easier to code for, because then all page names can be expressed as

^([^.]*)$

If it has an extension, it's a supporting file-- image, css etc --and the rules can bypass it. You can run the -d test if you like, but you can skip -f because you already know there is no file with a name in the form "abcefg" and that's all.
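As a rough illustration of that split between "page" and "supporting file", here is the pattern tried on a few paths. It assumes Python's re module matches this pattern the same way Apache's PCRE does; the paths other than the first are invented for the example.

```python
import re

# "No period anywhere in the path" pattern from the post above.
extensionless = re.compile(r"^([^.]*)$")

# Per-directory paths (no leading slash); all but the first are made-up examples.
paths = [
    "thema-s/bedrijfsuitje/schadelijk-bedrijfsuitje",
    "thema-s/",
    "css/site.css",
    "images/logo.png",
]

is_page = {path: extensionless.search(path) is not None for path in paths}
for path, page in is_page.items():
    print(path, page)
```

Anything containing a literal period falls through, which is why the approach depends on directory names not containing periods.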

This is assuming for the sake of discussion that none of your directory names contain literal periods. A period in an URL is not illegal, of course, but they're an amazingly bad idea. So unless your name is apache dot org, don't use them. Stick with alphanumerics. (People can fight about hyphens and lowlines, but that's for a different forum.)

Finally: This is a virtual-host setup, so you can do config-file things. But you've also mentioned htaccess. When you're first setting up the site, it can be very useful to have htaccess files-- that is, AllowOverride is enabled. You can change things on the fly without having to restart the server, because htaccess is instant.

But once you've got everything stabilized, see if you can turn off the AllowOverride directives-- or at least most of them-- and shift all the htaccess rules to <Directory> sections within the config file. This way everything runs faster because each request only makes one stop: the config file.

Dideved




msg:4566681
 8:29 pm on Apr 20, 2013 (gmt 0)

> A period in an URL is not illegal, of course, but they're an amazingly
> bad idea.

erm... what?! There's absolutely nothing wrong with a period in the URL.

phranque




msg:4566741
 12:15 am on Apr 21, 2013 (gmt 0)

I knew there had to be a catch


same as 2.4:

http://httpd.apache.org/docs/current/mod/core.html#loglevel
LogLevel Directive
...
Context: server config, virtual host, directory



A period in an URL is not illegal, of course, but they're an amazingly bad idea.


periods are specified as separators for hostnames and IP addresses:
RFC 1738 - Uniform Resource Locators (URL):
http://www.ietf.org/rfc/rfc1738.txt

it is also customary to use periods as word separators in file and directory names, so periods in the uri path should be expected.

the only problem with a period is its ambiguity in regular expressions, requiring a backslash escape.
that's not the fault of the URL.
=8)
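a minimal sketch of that ambiguity, using Python's re module as a stand-in for the regex engine in Apache; the file names are made up for the example. an unescaped period matches any single character, so the escape is what pins it to a literal dot.

```python
import re

unescaped = re.compile(r"^index.php$")
escaped = re.compile(r"^index\.php$")
# re.escape() adds the backslash for us when the name comes from elsewhere.
escaped_by_helper = re.compile("^" + re.escape("index.php") + "$")

dot_matches_real_name = unescaped.search("index.php") is not None
dot_matches_hyphen = unescaped.search("index-php") is not None
escaped_matches_hyphen = escaped.search("index-php") is not None
helper_matches_hyphen = escaped_by_helper.search("index-php") is not None

print(dot_matches_real_name)   # True
print(dot_matches_hyphen)      # True: the bare "." matched the hyphen
print(escaped_matches_hyphen)  # False
print(helper_matches_hyphen)   # False
```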

lucy24




msg:4566776
 4:13 am on Apr 21, 2013 (gmt 0)

the only problem with a period is its ambiguity in regular expressions

Periods also have the same problem as lowlines: they tend to become invisible in links. And, unlike lowlines-- but in common with hyphens-- they are not encompassed under \w

Now, if only there were a law prohibiting the use of the letters g and q in links on Canadian sites, where they always get misread as each other...

Dideved




msg:4566835
 6:25 pm on Apr 21, 2013 (gmt 0)

> Periods also have the same problem as lowlines: they tend to become
> invisible in links.

Well, first, periods are _not_ invisible in links. Notice phranque's post just above. There's a link with dots, and they're not invisible. Second, if we want our regexes to be bug-free, then we can't assume people won't use dots, same as how we can't assume people won't use lowlines. And third, if you did start including dots, then your pattern would actually be shorter and simpler.

[edited by: Dideved at 6:45 pm (utc) on Apr 21, 2013]

phranque




msg:4568484
 1:22 pm on Apr 27, 2013 (gmt 0)

And, unlike lowlines-- but in common with hyphens-- they (periods) are not encompassed under \w


that is because \w matches any "word" character and anything else is by definition a word separator.
spaces, periods and hyphens are probably the most commonly used word separators, so i don't understand why you see that as a problem.

lucy24




msg:4568579
 9:23 pm on Apr 27, 2013 (gmt 0)

If each file or directory name consists entirely of \w then constructing Regular Expressions is dead easy. Permit hyphens, and right away everything has to change from \w to [\w-]. Permit additional punctuation and soon you are into [^blahblah] territory instead.

Literal periods are particularly troublesome because they do have meaning as punctuation: separating the various components of the hostname at one end of the URL; separating filename from extension at the other end. So you're on a three-way toggle between "means A", "means B", and "means nothing".

Anyway, the Great Divide is that a lowline is \w while all other punctuation is \W
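A small sketch of where that divide falls, using Python's re module as a stand-in for Apache's regex engine. The names are examples taken from or modelled on this thread, and "thema_s" is invented to show the lowline case.

```python
import re

# Candidate file or directory names (single path segments).
names = ["bedrijfsuitje", "thema_s", "thema-s", "rfc1738.txt"]

# \w covers letters, digits and the lowline, nothing else.
word_only = {name: re.fullmatch(r"\w+", name) is not None for name in names}
# Allowing hyphens means switching to a character class.
word_or_hyphen = {name: re.fullmatch(r"[\w-]+", name) is not None for name in names}

for name in names:
    print(name, word_only[name], word_or_hyphen[name])
```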

Dideved




msg:4568589
 11:18 pm on Apr 27, 2013 (gmt 0)

You say "dead easy" as if the alternatives were not. But that's not the case.

If the goal is to match a path segment, then [^/]+. That's dead easy. If the goal is to match *any* path and *any* filename, as was the case that started this discussion, then .*, which is also dead easy.

This is a lot simpler than you're making it out to be. There's no need for arbitrary rules on punctuation.
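A short sketch of both patterns at work, assuming Python's re module behaves like Apache's regex engine for these two. The path is a made-up example that mixes hyphens and a period on purpose.

```python
import re

# Hypothetical path with punctuation inside the segments.
path = "thema-s/rfc1738.txt/schadelijk-bedrijfsuitje"

# [^/]+ picks out each path segment, whatever punctuation it contains.
segments = re.findall(r"[^/]+", path)

# .* takes the whole path in one go.
whole = re.fullmatch(r".*", path).group(0)

print(segments)
print(whole)
```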
