I am at a dead end with my Apache configuration. I'd be thankful for any advice.
My situation is:
I have a local development Apache server running, configured freely, without any serious security limitations as far as I know. I call it, let's say, 'localhost.xy', and I route to it via my 'hosts' definition.
Then I have mod_rewrite configured for handling subdomains in server config:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^([^\./]*)\.localhost
RewriteRule (.*) /%1$1 [L]
Then on per-directory basis I use rewriter for handling pretty URLs:
RewriteBase /
RewriteRule ^([a-z]{2})$ $1/ [R,L]
RewriteRule ^(([a-z]{2})/){0,1}([^/.]+)\.html index.php?page=$3&lang=$2 [L,QSA]
RewriteRule ^(([a-z]{2})/){0,1}(.*/)([^/]+) $3$4 [L]
Brief description:
rule 1 completes an unfinished virtual language directory root, e.g. sub.localhost.xy/en => sub.localhost.xy/en/, to keep relative links correct
rule 2 does the pretty thing
and rule 3 fixes all other filetypes to pass through correctly with or without language directory
And now I would like to have some Error Pages, redirecting to certain pages, for instance a pretty 404.html redirected to index.php?page=404...
So the per-directory rule would be
ErrorDocument 404 /404.html
Here is the problem. A missing file with any extension other than .html redirects correctly.
But //sub.localhost.xy/UnknownPage.html, which returns a 404 code, shows a blank page (with the correct 404 code) without any content: there is no redirection, and the default Apache 404 page is not displayed either. This happens even if I point the ErrorDocument to a .php or static .htm file.
Any ideas on how to fix it? I don't understand why rule 2 causes this problem. The page should return 404, be redirected to 404.html, and then be rewritten again to get the content.
I don't have a clue :(.
Thanks for giving it a thought.
If a URL does not resolve to content (e.g. a file), then the request is internally rewritten to the filepath specified by ErrorDocument. Under normal circumstances, a URL is not used with ErrorDocument, and if you do use a URL, then this can trash your search engine rankings.
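To illustrate the distinction, here is a minimal sketch of the two forms. The host name sub.localhost.xy is borrowed from the question purely as an example. As far as I know, a local path is served internally and keeps the 404 status, while a full URL makes Apache send an external redirect, so the client never sees the 404.

```apache
# Local URL-path: handled internally, the client still receives a 404 status
ErrorDocument 404 /404.html

# Full URL (shown commented out): Apache answers with an external redirect,
# so the original 404 status is lost; the host name is only illustrative
# ErrorDocument 404 http://sub.localhost.xy/404.html
```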
I also don't want to get too involved in your 'pretty URL code' until your main issue is resolved, although I do see some problems with it.
So let me ask, what happens if you just use
ErrorDocument 404 /index.php?page=404
Jim
And to your question: as I already wrote, no content is displayed. That is because I process Unknown.html as index.php?page=Unknown, and as soon as I find out that the page does not exist, I set the 404 header and don't output any content (if I do, I see it as the content of the 404 response).
However, it is obvious that the ErrorDocument rule is not engaged for some reason. For example, I request an unknown file with a .htm extension only, so it should not be processed by the rewrites.
And the 404 doesn't activate whether it is set to a .txt, .php, or .htm file, or to an .html file, which would require further rewriting. So at this point, the ErrorDocument path makes no difference.
If I comment out the second rule (in the per-directory set), all of the 404 content fetching starts working as expected (except my rewriting, of course). So I think the rewrite rule (the second one, or the others too) is faulty and somehow breaks the ErrorDocument handling.
g1, yes, could be better, but I can't see the need of domain name, as it is a per-directory rule, hmm? And following on to your thought, maybe the 303 might be even more appropriate, because it just takes care of lazy user, who omits the ending slash after language code. I will always supply a link properly ending with a slash.
Use the domain name unless you have a very good reason not to... g1smd knows a very good reason why you should use it, and so do I... If you want to find out, do some research on UseCanonicalName, and think about what might happen if that was set and your server admin made a small mistake...
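As a rough sketch of what including the domain name could look like for the trailing-slash rule from the question. The host name sub.localhost.xy and the 301 status are assumptions for illustration and would need adjusting to the real site:

```apache
# Hard-coded canonical host, so the redirect target does not depend on
# UseCanonicalName or ServerName settings (host name is an assumed example)
RewriteRule ^([a-z]{2})$ http://sub.localhost.xy/$1/ [R=301,L]
```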
> And following on to your thought, maybe the 303 might be even more appropriate, because it just takes care of lazy user, who omits the ending slash after language code. I will always supply a link properly ending with a slash.
It also takes care of "hurried Webmasters" who link to your URLs incorrectly, and thus protects your search engine rankings... mod_rewrite can have some rather important effects on search engines, as you might well realize after making a single typographical error and seeing your site disappear from search over time...
So far, you've gotten good advice from an experienced contributor, and I'm not sure why you appear to want to question it.
I'm not sure which you consider to be your 'second rule' in the discussion immediately preceding. Your most dangerous rule is the last one (the fourth, if I count all of them correctly), because its pattern does not specify a filetype and is not end-anchored. Therefore it will match many, many URLs.
So try making that rule much more specific, or if you cannot, then (and only then, because it will be expensive performance-wise) precede the rule with a RewriteCond checking -f and/or -d as required, so that the rewrite only happens if the rewritten path will resolve to a physically existing file and/or directory.
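A minimal sketch of that existence check, applied to the last rule from the question. It assumes the rewritten path is relative to DOCUMENT_ROOT, which may not hold once the subdomain mapping in the server config is taken into account, so the tested path would likely need adjusting:

```apache
# Rewrite only if the target resolves to an existing file or directory.
# $3 and $4 refer to the groups captured by the RewriteRule pattern below.
RewriteCond %{DOCUMENT_ROOT}/$3$4 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$3$4 -d
RewriteRule ^(([a-z]{2})/){0,1}(.*/)([^/]+) $3$4 [L]
```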
Note that if you want to apply a rewrite to a client-requested URL-path, but not to a previously-rewritten path, you can check THE_REQUEST with a RewriteCond, to see the original client request unaffected by any internal rewrites. THE_REQUEST is the entire client request line, exactly as it appears in your raw server access log, starting with the HTTP method and ending with "HTTP/1.0" or "HTTP/1.1", all shown within quotes.
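For illustration, a sketch of such a check placed in front of the .html rule from the question. It assumes that only direct client requests for a top-level or language-prefixed .html page should be rewritten; the THE_REQUEST pattern is an assumed form, not something taken from the original configuration:

```apache
# THE_REQUEST looks like: GET /en/page.html?x=1 HTTP/1.1
# Match only when the client itself asked for an .html URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([a-z]{2}/)?[^/.\ ]+\.html[?\ ]
RewriteRule ^(([a-z]{2})/){0,1}([^/.]+)\.html$ index.php?page=$3&lang=$2 [L,QSA]
```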
Jim