404 error doc messed up

it's repeating domain name several times

         

Lorel

9:05 pm on Sep 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just noticed that the custom error document repeats the domain name in the URL every time there is a typo in the page URL, and it also adds an extra "m" to the end of "htm",

i.e., https://www.example.com/www.example.com/www.example.com/www.example.com/www.example.com/missing.htmm

and every time I test it the chain gets longer; it is currently at 16 repeats.

here is the 404 line:

ErrorDocument 404 https:www.example.com/missing.htm 
AddHandler server-parsed .htm


The last major change to the site was setting up HTTPS; however, I verified the changes with whynopadlock. The 404 missing page was working fine before that.

I assume this is some kind of loop so I have included the following:

Here is the index to root redirect:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ https://www.example.com/$1 [R=301,L]


and here is the code to force https and also canonical and www redirect.

RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]


can someone see anything wrong?


phranque

9:40 pm on Sep 4, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



try this:
ErrorDocument 404 /missing.htm


edited after being reminded by whitespace's following post


whitespace

9:43 pm on Sep 4, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



ErrorDocument 404 https:www.example.com/missing.htm


If that is intended to be an absolute URL then you are missing a double slash after the scheme. However, you shouldn't be using an absolute URL here anyway, since that will trigger a 302 to the error document, not the desired 404. You should use a root-relative path. For example:

ErrorDocument 404 /missing.htm
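
A rough sketch of how the missing double slash could produce the growing URL. It assumes (this is not confirmed in the thread) that the server sends the ErrorDocument value as a redirect Location, and that the client treats "https:www.example.com/missing.htm" as a relative reference because it has a scheme but no "//". Python's urljoin resolves it that way, so it is used here as a stand-in for the browser. The helper name follow_redirects and the starting URL are made up for illustration, and the sketch does not attempt to explain the extra "m".

```python
from urllib.parse import urljoin

# The ErrorDocument value from the original post (note: no "//" after the scheme).
MALFORMED_TARGET = "https:www.example.com/missing.htm"


def follow_redirects(start_url, location, hops):
    """Resolve the same Location value against each new URL, hop by hop."""
    urls = [start_url]
    for _ in range(hops):
        # Each resolved URL is itself missing, so it would be redirected again.
        urls.append(urljoin(urls[-1], location))
    return urls


chain = follow_redirects("https://www.example.com/typo.htm", MALFORMED_TARGET, 3)
for url in chain:
    print(url)
```

Under those assumptions each hop adds one more "www.example.com/" segment to the path, which is the pattern described in the first post. A root-relative value such as /missing.htm avoids the redirect entirely, so there is nothing to resolve.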


On an unrelated note, your "force https and also canonical and www redirect" rule block is also incorrect. The first condition will always be true (since you are checking for "example.com" anywhere in the hostname), and the rule only redirects requests on port 80, so it won't canonicalise a request for "https://example.com" (HTTPS and no www).

This should instead be something like:


RewriteCond %{HTTP_HOST} ^example\.com [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule (.*) https://www.example.com/$1 [R,L]


This is also a 302 (temporary) redirect - change it to a 301 when you are done testing.
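
To make the difference concrete, here is a small sketch that mimics the two RewriteCond lines in plain Python. It is only a model of the matching logic, not of mod_rewrite itself; the function name rule_fires and the list of test requests are invented for the example, and it assumes HTTPS arrives on port 443.

```python
import re


def rule_fires(host, port, host_pattern, combine_with_or):
    """Mimic two RewriteCond lines: a host regex [NC] and 'SERVER_PORT 80'."""
    host_ok = re.search(host_pattern, host, re.IGNORECASE) is not None
    port_ok = re.search("80", str(port)) is not None  # unanchored, as in the rule
    if combine_with_or:
        return host_ok or port_ok   # conditions joined with [OR]
    return host_ok and port_ok      # default: conditions are ANDed


requests = [
    ("example.com", 80),       # http, no www
    ("www.example.com", 80),   # http, www
    ("example.com", 443),      # https, no www
    ("www.example.com", 443),  # https, www (canonical, should not redirect)
]

for host, port in requests:
    original = rule_fires(host, port, r"example\.com", combine_with_or=False)
    revised = rule_fires(host, port, r"^example\.com", combine_with_or=True)
    print(f"{host}:{port}  original={original}  revised={revised}")
```

In this model the original block never fires for example.com on port 443, while the revised block fires for every request except the canonical https://www.example.com one.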

phranque

9:48 pm on Sep 4, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.example.com/$1 [R,L]

i would suggest this instead:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

not2easy

11:42 pm on Sep 4, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Apache defaults to a 302 response (temporarily moved), so without the R=301 part of the flag the redirect is not treated as a 301 (permanent) change.
I just thought that an explanation could be helpful.

Lorel

7:06 pm on Sep 5, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I removed the full url on missing file and that is working ok now.

I tried the code that Phranque suggested for the canonical but it produced an error and wouldn't load the site.

I put the old code back in and the site is loading correctly.

I then added the [R=301,L] on the end and it's still ok.

Is there anything else I should change? It currently reads:

RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

phranque

9:51 pm on Sep 5, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I tried the code that Phranque suggested for the canonical but it produced an error and wouldn't load the site.

what was the corresponding message in the server error log file?

Lorel

11:51 pm on Sep 5, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@phranque sorry, I don't remember. I had put the old code back in and didn't want to redo what you sent. It was something about "can't load this page".

phranque

12:42 am on Sep 6, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



your error log file disappeared?

whitespace

1:46 am on Sep 6, 2018 (gmt 0)

10+ Year Member Top Contributors Of The Month



phranque: i would suggest this instead:


RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]



This should be OK, providing you don't use any other subdomains (or domains) that resolve to the same place. Since it redirects everything to "www.example.com", it doesn't simply canonicalise "example.com". This is not a problem with your original rule (or my suggestion above ;).

However, the first RewriteCond directive is a bit confusing/ambiguous-looking and should be simplified IMO. The first condition uses a negated pattern that is also entirely optional. (THEORETICAL BIT START...) If it's optional then it (potentially) matches an empty host, but the pattern is negated, so it's successful when the host is non-empty. For any legitimate request the host is always non-empty, so it's always successful, which would result in a redirect loop. (...THEORETICAL BIT END) However, it doesn't actually work like that, it will always try to make a positive match, so it will always match the hostname, rather than a non-empty hostname, so there is no redirect loop as it happens. However, neither will this match an empty host header (which I don't think is the intention). That "theoretical bit" was really just to highlight the "confusing/ambiguous-looking" nature of that expression. (TBH, this looks like a case of having incorrectly applied elements you often see used in a "positive expression" to a "negated expression".)

Since a negated condition is being used, you should really just be checking that the host is not "www.example.com" (as written, all lowercase). End of. So, this should be written as:


RewriteCond %{HTTP_HOST} !^www\.example\.com$ [OR]


And this is now successful when the host header is empty (HTTP 1.0 request). This contrasts with the opposite / positive expression, where you would make it optional and case-insensitive:


RewriteCond %{HTTP_HOST} ^(example\.com)?$ [NC,OR]


In this "positive expression", the host is either "example.com" (any case) or is empty... then redirect to "www.example.com".
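
The two expressions can be compared side by side with a short sketch. This only models the regex matching (Python's re in place of Apache's regex engine, which should agree for patterns this simple); the helper cond and the list of sample hosts are made up for the example.

```python
import re


def cond(pattern, host, negate=False, nocase=False):
    """Model one RewriteCond: optional '!' negation and optional [NC]."""
    flags = re.IGNORECASE if nocase else 0
    matched = re.search(pattern, host, flags) is not None
    return not matched if negate else matched


hosts = ["", "example.com", "www.example.com", "WWW.Example.COM", "other.example.com"]

for host in hosts:
    # Negated form: succeeds unless the host is exactly the lowercase canonical name.
    negated = cond(r"^www\.example\.com$", host, negate=True)
    # Positive form: succeeds only for "example.com" (any case) or an empty host.
    positive = cond(r"^(example\.com)?$", host, nocase=True)
    print(f"{host!r:22} negated={negated!s:5} positive={positive}")
```

The negated form succeeds for everything except the exact lowercase "www.example.com", including the empty host and the mixed-case variant. The positive form succeeds only for "example.com" and the empty host, so other subdomains are left alone.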

phranque

2:31 am on Sep 6, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This should be OK, providing you don't use any other subdomains (or domains) that resolve to the same place. Since it redirects everything to "www.example.com", it doesn't simply canonicalise "example.com". This is not a problem with your original rule (or my suggestion above).

i would have expected other hostnames in the configuration would be described in the problem statement as that would be an unusual condition.
the usual exceptions to this are the typical www subdomain and possibly wildcard subdomain configurations, both of which are properly handled by this:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC,OR]

However, the first RewriteCond directive is a bit confusing/ambiguous-looking and should be simplified IMO. The first condition uses a negated pattern that is also entirely optional.

this means if the Host HTTP Request header isn't either blank or exactly the canonical hostname, case insensitive.
you will find this suggested code snippet described in hundreds of threads in this forum btw.
with some hosting services the Host header value will always be lower cased, making the [NC] flag unnecessary but it won't hurt.
if you are on shared hosting the Host header will likely be populated even for HTTP 1.0 requests that don't send a Host HTTP Request header, so making the pattern optional is also unnecessary, but again it won't hurt.

lucy24

3:51 am on Sep 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For any legitimate request the host is always non-empty, so it's always successful
whitespace, look again. The negation means “if the host is NOT (exactly suchandsuch OR exactly nothing)”. That’s why the opening and closing anchors are essential.

I should point out that if you are on shared hosting, or in any situation using a <VirtualHost> envelope, the “or nothing” option is almost certainly superfluous, simply because requests without a Host: header will never reach your site in the first place. So you could shave three bytes from your htaccess with no ill effects. The [NC] flag is a whole nother matter for fruitful discussion...
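
A quick sketch of why the anchors matter in the negated, optional pattern. Again this only models the regex side in Python; negated_cond and the sample hosts are illustrative names, not anything from the thread.

```python
import re


def negated_cond(pattern, host):
    """True when the case-insensitive pattern does NOT match the host."""
    return re.search(pattern, host, re.IGNORECASE) is None


ANCHORED = r"^(www\.example\.com)?$"   # "exactly suchandsuch OR exactly nothing"
UNANCHORED = r"(www\.example\.com)?"   # optional group with no anchors

for host in ["", "www.example.com", "example.com", "sub.example.com"]:
    print(repr(host), negated_cond(ANCHORED, host), negated_cond(UNANCHORED, host))
```

With the anchors, the negated condition is false for the empty host and the canonical host and true for anything else. Without them, the optional group can match the empty string at the start of any host, so the negated condition is never true and the rule would never fire on the host test.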

Lorel

4:34 pm on Sep 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@phranque sorry, I misread your statement and didn't see "log file". I just downloaded the log file but don't have a log analyzer program to help me read it. I understand most of what is written there but have no idea which entry to look for now (2 days later).

This site does not have any other domains or subdomains. All urls are in lower case. It is on shared hosting.

I'm really confused by the discussion above and have no idea which version to use now.

I tried this (as first line)
RewriteCond %{HTTP_HOST} ^(example\.com)?$ [NC]

and typed the URL into the location bar without the www, and it won't bring up the www version of the site.

If I use this:
RewriteCond %{HTTP_HOST} example\.com [NC]

It doesn't revert to www either.

nor this:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]

Also, it doesn't bring up the www version if I remove everything in location but the domain name.

Can someone help me rewrite this line?