Forum Moderators: phranque

Extensionless URLS = 404 error

when testing with ScreamingFrog

         

Lorel

6:28 pm on Sep 14, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi, I just set up a site using MAMP and everything appeared to be working correctly until I checked the site in Screaming Frog. All the extensionless URLs are throwing 404s. The other URLs are OK. The canonical tags are using the extensionless URLs like this:
<link rel="canonical" href="https://example.com/privacy/">

Here is what I have in htaccess. Is this in error?

Options +Includes
Options +FollowSymLinks
RewriteEngine on
#
#redirect for NON-www
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
#
# force HTTPS
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
#
#External redirect for extensionless url
RewriteCond %{THE_REQUEST} https://example.com\.html
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
#
# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]

penders

9:39 pm on Sep 14, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What happens when you request the canonical URL? What URL are you requesting?


# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]


If you request "https://example.com/privacy/" (with a trailing slash), assuming this doesn't map to a directory, then the above directives will rewrite this to "/privacy/.html" - which would most probably result in a 404. The above directives assume your URLs do not end in a trailing slash.
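One possible way to make the internal rewrite tolerate an optional trailing slash is sketched below. It assumes the pages are stored as .html files under the document root and that the .htaccess file sits in that root. The %{DOCUMENT_ROOT} check is an addition that was not in the original rules.

```apache
# Sketch only: internal rewrite that accepts both /privacy and /privacy/
# Skip anything that already has a file extension
RewriteCond %{REQUEST_URI} !\.[^./]+$
# Skip real directories
RewriteCond %{REQUEST_FILENAME} !-d
# Only rewrite when the matching .html file actually exists
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(.+?)/?$ /$1.html [L]
```

Serving the same page at both /privacy and /privacy/ leaves two URLs for one document, so it is usually cleaner to pick one form, redirect the other to it, and make the canonical tag match.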


#External redirect for extensionless url
RewriteCond %{THE_REQUEST} https://example.com\.html
RewriteRule ^(.+)\.html$ /$1 [R=301,L]


Is something missing from your post? Because of the "malformed" condition, this will never actually do anything (i.e. canonicalise requests that include the extension). However, if the RewriteRule did work then it would redirect to a URL without a trailing slash (which contradicts the URL stated in your canonical tag).
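For reference, %{THE_REQUEST} holds the request line as the client sent it: the method, the path and query string, and the protocol, but normally not the scheme or hostname. That is why a condition containing "https://example.com" does not match. A hedged sketch of a condition that matches a client request for a .html path follows; the pattern is one common form, not the only one.

```apache
# THE_REQUEST looks like:  GET /privacy.html HTTP/1.1
# Match: method, a space, a path ending in .html, then a space or "?"
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/[^?\s]*\.html[?\s]
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]
```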

Lorel

11:03 pm on Sep 14, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have it working now. I just needed to replace the above line with this:

#External redirect for extensionless url
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

Thanks for helping.

penders

12:08 am on Sep 15, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just needed to replace the above line with this


Hhhmmm, OK, but there must have been something else going on or changed? That rule should not play any part in whether Screaming Frog successfully crawls your extensionless URLs? It's the "other" rule that does all the work (and was seemingly incorrect, given your stated canonical URL)?

phranque

1:37 am on Sep 15, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



#External redirect for extensionless url
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

given the pattern of your RewriteRule, the RewriteCond directive is redundant.

you want to put the extensionless redirect ruleset before the hostname canonicalization ruleset.
otherwise, if you request https://www.example.com/foo.html it will first redirect to https://example.com/foo.html and then redirect to https://example.com/foo

also you should combine the two hostname canonicalization rulesets into one.
your hostname canonicalization ruleset should always specify the 301 status code since the [R] flag defaults to a 302.

this might work:
Options +Includes
Options +FollowSymLinks

RewriteEngine on

# External redirect for extensionless url
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

# redirect for NON-www and/or force HTTPS
RewriteCond %{HTTP_HOST} !^(example\.com)?$ [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]

# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]

lucy24

3:36 am on Sep 15, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



given the pattern of your RewriteRule, the RewriteCond directive is redundant.
No, in this situation a RewriteCond looking at THE_REQUEST is essential, because otherwise you get an infinite loop when the internal request for an URL in .html runs into the rule that strips the .html.
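A rough trace of that loop, assuming a request for /about and the two rules from this thread. The steps are illustrative and not taken from a real log.

```apache
# Without a THE_REQUEST condition:
#   1. Browser requests /about
#   2. Internal rewrite maps /about -> /about.html
#   3. The rules run again on /about.html; the redirect rule matches
#      and sends a 301 back to /about
#   4. Browser requests /about again, and the cycle repeats.
#
# THE_REQUEST is the original request line from the client and is not
# changed by internal rewrites, so this condition only matches when the
# client itself asked for a .html URL.
RewriteCond %{THE_REQUEST} \.html[?\s]
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]
```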

phranque

4:43 am on Sep 15, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



No, in this situation a RewriteCond looking at THE_REQUEST is essential, because otherwise you get an infinite loop...

you are correct - i had misread that as REQUEST_URI.

so, this instead:
Options +Includes
Options +FollowSymLinks

RewriteEngine on

# External redirect for extensionless url
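# THE_REQUEST is the raw request line, e.g. "GET /about.html HTTP/1.1",
# so match ".html" followed by a space or "?" rather than end of string
RewriteCond %{THE_REQUEST} \.html[?\s]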
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

# redirect for NON-www and/or force HTTPS
RewriteCond %{HTTP_HOST} !^(example\.com)?$ [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]

# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]

phranque

5:06 am on Sep 15, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I just set up a site using MAMP and everything appeared to be working correctly until I checked the site in Screaming Frog. All the extensionless URLs are throwing 404s.

what does the server access log file say about those requests?
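If the access log does not show enough, mod_rewrite's own trace output can show which rule handled a request. A minimal sketch, assuming Apache 2.4 and that you can edit the main server configuration (on MAMP, its httpd.conf); this directive is not allowed in .htaccess. Apache 2.2 used the RewriteLog and RewriteLogLevel directives instead.

```apache
# In httpd.conf or the virtual host, not in .htaccess.
# Remove it after debugging; the trace output is verbose.
LogLevel alert rewrite:trace3
```

The trace lines are written to the error log.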

Lorel

4:34 pm on Sep 16, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@ penders
I forgot to mention in my earlier post that I also took the slash out of the canonical tag.

@phranque
I changed my code to match what you posted above and it appears to be working right.

Here is the last line from the access log after I clicked one of the menu items (about.html), which then gets redirected to /about:

162.248.151.145 - - [16/Sep/2019:10:19:39 -0600] "GET / HTTP/1.1" 301 231 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:69.0) Gecko/20100101 Firefox/69.0"

Thanks for your help.