Extensionless URLs = 404 error

when testing with Screaming Frog

     
6:28 pm on Sep 14, 2019 (gmt 0)

Senior Member from US 

joined: June 4, 2002
posts: 1916
votes: 3


Hi, I just set up a site using MAMP, and everything appeared to be working correctly until I checked the site in Screaming Frog. All the extensionless URLs are throwing 404s. The other URLs are OK. The canonical tags use the extensionless URLs, like this:
<link rel="canonical" href="https://example.com/privacy/">

Here is what I have in .htaccess. Is there an error in it?

Options +Includes
Options +FollowSymLinks
RewriteEngine on
#
#redirect for NON-www
RewriteCond %{HTTP_HOST} ^www\.example\.com [NC]
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
#
# force HTTPS
RewriteCond %{HTTP_HOST} example\.com [NC]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R,L]
#
#External redirect for extensionless url
RewriteCond %{THE_REQUEST} https://example.com\.html
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
#
# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]
9:39 pm on Sept 14, 2019 (gmt 0)

penders (Senior Member)

joined: July 3, 2006
posts: 3148
votes: 4


What happens when you request the canonical URL? What URL are you requesting?


# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]


If you request "https://example.com/privacy/" (with a trailing slash), assuming this doesn't map to a directory, then the above directives will rewrite this to "/privacy/.html" - which would most probably result in a 404. The above directives assume your URLs do not end in a trailing slash.
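
One way to reconcile the two, sketched under the assumption that the slashless form is the URL you want to keep: redirect any trailing-slash request that is not a real directory to its slashless equivalent, so it never reaches the internal rewrite. This rule is an illustration, not part of the posted file, and it would need to sit above the internal rewrite.

```apache
# Hypothetical addition (not in the original .htaccess):
# 301 a trailing-slash URL such as /privacy/ to /privacy,
# but only when it does not map to a real directory.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ https://example.com/$1 [R=301,L]
```

With that in place, a request for /privacy/ is redirected to /privacy, which the internal rewrite can then map to /privacy.html. The canonical tag would still need to match whichever form you settle on.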


#External redirect for extensionless url
RewriteCond %{THE_REQUEST} https://example.com\.html
RewriteRule ^(.+)\.html$ /$1 [R=301,L]


Is something missing from your post? Because the condition is "malformed", this rule will never actually do anything (i.e., it will never canonicalise requests that include the extension). However, if the RewriteRule did work, it would redirect to a URL without a trailing slash, which contradicts the URL stated in your canonical tag.
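
For context on why that condition cannot match: THE_REQUEST holds the raw request line the client sent, and for an ordinary browser request that line carries only the method, the path (plus any query string), and the protocol. The following is a minimal sketch of a condition that looks at the path portion instead. The pattern is one reasonable choice, not the only one.

```apache
# THE_REQUEST looks like this for a normal request:
#   GET /privacy.html HTTP/1.1
# There is no scheme or hostname in it, so a pattern containing
# "https://example.com" has nothing to match against.
#
# Sketch: match ".html" at the end of the requested path,
# i.e. followed by "?" (query string) or whitespace (before HTTP/1.1).
RewriteCond %{THE_REQUEST} \s/[^?\s]*\.html[?\s]
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]
```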
11:03 pm on Sept 14, 2019 (gmt 0)

Senior Member from US 


I have it working now. I just needed to replace the above line with this:

#External redirect for extensionless url
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

Thanks for helping.
12:08 am on Sept 15, 2019 (gmt 0)

penders (Senior Member)


I just needed to replace the above line with this


Hmm, OK, but something else must have been going on, or changed? That rule should not play any part in whether Screaming Frog successfully crawls your extensionless URLs. It's the "other" rule (the internal rewrite) that does all the work, and it was seemingly incorrect given your stated canonical URL.
1:37 am on Sept 15, 2019 (gmt 0)

phranque (Administrator)

joined: Aug 10, 2004
posts: 11842
votes: 242


#External redirect for extensionless url
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

given the pattern of your RewriteRule, the RewriteCond directive is redundant.

you want to put the extensionless redirect ruleset before the hostname canonicalization ruleset.
otherwise, if you request https://www.example.com/foo.html it will first redirect to https://example.com/foo.html and then redirect to https://example.com/foo

also you should combine the two hostname canonicalization rulesets into one.
your hostname canonicalization ruleset should always specify the 301 status code since the [R] flag defaults to a 302.

this might work:
Options +Includes
Options +FollowSymLinks

RewriteEngine on

# External redirect for extensionless url
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

# redirect for NON-www and/or force HTTPS
RewriteCond %{HTTP_HOST} !^(example\.com)?$ [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]

# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]
3:36 am on Sept 15, 2019 (gmt 0)

lucy24 (Senior Member from US)

joined: Apr 9, 2011
posts: 15874
votes: 872


given the pattern of your RewriteRule, the RewriteCond directive is redundant.
No, in this situation a RewriteCond looking at THE_REQUEST is essential, because otherwise you get an infinite loop when the internal rewrite to a URL ending in .html runs into the rule that strips the .html.
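
The loop can be traced step by step, as in the comments below. The rule that follows is a sketch of a different way to break it, assuming Apache 2.4, where the [END] flag is available. It is an illustration only, and whether the bundled MAMP Apache supports it is not something the thread establishes.

```apache
# Without a THE_REQUEST guard, the two rules feed each other:
#   1. The client requests /about
#   2. The internal rewrite maps /about to /about.html and, in
#      .htaccess context, rule processing starts over
#   3. The strip rule now sees about.html and answers with a 301 to /about
#   4. The client requests /about again: back to step 1
#
# Alternative sketch (assumes Apache 2.4): [END] stops all further
# rewrite processing for this request, so the strip rule never sees
# the internally rewritten about.html.
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [END]
```

The THE_REQUEST condition has the advantage of also working on older Apache versions, since it only ever looks at what the client originally asked for.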
4:43 am on Sept 15, 2019 (gmt 0)

phranque (Administrator)


No, in this situation a RewriteCond looking at THE_REQUEST is essential, because otherwise you get an infinite loop...

you are correct - i had misread that as REQUEST_URI.

so, this instead:
Options +Includes
Options +FollowSymLinks

RewriteEngine on

# External redirect for extensionless url
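# THE_REQUEST is the raw request line ("GET /about.html HTTP/1.1"), so .html is followed by "?" or whitespace, never by end-of-string
RewriteCond %{THE_REQUEST} \.html[?\s]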
RewriteRule ^(.+)\.html$ https://example.com/$1 [R=301,L]

# redirect for NON-www and/or force HTTPS
RewriteCond %{HTTP_HOST} !^(example\.com)?$ [NC,OR]
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]

# Internal rewrite for extensionless url
RewriteCond %{REQUEST_URI} !(\.[^./]+)$
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) /$1.html [L]
5:06 am on Sept 15, 2019 (gmt 0)

phranque (Administrator)


I just set up a site using MAMP and everything appeared to be working correctly until I checked the site in Screaming Frog. All the extensionless URLs are throwing 404s.

what does the server access log file say about those requests?
4:34 pm on Sept 16, 2019 (gmt 0)

Senior Member from US 



@penders
I forgot to mention in my earlier post that I also took the slash out of the canonical tag.

@phranque
I changed my code to match what you posted above and it appears to be working right.

Below is the last line from the access log, from when I clicked one of the menu items (about.html) and was redirected to /about:

162.248.151.145 - - [16/Sep/2019:10:19:39 -0600] "GET / HTTP/1.1" 301 231 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:69.0) Gecko/20100101 Firefox/69.0"

Thanks for your help.