Forum Moderators: phranque

How to keep present URLs after comeback to static HTML website

         

deeper

7:50 am on Jul 12, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi,
several years ago I changed my static HTML sites to a CMS with a database, WordPress. With this change I also changed the URLs from http://www.example.de/page.html to http://www.example.de/page, i.e., I dropped ".html" with the help of the WordPress URL settings and a 301 in htaccess.

This is still the present status. My URLs are like http://www.example.de/page (without .html).
After some trouble with WP, the theme, etc., and because my very small informational company sites do not really need a database and CMS, I have now decided to return to a self-coded static HTML/CSS site.

The issue now: URLs should not change again. How can I keep the present URLs like http://www.example.de/page?
I will now have to write HTML files with an editor. So these HTML files will have .html as the file extension again, and I will have to upload them to the web server, right? So I would finally have the original .html files again instead of my preferred present URLs like http://www.example.de/page. And the homepage will be index.html.

How can I change back to static HTML sites and keep the present URLs?

1. Using HTML files on the web server with the .html extension deleted
I just tried it and deleted the file extension ".html" of a page on the server. The web server still accepts the file as a file of unknown type, and browsers still show the page correctly.
I guess this is not a clean way (what about index.html?), but in theory it seems to work.

2. Is the still-present 301 enough to do the job?
Maybe I will have to do nothing and everything is perfectly fine? Why? Because the 301, which is still active at the moment, will also do its job for the new HTML files on the server, telling both search engines and web surfers "these pages have URLs like http://www.example.de/page"?

The 301 would then have to stay for a lifetime... hm.

This is the code in my htaccess at the moment:
#301-redirect: page.html to page
RewriteRule ^([\w-]+)\.html$ http://www.example.de/$1 [R=301,L]


3. Any new solution with htaccess
Is there a really proper solution for this case using htaccess code?
Any ideas?

Thanks,

deeper

topr8

10:56 am on Jul 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



the 301 is to redirect any calls to example.com/yadda.htm to the extensionless version ... you can leave this in place, however this is really to catch any links which point to the filename with the extension.

you then need to add a rewrite such as
RewriteRule ^([a-zA-Z0-9-_.]*)$ /$1.html [L]

which will internally serve the file with the extension when the extensionless file is requested.

not2easy

11:53 am on Jul 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>How can I keep the present URLs like http://www.example.de/page?
Each page in WP has several versions so it is not one for one unless you want more work than you need to have. Another point - Your example is not https: but if you are not moving to https: you should probably not bother with rewriting or redirecting because most browsers are or will be scaring visitors away.

In a situation where you are migrating from WP to static html, the URL taxonomy is quite different. To redirect from the old URLs to the new static pages you will need to do more than have the same appearance. The WP URLs offer the same content at multiple URLs so if your new static URLs are not going to have all those extra URLs, you should be sure that you have set up a way to capture all the old formats and rewrite them to your preferred URL. Hopefully that URL has had a canonical meta tag to let Google know which version of the "blue widgets" pages you wanted to have indexed. You would need to capture whatever /category/ or /tag/ and /archive/ URLs that have existed to rewrite those to their new replacement URLs.

For example, if you set up your WP URLs to have a Permalink syntax that omitted category, tag, archive and other optional additions (such as /product/ ?) and you set a canonical meta tag to link to your preferred URL, then that would be the choice to use for your new URLs. That way, old to new would go smoothly (or have a better chance at least). If you have not been using any canonical meta link for your content, then it is anyone's guess which version of your content has been selected for indexing. Look at either your log files or analytics to see where visitors are landing. Google does not always follow the canonical preference if others have been linking to your content using a different URL. Start with your sitemap to help you have a list of all your posts and/or pages so you will know what needs to be changed. See what you can get out of GSC for linked content to help you plan. Then you can start by setting up rules to send requests for https://www.example.de/blue-widgets/ and https://www.example.de/books/blue-widgets/ and https://www.example.de/calendars/blue-widgets/ all to https://www.example.de/blue-widgets/ - it is a downsizing.

Your rules want to capture old requests for all category names, tag names and the format of your archives and send all of the versions to one page. This is not redirecting all pages to the home page for visitors to start over so you need a map, a plan before you begin. When you know all of the terms that need to be removed or altered, then you can set up rules. One rule is not likely to handle all cases, though some can be combined.

The .html extension to the filename can be changed with a rule, don't just delete the .html from the file or browsers might have problems that could slow down loading.

deeper

12:12 pm on Jul 12, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you.

>>the 301 is to redirect any calls to example.com/yadda.htm to the extensionless version ... you can leave this in place, however this is really to catch any links which point to the filename with the extension.


Yes, this was and still is the purpose of it.




>>you then need to add a rewrite such as
>>RewriteRule ^([a-zA-Z0-9-_.]*)$ /$1.html [L]

>>which will internally serve the file with the extension when the extensionless file is requested.


Not sure if I misunderstand you, or maybe it is due to my limited English, but "serve with extensions" sounds like "providing URLs with extensions".

Does this code take my future .html-files on the server and create extensionless URLs (for both search engines and web requests)? That's the task.

not2easy

12:29 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The
RewriteRule ^([a-zA-Z0-9-_.]*)$ /$1.html [L] 
rule will add .html to requests without .html and it is a 302 (temporary) rule without a
[L,301]
flag at the end.

I don't think this is where you want to go, one rule would remove the .html and the other would add it back on.

deeper

12:47 pm on Jul 12, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@not2easy:
Thanks for your comprehensive answer.

>>In a situation where you are migrating from WP to static html, the URL taxonomy is quite different. To redirect from the old URLs to the new static pages you will need to do more than have the same appearance. The WP URLs offer the same content at multiple URLs so if your new static URLs are not going to have all those extra URLs, you should be sure that you have set up a way to capture all the old formats and rewrite them to your preferred URL. Hopefully that URL has had a canonical meta tag to let Google know which version of the "blue widgets" pages you wanted to have indexed. You would need to capture whatever /category/ or /tag/ and /archive/ URLs that have existed to rewrite those to their new replacement URLs.

>>For example, if you set up your WP URLs to have a Permalink syntax that omitted category, tag, archive and other optional additions (such as /product/ ?) and you set a canonical meta tag to link to your preferred URL, then that would be the choice to use for your new URLs. That way, old to new would go smoothly (or have a better chance at least). If you have not been using any canonical meta link for your content, then it is anyone's guess which version of your content has been selected for indexing. Look at either your log files or analytics to see where visitors are landing. Google does not always follow the canonical preference if others have been linking to your content using a different URL. Start with your sitemap to help you have a list of all your posts and/or pages so you will know what needs to be changed. See what you can get out of GSC for linked content to help you plan. Then you can start by setting up rules to send requests for https://www.example.de/blue-widgets/ and https://www.example.de/books/blue-widgets/ and https://www.example.de/calendars/blue-widgets/ all to https://www.example.de/blue-widgets/ - it is a downsizing.


There is a canonical meta tag created by WordPress itself, and all other signals that search engines may consider (sitemap, internal and external links) indicate the same URLs following this simple pattern: http://www.example.de/page1 ...page2 ...page3 etc.
All pages and URLs follow this pattern and there are no folder-URLs.

>>Your rules want to capture old requests for all category names, tag names and the format of your archives and send all of the versions to one page. This is not redirecting all pages to the home page for visitors to start over so you need a map, a plan before you begin. When you know all of the terms that need to be removed or altered, then you can set up rules. One rule is not likely to handle all cases, though some can be combined.


My sites are very small and have a simple URL structure. They are not blogs with posts, but small company sites with only a few pages. There are no archives, categories, tags, etc.

There are four sites. They have 16, 20, 38 and 49 pages. Each page of each site has one canonical URL, and they all follow the above-mentioned pattern: http://www.example.de/page1 (no folders or subfolders).
The only "special" issue is the mentioned 301 redirect: the original .html URLs are still partly alive on the web, and therefore the 301 redirects "page.html" --> "page". This is mainly so that search engines take old .html links into account and so that clicks on them lead to the right page.


You mentioned the https-issue. Yes, this should be done, along with a hoster change and the mentioned change back to static html.
Is it a good idea to combine the return from WP to static sites with http --> https, doing both in one go?

not2easy

1:06 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Either change can disrupt traffic for a time as your "new" URLs are crawled. If the old URLs will remain the same by simply removing the file extension then I would add that in at the same time for minimal disruptions. The rule for the file extensions should appear in your .htaccess file before the https: rules. I am not positive that you want to use any [L] (Last) flag on that .html remover rule because it will need further changes for https.

If you search here (upper right search for desktop) for "to https:" you will find literally several hundreds (my guess is thousands) of discussions. The more recent are more convenient, the older discussions have more 'meat'.

topr8

1:11 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^([a-zA-Z0-9-_.]*)$ /$1.html [L]
>>I don't think this is where you want to go, one rule would remove the .html and the other would add it back on.

no it doesn't, it's an internal rewrite, when extensionless is requested, the file with extension is served. or at least that's how it works on my server.

i thought the OP was creating static html files, but wanted to keep the extensionless URIs

deeper

2:59 pm on Jul 12, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@topr8:
>>RewriteRule ^([a-zA-Z0-9-_.]*)$ /$1.html [L]
>>>>I don't think this is where you want to go, one rule would remove the .html and the other would add it back on.

>>no it doesn't, it's an internal rewrite, when extensionless is requested, the file with extension is served. or at least that's how it works on my server.

>>i thought the OP was creating static html files, but wanted to keep the extensionless URIs


At the moment I have WP sites with extensionless URLs.
Before WP, several years ago, I had static sites with extensions.
In the near future I want to return to static sites but keep the present extensionless URLs, though I will be forced to upload newly created .html files.
--> How will I be able to keep the present extensionless URLs of WP after uploading the .html files of the newly created static sites to the server?
Regarding URLs, no one should notice any change with my future "again" static sites, neither search engines nor web visitors.

I fear I cannot explain it better, is it still confusing?

Imagine that in two months I upload the new .html files and delete WP (or better, move it to another folder).
People will request my pages with extensionless URLs, but because of the new files they won't get them, unless the present 301 handles this situation, which it obviously doesn't.
Search engines would no longer find the extensionless URLs and would treat them as 404.


I guess your code does the opposite of what my goal is, because it adds

deeper

4:22 pm on Jul 12, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@not2easy:
>>Either change can disrupt traffic for a time as your "new" URLs are crawled. If the old URLs will remain the same by simply removing the file extension then I would add that in at the same time for minimal disruptions. The rule for the file extensions should appear in your .htaccess file before the https: rules. I am not positive that you want to use any [L] (Last) flag on that .html remover rule because it will need further changes for https.


Ah o.k., both together could mean less confusion than two steps.
So you suggest two separate htaccess rules and the https rule as second? I guess it's not possible or advisable to combine both in one rule?

Regarding L-Flag and further changes to https: What do you mean?


>>If you search here (upper right search for desktop) for "to https:" you will find literally several hundreds (my guess is thousands) of discussions. The more recent are more convenient, the older discussions have more 'meat'.


You address mixed content and similar challenges?

There are some further things to consider - as always :) - but at the moment I'm busy with URL creation on the webserver.

not2easy

4:47 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The [L] flag tells the server to stop processing those URLs, so you want the rules to continue until they get to the https rules which should be the last rules.

I cannot say that that second rule does what you want simply because its target appears to gather the request and adds ".html" to it. There is a possibility that the rule works on someone's site to remove .html from the URLs, since I don't see what other rules are there; but since you already have a rule for that, why add two?

In your first post you said that your rule was working as expected to remove .html from incoming requests so I did not consider a reason for a second rule prior to sending everything to https. There are far more capable folks around here than I, it might help you to work slowly until one of them has time. Weekends often take longer to get certainty. ;)

lucy24

4:52 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do your URLpaths ever contain a . (dot, period, full stop) other than immediately before the extension? If not, the internal rewrite can be expressed much more simply as

^([^.]*)$
>>
$1.html

There is no rule about using . anywhere in an URL--Apache, for example, does it regularly--but many things are easier if you make sure never to use it.

The .html-to-extensionless redirect shouldn't be needed--though it will do no harm--unless you goofed at some point and you’re actually getting requests with .html at the end.

not2easy

6:34 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The old pages are extensionless WP, the new pages will be static .html files. Other than that, the filenames and URLs would be the same. The idea was to rewrite the .html files to be the same as the old URLs on WP. The OP is currently using:
#301-redirect: page.html zu page
RewriteRule ^([\w-]+)\.html$ http://www.example.de/$1 [R=301,L]
and says it works fine. - but now will be changing to https URLs.

lucy24

9:11 pm on Jul 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Incidentally: complete brain fart on my part because I forgot it has to be
^[^.]+[^/]$
so you're not rewriting directory requests (including root). Oops.

>>but now will be changing to https URLs
But that's just a matter of adding an "s" in the target, isn't it? (This thread has been somewhat meandering, so I may have missed a key piece.)

tangor

5:43 am on Jul 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First: What is the need to go from CMS to static?

Two: Is extension negative required (other than a dim hope of keeping what you got?)

Three: SOMETIMES a clean start is the best thing in the world.

That said, you can find a number of methods to "go back" to static. .htaccess is your best friend in that regard. If you are moving from HTTP to HTTPS that's a minor complication.

But I do have to ask, why change? What does WP not offer that makes the transition back to static necessary? A WP site, kept clean and with the least amount of plugins (for security purposes) is actually a pretty fair platform for a website. (Me, never went there, but that's beside the point.)

Is there a compelling need to ditch WP?

topr8

10:47 am on Jul 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




>>I fear I cannot explain it better, is it still confusing?
>>I guess your code does the opposite of what my goal is, because it adds


i understood it perfectly the first time. ... it doesn't add, or at least not on my server it doesn't

what i suggested was an internal rewrite eg the uri stays as you want eg. extensionless, but in actual fact your .html file is served to the client (who sees it as an extensionless file)

thanks for your reply though.

deeper

12:50 pm on Jul 13, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@tangor:
>>First: What is the need to go from CMS to static?

>>Two: Is extension negative required (other than a dim hope of keeping what you got?)

>>Three: SOMETIMES a clean start is the best thing in the world.

>>That said, you can find a number of methods to "go back" to static. .htaccess is your best friend in that regard. If you are moving from HTTP to HTTPS that's a minor complication.

>>But I do have to ask, why change? What does WP not offer that makes the transition back to static necessary? A WP site, kept clean and with the least amount of plugins (for security purposes) is actually a pretty fair platform for a website. (Me, never went there, but that's beside the point.)

>>Is there a compelling need to ditch WP?


The better question would be: Why did I move from static to WP years ago? As I said, they are "static" informational company sites, not blogs. They are very small (between 16 and 50 pages), they all have the same layout, and there is only one webmaster, who does really everything.

I'm a natural healer and counselor. My knowledge of HTML, CSS and SEO has been gathered over many years, but I really have no professional IT background, and I'm only busy with these things when there is a concrete reason, i.e., from time to time, like a seasonal worker. On the other hand, I like to have full control over my sites. These two facts make it important for me to keep things simple and understandable. For example, I always have to write down new insights, because otherwise I would forget them and would have to start over in a few months.

I really look forward to my future sites, which will be created completely by myself. No WP, theme or plugins, which are "always" being modified and updated and whose behavior I cannot control. Then there is the additional tech stuff, for example PHP, the database and the hoster; they too keep changing and I always have to run after them. And all these things can have flaws too. The more complicated, the more work, confusion and mistakes.

No WP, theme, plugins, PHP or database any more. Just my HTML and CSS code. And even with CSS I can update my sites with one action by using includes, so I can have a little bit of a CMS even with simple CSS3 and HTML5. Both provide good tools which were not available years ago when I moved to WP (CSS grid).
Btw, my WP sites have been hacked, though I'm very conscious about security, have only two secure plugins and have done everything you can do to secure WP. It was a recently discovered security flaw in my theme, and no security plugin or any other measure could have prevented the disaster.
Bing still has thousands of fake URLs from the trojan indexed, and Bing WMT does not provide a good way to delete them from the index. But this is another story.

Would you really advise me to change back to .html URLs?
It's always a good idea NOT to change URLs unless there are very good reasons, which I don't see in my case.
Especially returning to URLs I already had in the past when I started my sites (with extension) is a bad idea and would confuse search engines.
page.html --> page --> page.html? Search engines have processed the change from page.html --> page, and now I would change back and tell them "hey, the new URLs are the ones I already had several years ago"?
.html is not a problem in itself, but in my case... and in general it is easier and shorter without it.

deeper

1:14 pm on Jul 13, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi @lucy24, always nice to meet you from time to time

>>Incidentally: complete brain fart on my part because I forgot it has to be
>>^[^.]+[^/]$
>>so you're not rewriting directory requests (including root). Oops.


Could you please explain what exactly this code does with regard to web requests and search engines wanting "extensionless URLs" like now?
What is the difference compared with the code from topr8, who says it covers my issue and that he understood my wish properly?

>>>>but now will be changing to https URLs

>>But that's just a matter of adding an "s" in the target, isn't it? (This thread has been somewhat meandering, so I may have missed a key piece.)


Yes, this is an "add-on" which may be appropriate to combine with my original issue of "keeping extensionless URLs in spite of HTML files".
not2easy mentioned it and I remembered to have https on my to-do-list too.

lucy24

7:13 pm on Jul 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In summary: When using extensionless URLs, you need to do two things: Redirect requests from with-extension URLs to the without-extension form, and Rewrite requests for extensionless-anything to the physical file which has an extension.

If your URLs have never had an extension, you shouldn’t need the redirect. This, in turn, means you don’t need a RewriteCond looking at THE_REQUEST so you don’t go around in circles. (Also the [NS] flag to cover internal requests for directory-index files, assuming those also end in .html.)

The rewrite I had in mind was
RewriteRule ^([^.]+[^/])$ /$1.html [L]

But wait! A further nasty complication is that certain robots (looking especially at you, AppleBot, but there are others) are so fixated on extensionless URLs, they will routinely request
/dir/subdir
for real, physical directories whose URL has never been anything but
/dir/subdir/
In Apache, these are generally handled by mod_dir, which first adds the slash and then supplies the appropriate DirectoryIndex. But if you’ve got a rewrite to handle extensionless URLs, you also need to ensure that nothing is getting rewritten to the nonexistent
/dir/subdir.html
and preferably you want to do this without a server-intensive !-d RewriteCond. (The form -f is never needed, because real physical files will always have an extension.) Alternatives depend on the exact site; even a RewriteCond excluding a short list of named directories would be more efficient.
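
The two parts of the summary above can be sketched as a pair of rules. This is only an illustration under assumptions, not a tested configuration for any particular server: it assumes the .htaccess file sits in the document root, that page names consist only of word characters and hyphens, and that https://www.example.de is the wanted target. The THE_REQUEST condition is what keeps the internal rewrite from re-triggering the external redirect.

```apache
RewriteEngine On

# Part 1: external redirect, with-extension -> extensionless.
# THE_REQUEST holds the original request line (e.g. "GET /page.html HTTP/1.1"),
# so this fires only when the client itself asked for .html, not after the
# internal rewrite in part 2 has added the extension.
RewriteCond %{THE_REQUEST} \s/[\w-]+\.html[\s?]
RewriteRule ^([\w-]+)\.html$ https://www.example.de/$1 [R=301,L]

# Part 2: internal rewrite, extensionless -> physical .html file.
# [^.]+ stops at the first dot, so requests that already carry an extension
# (page.html, style.css, image files) are left alone; the final [^/] skips
# paths ending in a slash (directory requests). The empty path of the root
# request never matches, so the homepage is left to DirectoryIndex.
RewriteRule ^([^.]+[^/])$ /$1.html [L]
```

With both rules in place, a request for /page should be answered with the content of page.html while the address stays /page, and a request for /page.html should receive a 301 to /page.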

phranque

12:24 am on Jul 14, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>The
>>RewriteRule ^([a-zA-Z0-9-_.]*)$ /$1.html [L]
>>rule will add .html to requests without .html and it is a 302 (temporary) rule without a
>>[L,301]
>>flag at the end.

you can only get a 302 (or 301) redirect by using the [R] flag.

while i haven't tried [L,301] it should throw an error.

the [L] flag without an [R] flag simply means an internal rewrite.
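
To make the distinction concrete, here is a small sketch. The paths old-page, temp-page and new-page are made-up placeholders, not taken from this thread.

```apache
RewriteEngine On

# [R=301]: external redirect. The client receives a 301 and then
# requests the new URL itself, so the address bar changes.
RewriteRule ^old-page$ /new-page [R=301,L]

# [R] with no status code: external redirect with the default 302.
RewriteRule ^temp-page$ /new-page [R,L]

# [L] alone: internal rewrite. The client still sees /new-page,
# but the server answers with the content of new-page.html.
RewriteRule ^new-page$ /new-page.html [L]
```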

lucy24

2:05 am on Jul 14, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>while i haven't tried [L,301] it should throw an error.
Yup: Test site confirms that it’s a solid 500--on any request, not just one that matches the pattern. Supplementary for-the-heck-of-it experimenting confirms that this will happen if you put anything in "flag" position that isn't a recognized flag, like some random letter of the alphabet.

>>the [L] flag without an [R] flag simply means an internal rewrite
Or no action at all, depending on whether there’s a target or merely a - filler. (I know that you know that; I’m throwing it in for archival purposes.)

deeper

10:50 am on Jul 14, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



>>In summary: When using extensionless URLs, you need to do two things: Redirect requests from with-extension URLs to the without-extension form, and Rewrite requests for extensionless-anything to the physical file which has an extension.
>>...If your URLs have never had an extension, you shouldn’t need the redirect. This, in turn, means you don’t need a RewriteCond looking at THE_REQUEST so you don’t go around in circles.


O.K., I guess I understand this.

1. Redirect to without-extension
My URLs had .html extensions before I changed from static to WP. Therefore I still have this 301 redirect in my htaccess:
#301-redirect: page.html to page
RewriteRule ^([\w-]+)\.html$ http://www.example.de/$1 [R=301,L]


So I have to keep this code exactly as it is now, without any change? As separate code, without combining it with the new rewrite code?

2. Rewrite of requests for extensionless-anything to the physical file which has an extension
The server will receive
- normal extensionless requests and
- requests with extension (from old links on the web), which are redirected to without-extension by the 301 (see 1.)

Both must now get the new .html files on the server, because they have the right content.
Thanks to the htaccess code they will
- get the new .html files and
- appear as extensionless URLs, both in visitors' browsers and for search engines, like now.
Right?

The homepage will be index.html as usual. Will it also be handled correctly?





>>But wait! A further nasty complication is that certain robots (looking especially at you, AppleBot, but there are others) are so fixated on extensionless URLs, they will routinely request
>>/dir/subdir
>>for real, physical directories whose URL has never been anything but
>>/dir/subdir/
>>In Apache, these are generally handled by mod_dir, which first adds the slash and then supplies the appropriate DirectoryIndex. But if you’ve got a rewrite to handle extensionless URLs, you also need to ensure that nothing is getting rewritten to the nonexistent
>>/dir/subdir.html
>>and preferably you want to do this without a server-intensive !-d RewriteCond. (The form -f is never needed, because real physical files will always have an extension.) Alternatives depend on the exact site; even a RewriteCond excluding a short list of named directories would be more efficient.


So this complication is not covered by your code above at the moment? If so, do you need any further information?

not2easy

4:50 pm on Jul 14, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I believe that only testing can tell you what would happen if/when some robot might confuse the issue by requesting files that don't exist. You can try that yourself to see that a 404 error document is returned.

As for whether to use your existing rule without change, I would suggest that you combine it to manage all non-https and .html requests with one ruleset, such as:
#Redirect non https: and non www requests
#301-redirect: page.html to page
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{SERVER_PORT} 80 [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.de)?$
RewriteRule ^([\w-]+)\.html$ https://www.example.de/$1 [R=301,L]


But others may have more efficient ideas. Please await some input from those who may have additions or suggestions. That rule for port 80 is in case incoming old links are still using "http" syntax.

Your index.html file can be used "as is" but to avoid folks linking to it and confusing robots, you should have a separate rule for that single page - and it should be before the ruleset I've posted. I would need to go look up my rules for that, but this is what I have for now.

lucy24

6:41 pm on Jul 14, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you’re redirecting an URL whose path is inherently wrong, whether that's Old URL to New URL or with-extension to without-extension, you don’t need to check things like protocol and hostname, since those will all be corrected in the target anyway.

Location of rules:
The external redirect (flag [R=301,L]) should be located towards the end of your redirects, probably shortly before the canonicalization redirect which is always last. I’d say, put it immediately after the index.html redirect to avoid ambiguity.
The internal rewrite (flag [L] alone) should be one of the very last RewriteRules, definitely after all R=301 redirects. Its optimal location will depend on what, if any, other [L] rules you have.

But, again, the issue of requests for
/dir/subdir
and the like is best handled on a site-by-site basis. Sometimes it’s easiest to take over half of mod_dir's job and put them in manually; for example one of my sites has
RewriteRule ^ebooks/(\w+)$ https://example.com/ebooks/$1/ [R=301,L]
to prevent chained redirects that come in with the wrong protocol. This is site-specific: /ebooks/ happens to contain more subdirectories than the rest of the site put together, and all URLs in the directory are in the form /ebooks/title/ (or /ebooks/title/volume.html which isn’t affected by the rule).
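
Put together, the ordering described above might look like the following sketch. It is an assumption-laden illustration rather than a drop-in file: it assumes a root-level .htaccess, https://www.example.de as the canonical origin, and a server on which the HTTPS variable is set for TLS requests (behind some proxies it is not).

```apache
RewriteEngine On

# 1. index.html redirect (only for requests the client actually made)
RewriteCond %{THE_REQUEST} \s/index\.html[\s?]
RewriteRule ^index\.html$ https://www.example.de/ [R=301,L]

# 2. with-extension -> extensionless redirect, right after the index redirect.
#    No protocol or hostname checks: the target corrects both anyway.
RewriteCond %{THE_REQUEST} \s/[\w-]+\.html[\s?]
RewriteRule ^([\w-]+)\.html$ https://www.example.de/$1 [R=301,L]

# 3. canonicalization redirect (protocol and hostname), last of the redirects
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.de)?$
RewriteRule ^(.*)$ https://www.example.de/$1 [R=301,L]

# 4. internal rewrite, after all R=301 redirects
RewriteRule ^([^.]+[^/])$ /$1.html [L]
```

Because every redirect names the full https://www.example.de target, a request with the wrong protocol and an unwanted extension should be corrected in a single hop rather than a chain.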

deeper

7:13 pm on Jul 15, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@lucy24:
Regarding the /dir/subdir issue: I don't have any subfolders or subdirectories at the moment, and this won't change. All URLs follow the simple pattern http(s)://www.example.de/page1 ...page2 ...page3 etc.
There will only be a folder with images and one for CSS. That's all.
Does this help in finding efficient code that fixes the /dir/subdir issue?

Regarding the location of rules my htaccess looks as follows (WP related codes of course will be dropped with the future static sites). So obviously the only L-rule will be the 301-redirect:

# xmlrpc.php deaktivieren
<Files "xmlrpc.php">
Require all denied
</Files>

# No access to wp-config and files containing the WP version
<FilesMatch "(wp-config.php|liesmich.html|readme.html|liesmich.txt|readme.txt|licence.txt)">
Require all denied
</FilesMatch>

# Additional htaccess password protection
<Files wp-login.php>
AuthType Basic
...
</Files>

# 301 redirect: page.html to page
RewriteRule ^([\w-]+)\.html$ http://www.example.de/$1 [R=301,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

# BEGIN prevent hotlinking
<IfModule mod_rewrite.c>
#RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.de(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?google\.[^/]+(/.*)?$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^(.*)Googlebot(.*)$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^Googlebot\-Image(.*)$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^Googlebot\-Video(.*)$ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?bing\.[^/]+(/.*)?$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^(.*)Bingbot(.*)$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^(.*)MSNBot-Media(.*)$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^(.*)BingPreview(.*)$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^(.*)MSNBot(.*)$ [NC]
...
RewriteRule \.(jpe?g|png|gif|svg|pdf|mp3)$ - [NC,F]
</IfModule>
# END prevent hotlinking

# Server-side deflate compression
<IfModule mod_filter.c>
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/plain text/html text/xml text/css text/javascript text/rtf
AddOutputFilterByType DEFLATE application/javascript application/x-javascript application/msword application/ld+json
</IfModule>
</IfModule>

# Browser caching via mod_expires
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType text/html "access plus 1 month"
....
</IfModule>

deeper

7:26 pm on Jul 15, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@not2easy:
#Redirect non https: and non www requests
#301 redirect: page.html to page
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{SERVER_PORT} 80 [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.de)?$
RewriteRule ^([\w-]+)\.html$ https://www.example.de/$1 [R=301,L]

That rule for port 80 is in case incoming old links are still using "http" syntax.


So this code would work, but it does not consider the /dir/subdir issue and the index.html special case?
Old links always have "http". The very old ones have the extension, the newer ones do not.

not2easy

8:23 pm on Jul 15, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I understand that those WP related rules will be going away, but that WP snippet should always be, or should always have been at the end of all other rules while WP is active. Talking about this code:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

Those assorted rules should be higher up in the file; the hotlinking and such should not be after the canonical rewrites and rules with the [L] flag. It can cause looping or 500 errors to have things in the wrong order.

That ruleset I posted does not consider "/dir/subdir" URLs, that is correct. It was based on your earlier statement that none existed. Lucy had some suggestions regarding those rules I had posted but she is looking at different syntax than you have described. So it depends - do those various default URLs exist or not? If your site is still using WP, I would test to verify that the other WP default URLs do not exist before ignoring the /dir/subdir format. That is how WP is made to work by default. Sorry, I can't give a simple Yes/No here.

lucy24

9:37 pm on Jul 15, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't have any subfolders or subdirectories at the moment and this won't change
If you have no subdirectories in your URLpaths--and no directories at all except ones containing with-extension supporting files--then you need not worry about that aspect. (Whew.) Yes, I realize it makes the whole thing now sound like a long and pointless digression--but it’s something that could become calamitous on sites where it does occur, so it’s good to get everything spelled out and unambiguous.

But wait! Since I myself don’t use extensionless URLs, I don’t know whether robots also like to take the opposite approach, requesting
/filename/
when the URL is correctly
/filename
and-that's-all. Check your logs periodically; it will then be your choice if you want to redirect these erroneous requests, or let them get the 404 they deserve.
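
If the logs do show such requests and you decide to redirect them, something along these lines might do it. This is only a sketch under assumptions not stated in the thread: the canonical host is https://www.example.de, every page is a file named page.html in the document root, and MultiViews is off. The condition is there so that real directories such as the image and CSS folders are left alone.

```apache
# Hypothetical rule: /page/ -> /page, but only when page.html exists on disk.
# Requests for real directories (no matching .html file) are not touched,
# so mod_dir can keep handling those as usual.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^([\w-]+)/$ https://www.example.de/$1 [R=301,L]
```

It would sit with the other external redirects, before the canonicalization redirect and before any internal rewrite.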

deeper

1:51 am on Jul 17, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



@not2easy:
Those assorted rules should be higher up in the file; the hotlinking and such should not be after the canonical rewrites and rules with the [L] flag. It can cause looping or 500 errors to have things in the wrong order.


You are talking about the present WP sites with this htaccess, not about the future static ones without WP, right?
I remember asking the hoster's support each time before adding code such as browser caching, deflate or hotlinking protection. So the order of rules should be o.k., but I will check again.


So it depends - do those various default URLs exist or not?


If you have no subdirectories in your URLpaths--and no directories at all except ones containing with-extension supporting files--then you need not worry about that aspect. (Whew.) Yes, I realize it makes the whole thing now sound like a long and pointless digression--but it’s something that could become calamitous on sites where it does occur, so it’s good to get everything spelled out and unambiguous.


I agree.

URLpaths really don't contain subdirectories. With only 20 or 50 pages there is no need for them, and I like to keep things simple. My very old and still static sites (I have two) as well as the "again" static future ones will have this simple structure in the website's main folder:

- one image folder with image files and maybe some other media files in it (.jpg, .gif, .png, .mp3)
(- future videos will be hosted by Youtube)
- one CSS folder with one big CSS file in it (stylesheet.css)
- HTML files, all following the pattern http(s)://www.example.de/page1 ...page2 ...page3
- a few .pdf and .doc files
- robots.txt
- htaccess and htpasswd
- favicon.ico

So let me summarize the whole thing.

1. It is possible to solve my problem, "keeping extensionless URLs despite returning to HTML files with extensions", with htaccess code on an Apache server.
2. This code basically needs to cover two things:
a) a 301 redirect of requests from with-extension URLs to without-extension URLs
b) a rewrite of requests for extensionless URLs (originally extensionless or redirected by the 301) to the physical file, which has an extension
3. The 301 redirect that is already active can stay as it is.
4. It's advisable but not strictly necessary to cover index.html with an extra rule.
5. The /dir/subdir subdirectory issue should be addressed in general, but in my case there is no need, because I don't and won't have any subdirectories.
6. In the end there are three pieces of code (the 301 redirect to extensionless, the rewrite to the extension, and the index.html rule), and their order matters.
7. Order also matters for all L-rules, but I have only one.
8. Even though the order matters, all three pieces of code could be combined into one "chunk" of code, like not2easy's example.
9. The code is probably no performance killer (just my own thought).

Correct so far? If so, would anybody like to post the final code?


Since I myself don’t use extensionless URLs, I don’t know whether robots also like to take the opposite approach, requesting
/filename/
when the URL is correctly
/filename
and-that's-all.


Maybe anybody else knows...