Forum Moderators: phranque

Rewrite to change page1.html to page.html

Applies to any and every folder

         

NickMNS

8:12 pm on Nov 21, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As the title states, I need to redirect [R=301] every page that was page1.html to simply page.html. The problem is that instances of page1.html occur in almost every folder, so the rule has to apply to any and every folder.

Example urls are:
https://example.com/A1B/A1B2C3/page1.html ==> https://example.com/A1B/A1B2C3/page.html
https://example.com/A1B/page1.html ==> https://example.com/A1B/page.html
https://example.com/B9R/page1.html ==> https://example.com/B9R/page.html
...


My attempt:
rewriteRule ^[A-Za-z0-9\/\-]+\/page1\.html$ page.html


The problem is that this applies to all the folders but rewrites them to a single location.
https://example.com/page.html


What I don't get is how to preserve the first portion of the URL and include it in the result. I know there is something like $1, $2, but I don't understand how it works.

lucy24

9:07 pm on Nov 21, 2019 (gmt 0)


You forgot to capture. I'd suggest
^((\w+/)+)oldpage.html
>>
$1newpage.html

rewriteRule ^[A-Za-z0-9\/\-]+\/page1\.html$ page.html
Yikes. The target of a RewriteRule needs to contain the full protocol-plus-hostname if it is an external redirect with R flag. (If it's an internal rewrite, start it with / for root.) So the target will be
https://example.com/$1newpage.html
(The lack of a / after the capture, before the literal text, gives me the fantods--but here it is correct because the directory slash is part of the capture.)

Can your directory names contain hyphens? If so, the pattern needs to include them: [\w-]+ instead of bare \w+. You don't need to escape a hyphen when it's the very last thing inside the square brackets of a character class. And in mod_rewrite you never need to escape / slashes. (Also not in mod_setenvif, the other likeliest place for Regular Expressions in Apache. There are one or two obscure places where you do need to escape slashes because they're used as RegEx delimiters, just like in JavaScript.)

Is there always at least one directory? If no, replace + with *.
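
As a rough check of the two questions above, here is a small Python sketch. Python's re module is only a stand-in for the PCRE engine Apache uses, and the directory name "my-dir" is made up for illustration.

```python
import re

# re.ASCII keeps \w to [A-Za-z0-9_], which is closer to Apache's default.
without_hyphens = re.compile(r"^((\w+/)+)page1\.html$", re.ASCII)
with_hyphens = re.compile(r"^(([\w-]+/)+)page1\.html$", re.ASCII)
optional_dirs = re.compile(r"^(([\w-]+/)*)page1\.html$", re.ASCII)


def matches(pattern, path):
    """True if the pattern matches the given per-directory path."""
    return pattern.search(path) is not None


hyphen_path = "my-dir/A1B/page1.html"  # hypothetical directory containing a hyphen
root_path = "page1.html"               # no directory at all

print(matches(without_hyphens, hyphen_path))  # bare \w+ stops at the hyphen
print(matches(with_hyphens, hyphen_path))     # [\w-]+ accepts it
print(matches(with_hyphens, root_path))       # + demands at least one directory
print(matches(optional_dirs, root_path))      # * allows zero directories
```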

Stop here. Do not scroll down until you have read and assimilated everything above.
.
.
.
.
.
.
.
.

RewriteRule ^(([\w-]+/)+)oldpage\.html https://example.com/$1newpage.html [R=301,L]
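
To see what the capture contributes, the rule can be imitated in Python. This is only a sketch: it assumes the .htaccess form of the path (no leading slash), uses the thread's page1.html/page.html names in place of oldpage/newpage, and the helper name redirect_target is invented for the example.

```python
import re

# Same shape as the rule above: the outer parentheses capture the whole
# directory path, trailing slash included.
RULE = re.compile(r"^(([\w-]+/)+)page1\.html", re.ASCII)


def redirect_target(path):
    """Return the redirect URL for a path, or None if the rule does not match."""
    m = RULE.search(path)
    if m is None:
        return None
    # Equivalent of the target https://example.com/$1page.html
    return "https://example.com/" + m.group(1) + "page.html"


for p in ("A1B/A1B2C3/page1.html", "A1B/page1.html", "A1B/other.html"):
    print(p, "->", redirect_target(p))
```

Because the directory slash is inside the capture, the target needs no extra slash between $1 and the file name.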

NickMNS

9:38 pm on Nov 21, 2019 (gmt 0)


Holy, cryptic chaos Batman!
This works

RewriteRule ^(([\w-]+/)+)page1\.html$ $1page.html [R=301,L]


Now for my personal knowledge and education:
I still don't get the logic behind the $N operator. Is it capturing what is in the outer brackets? So if I wanted a $2, could I add a second set of brackets? Example:

RewriteRule ^(([\w-]+/)+)(page)1.html$ $1$2.html [R=301,L]


But when I test this it doesn't work. It repeats the value of the last folder.

https://www.example.com/A1B/A1B/2C3/page1.html
//becomes:
https://www.example.com/A1B/A1B/2C3/2C3/.html

phranque

9:40 pm on Nov 21, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The group number equals the number of preceding left parentheses, counting the group's own.
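
A quick way to see the numbering is to print each group for the pattern from the previous post. This is a Python sketch; Python's re is assumed to number groups the same way as Apache's regex engine, and it writes \1 where Apache writes $1.

```python
import re

# Left parentheses, in order: 1 = whole directory path,
# 2 = the repeated single-directory group, 3 = "page".
pattern = re.compile(r"^(([\w-]+/)+)(page)1\.html$", re.ASCII)
m = pattern.search("A1B/A1B/2C3/page1.html")

groups = {n: m.group(n) for n in (1, 2, 3)}
print(groups)

# A repeated group keeps only its last iteration, which is why $2
# came out as the last folder.
print(m.group(1) + m.group(2) + ".html")  # what $1$2.html produces
print(m.group(1) + m.group(3) + ".html")  # what $1$3.html produces
```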

NickMNS

9:49 pm on Nov 21, 2019 (gmt 0)


Sorry, I forgot to add,
This will be an internal redirect; it serves simply to correct an error in the file naming convention.

Is there always at least one directory? If no, replace + with *.

Yes always at least 1.

[\w] vs [a-zA-Z0-9]

I am used to using regex in JavaScript and Python, not with Apache, so I wasn't sure if \w was valid, as the examples in the Apache docs tend to show [A-Za-z], likely because digits are not needed in the examples. So thanks for clarifying that, as well as the escaping of the slashes. As if regex weren't cryptic enough, there have to be different conventions for different programming languages on top of it.

NickMNS

9:54 pm on Nov 21, 2019 (gmt 0)


The group number equals the number of preceding left parentheses

Huh! (as in interesting)

So, switching $2 to $3 should work...
RewriteRule ^(([\w-]+/)+)(page)1.html$ $1[b]$3[/b].html [R=301,L]


testing...
...
It does!

Wow! I learnt something...

Thank you!

lucy24

11:03 pm on Nov 21, 2019 (gmt 0)


This will be an internal redirect
What does “internal redirect” mean?

I try to use double markedness to make sure it’s unambiguous. Note that internal vs. external has nothing to do with whether it's the same domain or a different one; it’s about whether it happens inside the server, or visibly on the outside so the visitor (or at least the human visitor's browser) knows something is happening.

“External redirect” = send back a 300-class response telling the browser to make a fresh request at this new URL, which might be either at the same hostname or a different one. Target starts in http(s); mod_rewrite flag typically [R=301,L]

“Internal rewrite” = secretly, within the privacy of the server, without telling the visitor you’re doing so, pull up content from this other location (ordinarily, but not obligatorily, somewhere on the same host). Target starts in / only; mod_rewrite flag typically [L] alone.
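
The difference, seen from the client's side, can be put in a toy model. This is plain Python and not how Apache is implemented; the Response class and both function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int                  # what the browser sees
    location: Optional[str]      # Location header, only on a redirect
    served_from: Optional[str]   # where the server quietly fetched content


def external_redirect(target_url):
    """The browser is told to make a fresh request at target_url."""
    return Response(status=301, location=target_url, served_from=None)


def internal_rewrite(target_path):
    """The browser keeps its URL; the server reads content from target_path."""
    return Response(status=200, location=None, served_from=target_path)


print(external_redirect("https://example.com/A1B/page.html"))
print(internal_rewrite("/A1B/page.html"))
```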

You have now also learned that formatting such as [ b ] doesn’t work inside [ code ] tags ;)

NickMNS

4:45 pm on Nov 22, 2019 (gmt 0)


...I took a little detour to test this out properly. In my previous posts I had used an online testing tool to validate the code; let's just say it is not the most reliable website.

What does “internal redirect” mean?

The redirect is an external redirect, in that the URL in the user's browser should be updated with the URL of the final location.

I spent some time reading up on all this, mostly in Apache2 docs. To be clear my intention is to implement this in the VirtualHost context and not within an .htaccess file. As such it is not recommended to use mod_rewrite, instead I am using RedirectMatch.

The final code that I tested on an actual working website is:
RedirectMatch 301 "^/(([\w-]+/)+)page1\.html$" "https://www.example.com/$1page.html"
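
For anyone who wants to try the pattern without a server, here is a Python sketch of the same match. It assumes, as with RedirectMatch, that the pattern is tested against the URL path including its leading slash; the function name redirect_location is made up.

```python
import re

# Same regex as the directive above. Note the leading "/" that the
# .htaccess RewriteRule versions earlier in the thread did not have.
REDIRECT_MATCH = re.compile(r"^/(([\w-]+/)+)page1\.html$", re.ASCII)


def redirect_location(url_path):
    """Return the Location the directive would send, or None if no match."""
    m = REDIRECT_MATCH.search(url_path)
    if m is None:
        return None
    return "https://www.example.com/" + m.group(1) + "page.html"


for p in ("/A1B/A1B2C3/page1.html", "/B9R/page1.html", "/page1.html", "A1B/page1.html"):
    print(p, "->", redirect_location(p))
```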



You have now also learned that formatting such as [ b ] doesn’t work inside [ code ] tags ;)

Not until you pointed it out!

phranque

10:08 pm on Nov 24, 2019 (gmt 0)


I spent some time reading up on all this, mostly in Apache2 docs. To be clear my intention is to implement this in the VirtualHost context and not within an .htaccess file. As such it is not recommended to use mod_rewrite, instead I am using RedirectMatch.

if you are using mod_rewrite directives anywhere in your .htaccess or server configuration files, you should use mod_rewrite everywhere, avoiding any mod_alias directives for redirect purposes.
this will help avoid chained redirects or exposing internal urls.

The use of RewriteRule to perform this task may be appropriate if there are other RewriteRule directives in the same scope. This is because, when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.

source: [httpd.apache.org...]

NickMNS

2:34 am on Nov 25, 2019 (gmt 0)


@phranque
Thanks for the tips. I had read that and have taken it into consideration. But in this case there is no other use of mod_rewrite, or of .htaccess. It keeps things simple.

phranque

2:54 am on Nov 25, 2019 (gmt 0)


there is no other use of mod_rewrite

how are you doing hostname canonicalization?
(e.g., when someone requests non-www and/or http:, you can't detect that with mod_alias)

NickMNS

2:07 pm on Nov 25, 2019 (gmt 0)


how are you doing hostname canonicalization?

The simple answer, as it applies to this very specific case, is that I'm not doing any hostname canonicalization. This specific case pertains to a test server that is not in production.

That said, once this eventually moves into production, hostname canonicalization will be required. On my other servers this is done at the same time as the http-to-https redirect: that is, in VirtualHost *:80 there are two redirects (www & no-www) using %{HOST_NAME} condition and then a redirect permanent to https:www.

I intend to place the page1-to-page redirect in the VirtualHost *:443 context. I assume that this will avoid the issues raised in your post.

phranque

6:46 pm on Nov 25, 2019 (gmt 0)


there are two redirects (www & no-www) using %{HOST_NAME} condition and then a redirect permanent to https:www

that is not likely to work using mod_alias.
can you show the directives you used?

lucy24

10:08 pm on Nov 25, 2019 (gmt 0)


If you use both mod_alias and mod_rewrite, there will sometimes be two redirects. You can’t prevent it, unless you play fast and loose with your server configuration to make mod_alias run first--which I really, really do not recommend.

The question, of course, is how many of those chained redirects you’ll be getting in real life. In some cases, the visitor is intentionally requesting the wrong URL, so you might argue that any chained redirects are ultimately their fault. (I will have more to say about this in a week or so when I post some of what I’ve learned from my final https move.) And many of the requests with wrong protocol and/or hostname will end up as 403s anyway. But it's still a bit of extra work for your server.

The other question is why Apache keeps telling us to use mod_alias for redirects even though there is no possible configuration that doesn’t also entail the use of mod_rewrite. Besides, doesn’t mod_alias have enough to do?