Forum Moderators: phranque


Redirect non-www page to end with /

         

lee_sufc

9:36 am on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Following on from my other questions relating to HTACCESS, I have just one more, but thought it'd be better to start a new topic for this.

I currently redirect all traffic to the www. version of my site. However, upon checking redirected links in a specific folder, I've noticed something that may / may not be an issue.

Basically, if I do a header check on www.example.com/example/examplepage/ it shows 200 OK

If I check www.example.com/example/examplepage (without "/") it shows 301 redirect to the above as it should.

BUT, if I check example.com/example/examplepage (without www and "/"), it redirects to www.example.com/example/examplepage (without the "/" at the end)...which then in turn redirects to www.example.com/example/examplepage/ (the correct page).

I hope this makes sense?

a) is this OK? b) am I missing something in my htaccess?


RewriteOptions inherit

RewriteEngine On

RewriteCond %{THE_REQUEST} ^.*/index.htm

RewriteRule ^(.*)index.htm$ http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^example\.com [NC]

RewriteRule (.*) http://www.example.com/$1 [R=301,L]


RewriteCond %{THE_REQUEST} \.php
RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/example/$1/ [R=301,L]

RewriteRule ^tips/([^.]+[^./])$ http://www.example.com/example/$1/ [R=301,L]

RewriteRule ^(tips/[^.]+)/$ /$1.php [L]


Please note: the above php rule was created to "hide" php extensions on a specific folder (didn't want them hidden anywhere else).

topr8

9:50 am on May 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



think about combining rules

topr8

9:58 am on May 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



eg. do the 2 in one redirect


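# a RewriteCond applies only to the RewriteRule directly after it,
# so the host check is repeated for each rule
# redirect non-www tips to www + correct
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/example/$1/ [R=301,L]
# rewrite all other non-www to www
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]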

# carry on with the_request rules

lee_sufc

10:04 am on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks, topr8 - I'll take a look.

lee_sufc

11:05 am on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Aahh...thought I'd found the solution by moving part of the htaccess around, but nope....so infuriating...

lee_sufc

11:21 am on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm wondering if this is even an issue worth wondering about? I use "<link rel='canonical'" in the header of each page which shows the URL with the "/"....htaccess gives me a headache - it may as well be in another language :-(

Basically, I'm worried that having the two 301 redirects to get to the final page isn't a good idea, i.e. redirecting http://example.com/example/examplepage to http://www.example.com/example/examplepage (with the www) and then again to http://www.example.com/example/examplepage/ (with the www and trailing slash at the end)?

lee_sufc

1:26 pm on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



My God, I think I've done it...well...that is until I find something else I've missed. In case this helps anyone else in the same boat, this is what got it working (just involved moving some of the rules around in htaccess). So now, my htaccess shows the following to ensure all redirects work properly:


RewriteOptions inherit
RewriteEngine On

RewriteCond %{THE_REQUEST} ^.*/index.htm

RewriteRule ^(.*)index.htm$ http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} \.php

RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/example/$1/ [R=301,L]

RewriteRule ^tips/([^.]+[^./])$ http://www.example.com/example/$1/ [R=301,L]

RewriteRule ^(tips/[^.]+)/$ /$1.php [L]

RewriteCond %{HTTP_HOST} ^example\.com [NC]

RewriteRule (.*) http://www.example.com/$1 [R=301,L]


If anyone can see anything wrong with this or anything that could potentially cause problems, please let me know...but beforehand, let me take something to calm my nerves as I'm not sure how much more of htaccess I can take! Again, I appreciate everyone's help with this, and my other couple of posts.

lee_sufc

3:10 pm on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



*insert crying face*

Nope - spoke too soon...still not working on http://example.com/example/examplepage/

This STILL redirects to http://www.example.com/example/examplepage.php and then again to http://www.example.com/example/examplepage/

I swear when I checked earlier it worked?!? This must be driving me crazy.

I suppose, going back to my earlier question - does it matter that for that one specific redirect (without www but with /), there's an extra 301 step to get to the final destination? As much as I play around with it, I just can't seem to get it to redirect in one.

phranque

9:06 pm on May 1, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Order your rules so that the external redirects precede the internal rewrites and then the most specific rules precede the most general rules.

phranque

9:13 pm on May 1, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The first two RewriteCond directives are redundant.
The pattern match of the URL path in the RewriteRule will fire first making the additional conditional unnecessary.

lucy24

9:53 pm on May 1, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



and then the most specific rules precede the most general rules

Let me give a little more detail here, because I believe eight different people have already pointed out it's a rule-ordering problem ;)

External redirects ([R] flag) ordinarily go in this order (#3 will not apply to all sites)

#1 redirects that apply to specific pages

#2 index redirect (requests for index.html, index.php or whatever applies to your site)

#3 extension-related redirects, such as .php to extensionless, or without final slash to with final slash (other than real, physical directories)

#4 domain-name canonicalization and/or protocol (with/without www, http vs https). This is always the very last external redirect

All redirect targets should give the full correct hostname, so #4 is purely a catchall for requests that are perfectly correct in every way except that they're using the wrong form of your hostname (and/or wrong protocol, if applicable).

I recommend leaving a blank line after each RewriteRule, but NO blank line between a rule and its accompanying RewriteCond. The server doesn't care, but it makes it much easier for humans to read--and also makes it more likely that each ruleset stays together when you're rearranging.
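As a rough sketch of that ordering (not a drop-in file): the tips/ rules and hostnames are the ones quoted earlier in this thread, while old-page.html and new-page/ in step #1 are made-up placeholders, and the layout follows the blank-line convention just described.

```apache
RewriteEngine On

# 1. redirect for one specific page (hypothetical paths)
RewriteRule ^old-page\.html$ http://www.example.com/new-page/ [R=301,L]

# 2. index redirect
RewriteCond %{THE_REQUEST} /index\.htm
RewriteRule ^(.*)index\.htm$ http://www.example.com/$1 [R=301,L]

# 3. extension and trailing-slash redirects
RewriteCond %{THE_REQUEST} \.php
RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/tips/$1/ [R=301,L]

RewriteRule ^tips/([^.]+[^./])$ http://www.example.com/tips/$1/ [R=301,L]

# 4. hostname canonicalization, the last external redirect
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# internal rewrites come only after every external redirect
RewriteRule ^(tips/[^.]+)/$ /$1.php [L]
```

Because every target in steps 1 to 3 already names www.example.com, a request that needs both a path fix and a hostname fix should be corrected in a single 301 and never reach step 4.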

lee_sufc

10:56 pm on May 1, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



I won't throw a party just yet but I think that thanks to everyone's help here, it's now working as it should!

In relation to a previous post about some lines being redundant, if I removed anything, the redirects didn't work. In the end, what did work was this:


RewriteEngine On

RewriteCond %{THE_REQUEST} \.php

RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/tips/$1/ [R=301,L]

RewriteRule ^tips/([^.]+[^./])$ http://www.example.com/tips/$1/ [R=301,L]

RewriteCond %{THE_REQUEST} ^.*/index.htm
RewriteRule ^(.*)index.htm$ http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteRule ^(tips/[^.]+)/$ /$1.php [L]


What has made this difficult for me is that I do not understand re-write rules / htaccess at all - it may as well be in Chinese. Even with everyone here kindly helping me, I often look back and just think "huh?"

I used to have a web developer employed to help me, but he messed things up worse than I would have and ever since I've never trusted anyone else to tweak anything on the website.

Anyway, thanks AGAIN to everyone who has helped here. I suppose the one upside to my multiple posts / questions recently is that it might help someone else in the future in a similar conundrum!

I will be glad to not have to spend any more time staring at my htaccess file now!

lucy24

2:36 am on May 2, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I do not understand re-write rules / htaccess at all - it may as well be in Chinese

Welcome to the club. People who have been here for a long time already know my particular dirty secret, which is: I don't speak a word of Apache. What I do know, thanks to massive text-editing experience, is Regular Expressions. Fortunately, at least 90% of mod_rewrite questions are really just about how to construct the perfect RegEx.

The quirk about Apache, as opposed to just about any other computer-related language you can name, is that it doesn't have operators. No + signs or = signs or {put-stuff-in-brackets-and-do-stuff-to-it} or {if A then B else C}. Instead, every blank space in every line carries meaning. When you meet something like
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
you can think of it as four cells in a table. (This is assuming you know at least a tiny bit of html.) And then you think of the whole "table" as having a header:
<th>who's in charge (“RewriteRule” tells you it's mod_rewrite)</th>
<th>what you're looking for (the "pattern")</th>
<th>what you're doing to it (the "target")</th>
<th>extra stuff to make the server happy</th>
and the only sign that you've moved from column 1 to column 2 to column 3 in your hypothetical table is those blank spaces.
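Laid out with extra spaces, the same line might look like the sketch below. The column labels in the comment are informal names used only for illustration, and the layout relies on Apache treating a run of spaces between arguments as a single separator.

```apache
# directive    pattern   target                       flags
RewriteRule    (.*)      http://www.example.com/$1    [R=301,L]
```

This is only a reading aid; the rule behaves the same as the single-spaced version.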

:: :: ::

I've just put an 800-page book into the marinade after ignoring it for several years. By comparison, everything else in the world looks blissfully simple and straightforward.

phranque

9:39 am on May 2, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i would delete those first two RewriteCond directives:
RewriteCond %{THE_REQUEST} \.php
RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/tips/$1/ [R=301,L]

this says if the url path ends with .php make sure THE_REQUEST also contains .php before redirecting the request.
try this instead:
RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/tips/$1/ [R=301,L]


RewriteCond %{THE_REQUEST} ^.*/index.htm
RewriteRule ^(.*)index.htm$ http://www.example.com/$1 [R=301,L]

this says if the url path ends with index.htm make sure THE_REQUEST starts with nothing-or-anything followed by /index.htm before redirecting the request.

try this instead:
RewriteRule ^(.*)index\.htm$ http://www.example.com/$1 [R=301,L]

(also note the added backslash in the Pattern regex)


i would suggest this change to your hostname canonicalization redirect ruleset:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

lee_sufc

9:51 am on May 2, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks, phranque.

If I change those first two lines at all, it messes everything up.

In relation to the index.htm redirect to / and canonicalization - these both work with your rules and look good. Could I just ask why these rules are better than what I already had? (sorry - this just helps me understand better).

phranque

1:04 pm on May 2, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If I change those first two lines at all, it messes everything up.

what response do you get?

In relation to the index.htm redirect to / ... Could I just ask why these rules are better than what I already had?

the dot is a special character in a regular expression which means any character.
by preceding it with a backslash that makes it a literal dot.
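A small side-by-side sketch of the difference, reusing the index rule from this thread; "indexXhtm" is a made-up filename chosen only to show the over-match.

```apache
# Unescaped dot: "." matches any single character, so this pattern
# would also match a hypothetical request ending in "indexXhtm".
# RewriteRule ^(.*)index.htm$ http://www.example.com/$1 [R=301,L]

# Escaped dot: "\." matches only a literal ".", so the pattern
# matches paths ending in "index.htm" and nothing looser.
RewriteRule ^(.*)index\.htm$ http://www.example.com/$1 [R=301,L]
```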

RewriteCond %{HTTP_HOST} ^example\.com [NC]

this fires the RewriteRule if HTTP_HOST starts with example.com (case insensitive); the pattern has no closing anchor, so it is not an exact match.

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]

this fires the RewriteRule if HTTP_HOST is neither exactly www.example.com (case insensitive) nor empty.

the "not exactly the canonical hostname" test is more robust than "this exact noncanonical hostname"
the nothing at all case is to avoid a chained/infinite redirect for HTTP/1.0 user agent requests which don't supply a hostname.
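To make the comparison concrete, here is a sketch of how the two conditions would treat a few Host values. shop.example.com is a hypothetical extra hostname assumed to point at the same site; the other values come from this thread.

```apache
# Does the condition match, i.e. does the redirect fire?
#
#   Host header          ^example\.com [NC]   !^(www\.example\.com)?$ [NC]
#   example.com          yes                  yes
#   www.example.com      no                   no
#   WWW.EXAMPLE.COM      no                   no
#   shop.example.com     no                   yes
#   (empty, HTTP/1.0)    no                   no
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

The only row where the two differ is the unexpected hostname, which the negated form catches without having to list it.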

lee_sufc

1:09 pm on May 2, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks! In response to the first question - by changing those rules, it suddenly comes up with a 404 error on any redirected pages (the redirects work but the pages come up as 404).

phranque

1:25 pm on May 2, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It redirects to the same URL but the subsequent request has a different response?

if so it would be interesting to understand why...

lee_sufc

1:31 pm on May 2, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



hmm...I'd love to know why. Leaving the rules as mentioned earlier, everything works fine?!?

lucy24

6:22 pm on May 2, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



this says if the url path ends with .php make sure THE_REQUEST also contains .php before redirecting the request.

It has to, because otherwise you get an infinite loop when there's an internal request for .php files--which, after all, is what the entire site is built around. Unlike index redirects, it isn't covered by the [NS] flag.
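A sketch of the loop being described, assuming the rules live in .htaccess (where the ruleset can run again after an internal rewrite) and using "foo" as a made-up page name:

```apache
# Without the RewriteCond, the sequence would be:
#   1. client requests /tips/foo/
#   2. the internal rewrite below maps it to /tips/foo.php
#   3. the rules run again, now against tips/foo.php
#   4. the .php redirect matches and sends the client back to /tips/foo/
#   5. back to step 1, until the browser gives up
#
# THE_REQUEST holds the original request line from the client, so it
# only contains ".php" when the visitor actually asked for a .php URL.
RewriteCond %{THE_REQUEST} \.php
RewriteRule ^tips/([^.]+)\.php$ http://www.example.com/tips/$1/ [R=301,L]

RewriteRule ^(tips/[^.]+)/$ /$1.php [L]
```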

try this instead:

RewriteRule ^(.*)index\.htm$ http://www.example.com/$1 [R=301,L]

Oh, phranque, this time I have to vigorously disagree. A leading .* or .+ is an absolute last resort, because the server only works in one dimension; it doesn't know that "index.htm" is coming up, so it has already captured all the way to the end and then has to backtrack. And all this for a capture that, on the overwhelming majority of requests, will end up being thrown away.

Lee, have you ever at any time had URLs in "index.htm"? If no, there's really no reason for this rule at all; search engines won't make the request, and obviously humans won't. Search engines might request "index.html" (the default filename) for entrapment purposes, and here it gets tricky because the [NS] flag won't work if your real index files are called "index.php". But on 999 requests out of 1000--at a minimum--the capture will end up getting thrown away. So this is the rare case in which it may be more efficient to defer the capture until the condition--in fact, to create a Condition for the sole purpose of capturing from it. If the server never gets as far as evaluating a condition, no capture takes place. If you want to play it safe you can express the pattern as "index\.htm" without closing anchor, and then it works both ways.
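One way the deferred capture might look, as a sketch rather than a tested rule for this site. The exact condition pattern is an assumption: it reads the directory part of the path out of THE_REQUEST and hands it to the target as %1.

```apache
# The rule pattern only tests for the filename and captures nothing,
# so ordinary requests are rejected cheaply. The condition is evaluated
# only when the pattern has matched, and its first group becomes %1.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/(([^/?\s]+/)*)index\.html?[?\s]
RewriteRule index\.html?$ http://www.example.com/%1 [R=301,L]
```

For a request line such as "GET /dir/index.htm HTTP/1.1", %1 would be "dir/", giving a redirect to http://www.example.com/dir/. Because the condition looks at the original request line, an internal lookup of the index file for /dir/ should not trigger the redirect again.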

:: detour to check something in raw logs ::

Thought so. I have never, ever received a legitimate request for "index.htm" on any site. (In fact, since I've never used .htm, the only .htm requests at all were from assorted malign robots.) Requests for "index.php" are slightly more common--but those, of course, are from malign robots hoping to find CMS files. Oh, and piwik, which really does use index.php. Other than that, no humans, no legitimate search engines. Requests for "index.html" do occur, even if the site in question has never used "index.html" in visible URLs. (This is one of the things I had to check.)

Well, I do get some "index.php" requests from the BLEXBot. But since this robot is notoriously inept at managing its database, it is impossible to tell whether these are honest requests, or someone else's URL getting attached to my filepaths. Shrug.