www/https rewrite with SEF question

         

benmcntsh

6:14 am on Sep 1, 2008 (gmt 0)

10+ Year Member



Hello,

I came across this website and I'm very impressed at how responsive the community is on the forums. I am hoping you can help me because I don't know who else to ask.

I am fairly new to .htaccess commands, but I am trying to do some rewrites and they don't quite work with each other. Each one works fine on its own, but there is a subtle problem when they are combined. Let me show you the relevant portion of my .htaccess file first:

DirectoryIndex index.html index.php

RewriteEngine on

# force www and https in URL
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301]

# Begin SEF Section
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index.php
RewriteCond %{REQUEST_URI} (/¦\.php¦\.html¦\.htm¦\.feed¦\.pdf¦\.raw¦/[^.]*)$ [NC]
RewriteRule (.*) index.php
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

So as you can see I am trying to do 3 things: use index.html as an index before index.php, force the www and https to show up on the URL, and implement a SEF url (I have no flexibility on this as my website package needs it).

For the most part it works great. But here is where it is inconsistent:

  • https://www.example.com --> Shows index.html (GOOD)
  • http://www.example.com --> Redirects to [example.com...] (BAD)
  • https://example.com --> Redirects to [example.com...] (BAD)

Does anyone know how I can get the index.html page by default when visiting my website using any of the listed URLs? The RewriteRule (.*) index.php line is causing the problem, but I'm not sure how to fix it.

Thanks so much,
Ben

[edited by: jdMorgan at 8:36 pm (utc) on Sep. 1, 2008]
[edit reason] Please use example.com [/edit]

jdMorgan

8:49 pm on Sep 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Based on only a quick look, I'd suggest replacing "index.php" with "index.html" in the rules, along with several other tweaks, most notably adding the [L] flag to the first rule:

# force www and https in URL
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
#
# Begin SEF Section
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} (/¦\.php¦\.html¦\.htm¦\.feed¦\.pdf¦\.raw¦/[^.]*)$ [NC]
RewriteRule !^index\.html$ index.html [L]
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]

I don't know what your last rule is doing. It may be necessary to move it above the second rule and remove the [L] flag from it.
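
A minimal sketch of that reordering, assuming the last rule is only there to copy the Authorization request header into an environment variable for the script (a guess based on its E= flag, not something confirmed for this site). Placed before the SEF section and without [L], it still runs for requests that the SEF rule then rewrites:

```apache
# force www and https in URL: unchanged, still ends with [R=301,L]
#
# Copy the Authorization header into an environment variable.
# No [L] here, so processing continues into the SEF section.
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# Begin SEF Section: unchanged, follows here
```

If the rule stays at the bottom, it is never reached for any request where the SEF rule matches, because that rule ends processing with [L].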

An off-topic comment: Many savvy Webmasters are changing their linking, doing rewrites, and adding redirects so that default page paths such as "index.php" and "index.html" do not show up in on-page links, in the address bar, or in search engine results. Including that path in your link URLs means that if you change your site technology in the future, you will have to redirect all index pages and suffer a temporary loss in search rankings. Plus, including that path info is not necessary, and makes the URLs longer and harder to read and type.

I suggest linking to "/" and letting DirectoryIndex internally rewrite that to the index.xyz file, then adding redirects if needed to get the old index.xyz URLs out of the search listings faster. However, I also suggest getting things working one step at a time, rather than making a bunch of changes all at once, as that unnecessarily complicates debugging.
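
A minimal sketch of such a redirect, assuming the .htaccess file sits in the document root of www.example.com and that only the root /index.html URL needs handling. Testing THE_REQUEST (the request line exactly as the client sent it) keeps the rule from firing on the internal DirectoryIndex or SEF rewrite, which would otherwise cause a redirect loop. It would go after the www/https rule and before the SEF section:

```apache
# Redirect direct client requests for /index.html to "/".
# THE_REQUEST looks like: GET /index.html HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html(\?[^\ ]*)?\ HTTP/
RewriteRule ^index\.html$ https://www.example.com/ [R=301,L]
```

The same pattern could be applied to index.php, but only after checking that the site package never links or posts to /index.php directly, since with this DirectoryIndex order a request for "/" is served by index.html.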

Note that posting on this forum modifies solid pipe characters, changing them to broken "¦" pipes. You will need to change them back to solid pipes before using any code copied from this forum.

Jim

benmcntsh

2:47 am on Sep 2, 2008 (gmt 0)

10+ Year Member



Thank you for your excellent comments!

I started making the changes that you recommended and the website started behaving like I expected!

All I did was add on the 'L' to my first rewrite command and now:

  • https://www.example.com --> Shows index.html (GOOD)
  • http://www.example.com --> Redirects to [example.com...] and shows index.html (GOOD)
  • https://example.com --> Redirects to [example.com...] and shows index.html (GOOD)

And without the 'L':

  • https://www.example.com --> Shows index.html (GOOD)
  • http://www.example.com --> Redirects to [example.com...] (BAD)
  • https://example.com --> Redirects to [example.com...] (BAD)

Do you have any insight as to why that happens?

Thanks again,
Ben

jdMorgan

4:19 pm on Sep 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Without [L] as in your originally-posted code, the redirect in the first rule is deferred until the second and third rules run. This "exposes" your internal script path to the client, because the second rule's rewrite is applied, the third rule is executed (ending with [L]), and then the redirect occurs.

Always use the [L] flag unless you know of a specific reason you do not want to.
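
As a rough illustration of the difference (the comments paraphrase the explanation above and are not a trace from a real server):

```apache
# Without [L], i.e. [R=301] alone: the redirect target is set, but
# mod_rewrite keeps going, so later rules can still alter the URL
# before the redirect response is sent.
#
# With [L]: rule processing stops here for this request, and the
# client is redirected to the canonical URL as written.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC,OR]
RewriteCond %{HTTPS} off
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
```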

Jim