Help with .htaccess

         

darwinstudios

8:45 pm on Feb 8, 2010 (gmt 0)

10+ Year Member



I am running a blog site that was converted over from a static HTML site. I have an .htaccess file in place to 1) rewrite posts, 2) redirect old pages to new pages.

My problem is that everything works fine except for simply redirecting www.example.com/index.html to www.example.com/.

Here is what my .htaccess file looks like. Can you help me fix it? I wonder if it has something to do with the first rewrite rule for index.php?

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>


# This is the one that does not work:
RewriteRule ^index\.html$ http://www.example.com/ [R,L]
RewriteRule ^page1\.html$ http://www.example.com/page1/ [R,L]
RewriteRule ^page2\.html$ http://www.example.com/ [R,L]
RewriteRule ^page3\.html$ http://www.example.com/page3/ [R,L]
RewriteRule ^page4\.html$ http://www.example.com/page4/ [R,L]
RewriteRule ^page5\.html$ http://www.example.com/page5/ [R,L]

[edited by: jdMorgan at 2:51 pm (utc) on Feb 9, 2010]
[edit reason] example.com [/edit]

g1smd

9:24 pm on Feb 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your code produces a 302 redirect. Is that what you want?

List the external redirects before the internal rewrite; otherwise you expose internal filepaths back out as URLs.
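A rough way to see why the order matters is a toy Python model (not Apache itself) in which the first matching rule ending in [L] wins. It assumes index.html no longer exists on disk, so the !-f and !-d conditions on the catch-all rule pass. The `first_action` helper and the action strings are made up for illustration, and the model ignores the fact that mod_rewrite in .htaccess restarts processing after a rewrite.

```python
import re

# Each rule is (pattern, action); patterns are tested against the
# per-directory path, i.e. without the leading slash.
CATCH_ALL = (r".", "rewrite to /index.php")
REDIRECT = (r"^index\.html$", "redirect to http://www.example.com/")


def first_action(path, rules):
    """Return the action of the first rule whose pattern matches path."""
    for pattern, action in rules:
        if re.search(pattern, path):
            return action
    return "no rule matched"


rewrite_first = [CATCH_ALL, REDIRECT]   # order used in the original file
redirect_first = [REDIRECT, CATCH_ALL]  # order suggested above

print(first_action("index.html", rewrite_first))
print(first_action("index.html", redirect_first))
```

With the catch-all listed first, the request is handed to /index.php before the redirect rule is ever tested; with the redirect listed first, it fires as intended.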

Your index rule needs to check that the request is from an external client. Use a RewriteCond to test THE_REQUEST. There's code for that posted several times per week in this forum.

You can redirect all the numbered pages using a single rule. The pattern changes to:
^page[0-9]{1,2}\.html$
(assuming always a single or double digit number).
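As a quick check of what that pattern accepts, here is a small Python sketch. It is only an illustration: the capture group around the digits and the `redirect_target` helper are additions for this example (the pattern above has no group), and the target format assumes every numbered page maps to /pageN/. Any page that redirects somewhere else, such as page2 going to the home page in the file above, would need its own rule listed ahead of the combined one.

```python
import re

# Same pattern as above, with parentheses added to capture the number.
PAGE = re.compile(r"^page([0-9]{1,2})\.html$")


def redirect_target(path):
    """Return the new URL for an old numbered page, or None if no match."""
    match = PAGE.search(path)
    if match is None:
        return None
    return "http://www.example.com/page%s/" % match.group(1)


print(redirect_target("page1.html"))    # http://www.example.com/page1/
print(redirect_target("page12.html"))   # http://www.example.com/page12/
print(redirect_target("page123.html"))  # None (three digits)
```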

darwinstudios

1:35 pm on Feb 9, 2010 (gmt 0)

10+ Year Member



I realized this did not look right in the post, so I am putting it here again. <snip> I also modified it a bit, as the page names are not really numbered; that was just for illustrative purposes:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
#
# This is the one that does not work:
RewriteRule ^index\.html$ http://www.example.com/ [R,L]
RewriteRule ^aboutus\.html$ http://www.example.com/about-my-company/ [R,L]
RewriteRule ^testimonials\.html$ http://www.example.com/ [R,L]
RewriteRule ^portfolio\.html$ http://www.example.com/company-portfolio/ [R,L]
RewriteRule ^services\.html$ http://www.example.com/what-we-offer/ [R,L]
RewriteRule ^contact\.html$ http://www.example.com/contact-us/ [R,L]

[edited by: jdMorgan at 2:53 pm (utc) on Feb 9, 2010]
[edit reason] Use example.com only. [/edit]

jdMorgan

2:57 pm on Feb 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As g1smd stated above, you must check THE_REQUEST to prevent a loop when this rule interacts with mod_dir.

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html[^\ ]*\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
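To see which request lines that RewriteCond accepts, the same regular expression can be tried in Python. This is a sketch only: the sample request lines are invented, and Python's `re` stands in for Apache's regex engine, which treats this particular pattern the same way. THE_REQUEST holds the request line exactly as the client sent it, so an internal subrequest from mod_dir for index.html still carries the original "GET / HTTP/1.1" line and does not match.

```python
import re

# The RewriteCond pattern from the rule above, unchanged.
CLIENT_ASKED_FOR_INDEX = re.compile(r"^[A-Z]+\ /index\.html[^\ ]*\ HTTP/")


def client_requested_index(request_line):
    """True if the client itself asked for /index.html."""
    return CLIENT_ASKED_FOR_INDEX.search(request_line) is not None


print(client_requested_index("GET /index.html HTTP/1.1"))        # True
print(client_requested_index("GET /index.html?ref=1 HTTP/1.1"))  # True
print(client_requested_index("GET / HTTP/1.1"))                  # False
```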

g1smd also pointed out several other important points. I strongly suggest that you investigate their meaning, or simply delete all of this code and resolve not to use mod_rewrite. Otherwise, you'll badly damage your site's rankings.

Jim

darwinstudios

3:15 pm on Feb 9, 2010 (gmt 0)

10+ Year Member



Thank you both for your replies; they are very much appreciated. I am not well versed in .htaccess at all, as you can tell (which is why I came here for the expert advice!).

I hacked this together from various places to just try and make it work. I would love to learn the "correct" way to do this. What should I do to not damage my site's rankings?

I tried adding the RewriteCond code and it didn't work, so now my .htaccess looks like this:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
#
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html[^\ ]*\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

# This is the one that does not work:
RewriteRule ^index\.html$ http://www.example.com/ [R,L]
RewriteRule ^aboutus\.html$ http://www.example.com/about-my-company/ [R,L]
RewriteRule ^testimonials\.html$ http://www.example.com/ [R,L]
RewriteRule ^portfolio\.html$ http://www.example.com/company-portfolio/ [R,L]
RewriteRule ^services\.html$ http://www.example.com/what-we-offer/ [R,L]
RewriteRule ^contact\.html$ http://www.example.com/contact-us/ [R,L]

jdMorgan

3:27 pm on Feb 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You left the old rule in place, which is why it doesn't work.

Please re-read g1smd's post, and then ask *specific* questions.

Thanks,
Jim

darwinstudios

6:48 pm on Feb 10, 2010 (gmt 0)

10+ Year Member



Thanks for your help guys, I paid someone to do this for me since I couldn't quite figure it out myself. I appreciate the responses I got for this though.

g1smd

8:02 pm on Feb 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's see the final code. There are many ways to write code that looks like it works, but which is killing all possibility for your site to do well.

darwinstudios

8:28 pm on Feb 10, 2010 (gmt 0)

10+ Year Member



RewriteEngine On
RewriteBase /

#Index Page
RewriteRule ^index.html$ http://www.example.com/ [R=301,NC,L]
RewriteRule ^index.htm$ http://www.example.com/ [R=301,NC,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
Rewritecond %{HTTP_HOST} !^www\.
Rewriterule ^(.*) http://www.example.com/$1 [R=301,L]
#end

RewriteRule ^page1\.html$ http://www.example.com/page1/ [R=301,NC,L]
RewriteRule ^page2\.html$ http://www.example.com/page2/ [R=301,NC,L]
RewriteRule ^page3\.html$ http://www.example.com/page3/ [R=301,NC,L]
RewriteRule ^page4\.html$ http://www.example.com/ [R=301,NC,L]

<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

jdMorgan

11:35 pm on Feb 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This should speed things up... possibly enough that you'll notice, if your site gets a lot of traffic.

RewriteEngine On
#
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,NC,L]
RewriteRule ^page1\.html$ http://www.example.com/page1/ [R=301,NC,L]
RewriteRule ^page2\.html$ http://www.example.com/page2/ [R=301,NC,L]
RewriteRule ^page3\.html$ http://www.example.com/page3/ [R=301,NC,L]
RewriteRule ^page4\.html$ http://www.example.com/ [R=301,NC,L]
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} !^/(index\.php|robots\.txt|sitemap\.xml)$
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|css|js|png|ico)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Jim

darwinstudios

1:00 pm on Feb 11, 2010 (gmt 0)

10+ Year Member



Thank you Jim for optimizing this! I will put it in place. One question, what do these 2 lines do?:

RewriteCond %{REQUEST_URI} !^/(index\.php|robots\.txt|sitemap\.xml)$
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|css|js|png|ico)$

jdMorgan

1:54 pm on Feb 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They speed up your WordPress- or Joomla-based site by bypassing the file-exists and directory-exists RewriteConds for files and filetypes that WP and Joomla don't handle anyway, and for requests that this rule itself has just rewritten to index.php. (mod_rewrite in a per-directory .htaccess context behaves recursively -- it "loops" until a complete pass is made during which no more RewriteRules are invoked.)

By skipping these unnecessary disk checks, we eliminate *a lot* of wasted server resources, and the result is a performance improvement of over 50% on the speed of this rule itself -- You get at least that much just by skipping the checks after the URL has already been rewritten to /index.php, which is why that exclusion is the very first one. And obviously, robots.txt and sitemap.xml and the image types are not script-generated files.

You can likely get small incremental improvements --up to a point-- by adding a few more exclusions (e.g. multimedia and document filetypes such as .swf and .pdf), based on your knowledge of your site... Basically, these are files and filetypes that are known to exist or known not to be handled/generated by your CMS. They can be grouped as shown, or listed according to strict frequency-of-access order -- which is why the filetype list is ordered as shown (check your own "stats" to verify this order).

You can add a few more exclusions if you like, but the biggest gains should already be seen by excluding the rewrite target file itself and the image, css, and external JavaScript files.

And of course, this exclusion list is "just a guess." It should work fine on most WP/Joomla sites, but I know nothing of any of the possible 'special' functional details of your site, so you may have different needs.

Anyway, if you have a high-traffic site hosted on a busy shared virtual server, you should notice that your page loads are a bit 'snappier' with this simple tweak.
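For anyone who wants to check which requests the two exclusion lines let through to the file-exists tests, here is a minimal Python sketch using the same two patterns. The `needs_disk_check` name and the sample URIs are made up for illustration; the input is assumed to be the URL-path only, without a query string.

```python
import re

# The two RewriteCond patterns, unchanged. In the .htaccess file each is
# negated with "!", so a match here means "skip the disk checks".
KNOWN_FILES = re.compile(r"^/(index\.php|robots\.txt|sitemap\.xml)$")
STATIC_TYPES = re.compile(r"\.(gif|jpe?g|css|js|png|ico)$")


def needs_disk_check(uri):
    """True if the request would go on to the -f and -d tests."""
    if KNOWN_FILES.search(uri):
        return False
    if STATIC_TYPES.search(uri):
        return False
    return True


for uri in ("/index.php", "/images/logo.png", "/about-my-company/"):
    print(uri, needs_disk_check(uri))
```

Only the last of the three sample URIs reaches the disk checks; the first two are rejected by a cheap string match before the filesystem is touched.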

Jim

jdMorgan

2:04 pm on Feb 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One more thing... If your site exists as the single "www" subdomain, and you have no immediate plans to add more subdomains, then you can simplify the domain canonicalization rule and make it more robust by changing it to

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

That simply says, "If the requested hostname is not exactly www.example.com or blank, redirect to www.example.com". It will handle all possible variations on the canonical hostname, such as FQDN format, appended port numbers or the (rare) mis-casing issues. The exclusion for blank hostnames is needed for IP-based virtual servers, which can be reached by true HTTP/1.0 requests which do not include an "HTTP Host" header, leaving the %{HTTP_HOST} variable blank. Failure to handle this specific case would result in an 'infinite' redirection loop -- something to be avoided. Even if you're currently hosted on a name-based virtual server --unreachable using true HTTP/1.0 protocol-- this can be seen as "inexpensive future-proofing" in case you later upgrade.
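The hostname test can be tried out the same way. This is a sketch under the assumption that Python's `re` treats this pattern like Apache does (it should, for something this simple); `needs_redirect` and the sample hostnames are illustrative only.

```python
import re

# The RewriteCond pattern, unchanged. The "!" in the .htaccess file
# negates it, so the redirect fires when this does NOT match.
CANONICAL_OR_BLANK = re.compile(r"^(www\.example\.com)?$")


def needs_redirect(host):
    """True if the Host header value would trigger the redirect."""
    return CANONICAL_OR_BLANK.search(host) is None


for host in ("www.example.com", "", "example.com",
             "www.example.com:80", "www.example.com.", "WWW.EXAMPLE.COM"):
    print(repr(host), needs_redirect(host))
```

Only the exact canonical name and the blank host are left alone; the bare domain, the port-number form, the trailing-dot FQDN form, and the mis-cased form all get redirected.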

Jim