index.php
index.htm
index.html to the root directory.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301.L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ [%{HTTP_HOST}...] [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ [%{HTTP_HOST}...] [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.htm\ HTTP/
RewriteRule ^index\.htm$ [%{HTTP_HOST}...] [R=301,L]
This was working until recently and now it appears that all three of Google, Yahoo and MSN are no longer able to figure this out.
Yahoo is showing index.php
MSN is completely excluding
/
index.php
index.htm
index.html
but showing other pages of the site.
Google is now completely excluding the entire site.
Is there anything wrong with the above code?
TIA
Yes, the use of %{HTTP_HOST} may be causing you problems. Yahoo especially will often request pages from your domain without the leading 'www'. If they were to do that as well as requesting "/index.html" (or one of the other index file variants), then they would see two consecutive 301 redirects. This may not be the root cause of your indexing problem, but it would certainly complicate recovery from another problem.
Because the major search engines have over-complicated their 301-handling in an effort to 'help' Webmasters on second-rate hosting and those who don't know enough to return correct server headers, they're having enough trouble with one 301 redirect, much less two...
So it would be better to explicitly state the domain name in your rules, and to do the index file redirect first, since it will then also correct the domain name and prevent a double redirect in the case described above.
Also, you can use a local 'OR' and a regex "?" token to reduce your ruleset from four rules to two:
RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.(php|html?)\ HTTP/
RewriteRule ^index\.(php|html?)$ http://www.example.com/ [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
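To see why the rule order matters, here is a toy Python model of the two rulesets. It is only a sketch and not how Apache evaluates anything: the function names are made up for illustration, and it assumes the elided target in the original index rules was a redirect to "/" on whatever host was requested.

```python
import re
from urllib.parse import urlsplit

CANONICAL = "http://www.example.com"
# Same idea as the pattern in the rules: index.php, index.htm or index.html at the root.
INDEX_FILE = re.compile(r"/index\.(php|html?)")


def original_order(host, path):
    """Model of the question's ruleset: fix the host first, then the index file.

    Assumption: the index rules redirected to "/" on the requested host.
    """
    if re.match(r"example\.com", host):
        return f"{CANONICAL}{path}"
    if INDEX_FILE.fullmatch(path):
        return f"http://{host}/"
    return None  # no rule matched, the page is served


def revised_order(host, path):
    """Model of the revised ruleset: index file first, always to the explicit domain."""
    if INDEX_FILE.fullmatch(path):
        return f"{CANONICAL}/"
    if re.match(r"example\.com", host):
        return f"{CANONICAL}{path}"
    return None


def follow(rules, host, path, max_hops=5):
    """Apply a ruleset repeatedly and return the list of redirect targets."""
    chain = []
    for _ in range(max_hops):
        target = rules(host, path)
        if target is None:
            break
        chain.append(target)
        parts = urlsplit(target)
        host, path = parts.netloc, parts.path or "/"
    return chain


if __name__ == "__main__":
    for name, rules in (("original", original_order), ("revised", revised_order)):
        print(name, follow(rules, "example.com", "/index.html"))
```

For a request to example.com/index.html, the original order produces two redirects in this model (first to www.example.com/index.html, then to www.example.com/), while the revised order produces one.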
As a matter of procedure, I recommend thoroughly testing all mod_rewrite code using the Mozilla "Live HTTP Headers" headers checker extension for Firefox/Mozilla/SeaMonkey, or *a good* online server headers checker. My definition of a good online server headers checker is one that shows all responses in the transaction(s), including intermediate redirects. Not all server headers checkers do this, and many omit the less-common response headers. Therefore, a good server headers checker is, by my definition, one that agrees with my "Live HTTP Headers" report. :)
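If you would rather script the check, something along these lines would show every response in the chain instead of only the final one. This is a rough sketch using only the Python standard library; the names fetch_once and trace are mine, and the default user-agent string is a placeholder.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, user_agent="header-check-sketch"):
    """Send one GET request without following redirects; return (status, headers)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()


def trace(url, fetch=fetch_once, max_hops=10):
    """Follow redirects by hand, recording (url, status, headers) for each hop."""
    hops = []
    for _ in range(max_hops):
        status, headers = fetch(url)
        hops.append((url, status, headers))
        location = next(
            (value for name, value in headers.items() if name.lower() == "location"),
            None,
        )
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops


# Example use (makes real requests, so run it against your own site):
# for url, status, headers in trace("http://example.com/index.html"):
#     print(status, url, headers.get("Location", ""))
```

A single 301 followed by a 200 is what you want to see; two 301s in a row for one request is the double redirect described above.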
You should probably also test your server robots.txt and page access using a user-agent spoofing extension or something like the WannaBrowser online utility. Spoof the Google, Yahoo, and MSN robots (copy the user-agent strings from your raw server logs) and make sure that robots.txt and your pages are accessible and correct when presented to spiders.
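The same spoofing check can be scripted. The sketch below uses Python's standard urllib modules; fetch_as and allowed are names I made up, and the user-agent string in the example is a placeholder that you should replace with the real strings from your logs. Note that the robots.txt check takes the robot's short name (for example "Googlebot"), not the full user-agent string.

```python
import urllib.error
import urllib.request
import urllib.robotparser


def fetch_as(url, user_agent, timeout=10):
    """Request a URL while presenting the given User-Agent; return (status, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise; report the status instead of crashing.
        return err.code, ""


def allowed(robots_txt, robot_name, url):
    """Check whether robots.txt text permits the named robot to fetch the URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot_name, url)


# Example use (makes real requests, so run it against your own site):
# ua = "PLACEHOLDER: paste a spider's user-agent string from your raw logs"
# status, robots_txt = fetch_as("http://www.example.com/robots.txt", ua)
# print(status, allowed(robots_txt, "Googlebot", "http://www.example.com/"))
# print(fetch_as("http://www.example.com/", ua)[0])
```

Keep in mind that urlopen follows redirects on its own, so this only tells you what the spider ends up with, not how many hops it took to get there.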
Jim