The site has been around a long time. Its internal links previously used a mix of /index.htm and just "/", and it has many inbound links using both versions (including links to subdirectory index pages). I created a 301 redirect in .htaccess to redirect from index.htm to the site root, and from the index page in each subdirectory to the subdirectory root. I've done this before on many sites without a hitch. It works perfectly when a browser requests an index.htm page. No problem, right?
Problem.
The client uses Adobe's Contribute to edit the site, and Contribute connects to the site for editing through an odd hybrid of FTP and HTTP requests. When the client pulls up the home page, or a page such as example.com/subdirectory/, Contribute uses FTP to determine that the filename of the page is really /subdirectory/index.htm and then issues an HTTP request for that file. Apache redirects that request to /subdirectory/, which Contribute then re-requests as /subdirectory/index.htm. It's the classic infinite loop, except that it only happens in Contribute because of the way Contribute works.
Unfortunately, this infinite loop makes it impossible for the client to edit any index pages in Contribute. Because of the site's history of linking to /index.htm as well as /, and because of the inbound links (IBLs) that point to both versions, I would strongly prefer to keep the redirect in place, but I've had to remove it in order to allow the client to edit the site.
Here's the code I used in .htaccess:
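# First example: redirect to a hard-coded canonical hostname.
RewriteEngine on
RewriteBase /
# Match only direct client requests for index.htm, not internal subrequests.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]

# Second example: redirect to whatever hostname was requested.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://%{HTTP_HOST}/$1 [R=301,L]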
Is there any solution for such a situation?
The second code example is dangerous because:
domain.com/index.html redirects to domain.com/
and
www.domain.com/index.html redirects to www.domain.com/
If you have a separate non-www to www redirect then you will have a Redirection Chain if domain.com/index.html is requested.
The index file redirect should always specify the target domain at the same time.
Dispense with the redirects only when you detect that the user is the editor, by checking either the User-Agent or maybe the IP address from which the request comes.
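Something along these lines might do it. This is only a sketch: the "Contribute" User-Agent pattern is a placeholder I have not confirmed (check the access log for what the editor actually sends), and 203.0.113.10 is a documentation-range address standing in for the client's real IP. The index redirect itself is the first example from above, unchanged.

```apache
RewriteEngine on
RewriteBase /

# Skip the redirect when the User-Agent looks like the editing tool.
# "Contribute" is an assumed placeholder pattern, not a confirmed string.
RewriteCond %{HTTP_USER_AGENT} !Contribute [NC]
# Skip the redirect when the request comes from the editor's IP address.
# 203.0.113.10 is a hypothetical address; substitute the real one.
RewriteCond %{REMOTE_ADDR} !^203\.0\.113\.10$
# Only direct client requests for index.htm are redirected.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]
```

The conditions are ANDed, so the redirect fires only when neither exclusion matches; drop whichever exclusion you don't need. The THE_REQUEST condition is kept last so that %1 still refers to its captured directory path.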
A request for domain.com/index.html will be redirected to www.domain.com/index.html and then that will be redirected to www.domain.com/. That is bad.
I oversimplified my initial explanation. Not only should the "index redirect" also sort out the correct domain in the same redirect, but the "index redirect", being more specific, should also come before the more general "fix all my non-www" redirect.
This ensures index files have their domain fixed at the same time as fixing the index file URL. A separate redirect then runs for all non-index non-www URLs, and fixes those to all be www. Each starting point only runs through one redirect, not a chain.
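As a rough sketch of that ordering, assuming www.example.com is the canonical hostname and the rules live in the .htaccess file at the site root:

```apache
RewriteEngine on
RewriteBase /

# 1. Specific rule first: fix the index filename AND the hostname in one hop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]

# 2. General rule second: any other request whose hostname is not canonical.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With this order, a request for example.com/index.htm is caught by the first rule and goes straight to www.example.com/ in a single 301, while a request for example.com/page.htm falls through to the second rule and is sent to www.example.com/page.htm.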