New twist on redirecting index.htm to /

Problem with Adobe Contribute

         

sonjay

2:39 pm on Sep 10, 2007 (gmt 0)

10+ Year Member



I'm facing an issue with a client's site that I haven't run into before but probably will again.

The site has been around a long time. Its internal links previously used a mix of /index.htm and just "/", and it has many inbound links using both versions (including links to subdirectory index pages). I created a 301 redirect in .htaccess to redirect from index.htm to the site root, and from the index page in each subdirectory to that subdirectory's root. I've done this before on many sites without a hitch. It works perfectly when a browser requests an index.htm page. No problem, right?

Problem.

The client uses Adobe's Contribute to edit the site, and Contribute uses a weird hybrid of ftp and http requests to connect to the site for editing. When the client pulls up the home page, or a page such as example.com/subdirectory/, Contribute uses ftp to determine that the filename of the page is really /subdirectory/index.htm and then issues an http request for that page. Apache redirects that request to /subdirectory/, which Contribute then re-requests as /subdirectory/index.htm -- the classic infinite loop, except it only happens in Contribute because of the weird way Contribute works.

Unfortunately, this infinite loop makes it impossible for the client to edit any index pages in Contribute. Because of the site's history of linking to /index.htm as well as /, and because of the inbound links (IBLs) that point to both versions, I would strongly prefer to keep the redirect in place, but I've had to remove it in order to allow the client to edit the site.

Here's the code I used in .htaccess:


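RewriteEngine on
RewriteBase /
# Match only direct client requests for index.htm; %1 captures the directory path, if any.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
# Redirect to the same directory on the canonical hostname.
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]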

I've also tried this:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://%{HTTP_HOST}/$1 [R=301,L]

Is there any solution for such a situation?

jdMorgan

10:59 pm on Sep 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If possible, grab the user-agent string from the client's raw access log file, and then create an exclusion for that user-agent using:

RewriteCond %{HTTP_USER_AGENT} !^adobe-contribute-user-agent-without-version-numbers

on each rule that you want to disable for Contribute.
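As a rough sketch of how that exclusion could sit on the index redirect from the first post: the pattern "Contribute" below is only a placeholder, not the real user-agent string, and needs to be replaced with whatever actually appears in the access log. The user-agent condition is placed first so that %1 in the rule still refers to the group captured by the THE_REQUEST condition, which remains the last condition matched.

```apache
RewriteEngine on
RewriteBase /
# Placeholder pattern (an assumption): replace "Contribute" with the start of
# the real user-agent string taken from the raw access log.
RewriteCond %{HTTP_USER_AGENT} !^Contribute
# Match only direct client requests for index.htm; %1 captures the directory path.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]
```

Both conditions must be true for the redirect to fire, so requests from the editing tool fall through and get index.htm served directly, while browsers and robots are still redirected.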

Jim

g1smd

11:29 pm on Sep 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One note about your index file redirect. The target URL should also fix the domain (as in your first code example) to avoid other Duplicate Content issues.

The second code example is dangerous because:
domain.com/index.htm redirects to domain.com/
and
www.domain.com/index.htm redirects to www.domain.com/

If you have a separate non-www to www redirect then you will have a Redirection Chain if domain.com/index.htm is requested.

The index file redirect should always specify the target domain at the same time.

Dispense with the redirects only when you detect that the user is the editor, by checking either the User Agent or the IP address from which the request comes.
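For the IP address variant, a minimal sketch might look like the following. The address 192.0.2.10 is a documentation-range placeholder standing in for the editor's real address, and this only helps if the editor connects from a fixed IP.

```apache
RewriteEngine on
RewriteBase /
# Placeholder address (an assumption): replace 192.0.2.10 with the fixed IP
# address that the editor connects from.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
# Match only direct client requests for index.htm; %1 captures the directory path.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]
```

The trade-off is that ordinary browser requests from that same address also skip the redirect, whereas a user-agent test exempts only the editing tool.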

sonjay

1:24 am on Sep 11, 2007 (gmt 0)

10+ Year Member



Thank you, Jim! That looks like exactly what I need.

g1smd, thanks for the tip about the 2nd version of the code. I do take care of the www/non-www issue in a separate redirect that comes before the index one. (I just neglected to mention it.) It's good to know about the risk of the redirection chain.

g1smd

11:18 am on Sep 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ah, if you are fixing the www first then you DO have a redirection chain.

A request for domain.com/index.htm will be redirected to www.domain.com/index.htm and then that will be redirected to www.domain.com/. That is bad.

I oversimplified my initial explanation. Not only should the "index redirect" sort out the correct domain in the same redirect, but the "index redirect", being more specific, should also come before the more general "fix all my non-www" redirect.

This ensures index files have their domain fixed at the same time as fixing the index file URL. A separate redirect then runs for all non-index non-www URLs, and fixes those to all be www. Each starting point only runs through one redirect, not a chain.
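A sketch of that ordering, assuming the rules live in the .htaccess file in the site root and that www.example.com is the canonical hostname (the general hostname rule is written here for illustration and is not taken from the original poster's file):

```apache
RewriteEngine on
RewriteBase /

# 1. Specific rule first: a request for index.htm on ANY hostname goes
#    straight to the canonical www URL in a single redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.htm\ HTTP/
RewriteRule index\.htm$ http://www.example.com/%1 [R=301,L]

# 2. General rule second: everything else that arrived on a non-canonical
#    hostname. The first condition skips requests that carry no Host header.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With this order, domain.com/index.htm is caught by the first rule and lands on www.example.com/ in one hop, and the second rule only ever sees non-index URLs.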