I've considered two things so far:
Search Engines:
After renaming my index page, I visited it with the Google toolbar. Keeping in mind that my index page has had the full html extension for years now, I was glad to see that the PR for the page was the same, whether I just went to mydomain.com, or mydomain.com/index.htm. I'm assuming, therefore, that Google has no separate record for index.html as opposed to any other index page. Is this correct, and if so, will this also be true for other search engines?
Links from other sites
I'm pretty sure nobody has linked specifically to mydomain.com/index.html. It would be far more natural for them to just link to mydomain.com. But if they have, then visitors (or SE's) that follow that link will get a 404 error instead of my site. As I said, I'm pretty sure this won't be much of a problem, but is it something I should be concerned about?
The Solution?
I tried adding the following to my .htaccess file... <edit>(Spaces added between http :// because WW was automatically making a link out of it.)</edit>
Redirect 301 /index.html http ://mydomain.com/index.htm
...but it had the negative effect of appending 'index.htm' to the end of the address, regardless of whether mydomain.com or mydomain.com/index.html was requested. So I tried something else...
RewriteRule ^index\.html$ http: //mydomain.com [R=301]
This seems to be closest to what I want. If someone comes to mydomain.com, that's what shows in the address bar. If someone comes to mydomain.com/index.html, that also is what shows in the address bar, but they're served the index.htm page. The only thing I can think of that would be better than this would be to come up with some sort of rewrite rule that would strip off the index.html if people tried to visit that explicitly, and serve index.htm. Is there any way to do that?
Am I missing anything?
Are there any considerations to this that I haven't thought of? It's not a step I've taken rashly - there's a good reason for changing the file extension in this case. I just want to make sure I've come up with the best possible solution and that I'm not overlooking anything important.
Thoughts, suggestions, and constructive criticism are all welcome! ;)
Thanks,
Matthew
I found, however, that SOME requests always end up going to index.html. I don't know why, but they just do.
As for me, my solution is not much better... I keep a copy of both index.htm AND index.html. Whenever I make a change, I Save, then Save As, and upload both...
Nothing on my site links to an index file. I always link to the directory, seeing as index is the default anyhow.
But, I'd like to hear the solution, too.
[webmasterworld.com...]
There are a few others in the Apache forum which may give other solutions. The problem with your current solution is that you still have the potential that the same page is available with 3 different URIs:
http://www.example.com/
http://www.example.com/index.htm
http://www.example.com/index.html
(Assuming you are redirecting non-www to www or vice-versa, else you have several more combinations). To avoid indexing problems by the search engines it is best to stick to one URI per resource. You have already done the right thing by not explicitly linking to the index.? page but to be certain you should try removing any index.html requests and 301 redirect them to the root or folder name.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ http://mydomain.com/$1 [R=301]
...and it worked perfectly. So, I figured I'd do the same thing with the index.htm page:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.htm$\ HTTP/
RewriteRule ^(.*)index\.htm$ http://mydomain.com/$1 [R=301]
...but that doesn't work. Going to mydomain.com/index.htm still leaves the index.htm portion in the address bar. Any ideas why that could be? I don't know these rewrite things well enough to recognize potential problems.
<edit> Naturally, I figured it out right after posting... For some reason I had added a '$' after 'htm' in the RewriteCond line for the htm rewrite. Taking that off solved the problem, and now both index.html and index.htm get stripped off. </edit>
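For anyone reading along: the stray '$' breaks the condition because '$' only matches at the very end of the string, so the '\ HTTP/' that follows it can never match. The two rule pairs can also be folded into one by making the final 'l' optional. What follows is only a sketch, assuming the rules live in the .htaccess file at the site root and that mydomain.com stands in for the real host. The [L] flag is an addition of mine and was not in the rules posted above.

```apache
# Only needed once per .htaccess file
RewriteEngine On

# Act only when the client itself asked for index.htm or index.html.
# THE_REQUEST holds the original request line, e.g. "GET /index.html HTTP/1.1",
# so internal lookups of the directory index do not trigger the redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html?\ HTTP/

# Strip the file name and 301 to the bare directory URL.
# "html?" matches both "htm" and "html"; $1 keeps any subfolder path.
RewriteRule ^(.*)index\.html?$ http://mydomain.com/$1 [R=301,L]
```

The THE_REQUEST condition matters because Apache maps a request for the directory onto the index file internally. Without the condition, that internal request would match the rule again and, as far as I understand it, send the browser into a redirect loop. Like the rules above, this pattern will not match a request that carries a query string (index.html?foo=bar).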
Regarding any negative consequences to renaming the index page, I do always link to a directory, not the index.? page, so I don't have any internal links to worry about (unless there's one or two hanging around from years ago that I've managed to overlook so far.) Also, the very first rewrite rule in my .htaccess file strips off the w's from the address, assuming somebody uses them.
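The www-stripping rule itself isn't shown in this thread. A common form looks roughly like the sketch below. This is a guess at the general shape, not the rule actually in use, and mydomain.com is a placeholder for the real host.

```apache
# Only needed once per .htaccess file
RewriteEngine On

# Hypothetical rule: send www.mydomain.com/anything to mydomain.com/anything.
# [NC] makes the host comparison case-insensitive.
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://mydomain.com/$1 [R=301,L]
```

With a rule like this placed first, a request for www.mydomain.com/index.html would typically be redirected twice: once to drop the www, and once more to drop the index file name.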