Forum Moderators: phranque
> Any ill effects if I use this?
> Redirect 301 /index.html http*//www.my_site.com/
Well, at worst you could put your server into a loop. If index.html is requested, you'll redirect from index.html to "/". A 301 response will be sent back to the client saying "request that from '/'." The client will then request "/", at which point DirectoryIndex (Apache mod_dir [httpd.apache.org]) kicks in and internally redirects that to index.html. If there are any subrequests associated with fetching index.html, then the Redirect 301 (mod_alias) [httpd.apache.org] will kick in again, and the whole process starts over. Then you've got a loop.
You can get around this by renaming index.html to anything else -- like home.html, default.html, or index.htm. It just has to be different from the original filename. If you stray from the standard "home page" names defined in your server configuration, you may have to add a DirectoryIndex directive:
DirectoryIndex index.html index.htm whatever.html
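Putting the two pieces together, a minimal sketch of the rename approach might look like the following. It assumes the home page file has been renamed to home.html (a hypothetical name) and uses www.example.com as a stand-in for the real host name; adjust both to suit.

```apache
# Assumption: the old index.html has been renamed to home.html,
# so "/" is now served from a file the Redirect below never matches.
DirectoryIndex home.html

# Requests for the old URL get a 301 to the directory itself.
# www.example.com is a placeholder, not a real configuration value.
Redirect 301 /index.html http://www.example.com/
```

With this in place, a request for "/" is handled by home.html, and since that path does not match /index.html, the redirect should not fire a second time.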
Jim
My interest in this is that Googlebot is always requesting:
http*//my-domain.com/index.htmlrobots.txt
This obviously causes errors. Nowhere on my website is there a link to (or even a mention of) index.html, so I assume Googlebot is following remote links for index.html. Similarly, I have also seen errors which suggest that users are typing page names after index.html:
http*//my-domain.com/index.htmlpage_name
I'm thinking that if all requests for http*//my-domain.com/index.html would redirect to http*//my-domain.com/ then this would solve the problem.
But, as you say, I risk the danger of the dreaded 'loop' (I shudder to even think of it!)
I wonder if there is a logical solution to this ongoing issue?
You might want to go through your pages, and make sure you don't have a misconfigured base href tag in the <head> section somewhere. This sounds like a really odd problem.
However, I think everything you need to avoid the loop is in my previous post. Besides, your case is even simpler: you would not be redirecting "index.html" to "/"; you would be redirecting ^index\.html(.+)$ to [yourdomain.com...], which is not going to be loop-prone.
Jim
<added>Use RedirectMatch 301 so you can do the backreference.</added>
RedirectMatch 301 ^/index\.html(.+)$ [domain.com...]
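To spell out how the backreference behaves, here is a hedged sketch of the same idea with a full target filled in. The host www.example.com and the "/$1" target are assumptions for illustration only, not the original poster's settings.

```apache
# Assumption: www.example.com stands in for the real host name.
# (.+) needs at least one character after "index.html", so a plain
# /index.html request is not matched by this rule at all.
RedirectMatch 301 ^/index\.html(.+)$ http://www.example.com/$1

# Expected mapping under this sketch:
#   /index.htmlrobots.txt  ->  http://www.example.com/robots.txt
#   /index.htmlpage_name   ->  http://www.example.com/page_name
```

Because the redirected paths no longer begin with /index.html, they don't match the pattern again, which is why this form shouldn't loop.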
<added> I don't use base href </added>
<added><added> Tried both ways - neither one does anything at all. Must be the server config, which is most likely the culprit anyway! </added></added>
I have only one site to check this against, but gbot wasn't crawling my subdirectory index pages very often. After changing some of the links to mydomain/sub/index.htm, gbot is visiting them several times a month now.
I also changed most of the mydomain/ links to mydomain/index.htm and that is visited much more since the change.
Anyone else seeing similar behaviour?
/ seems so much cleaner, hate to switch back to .htm
++++
plumsauce, nancyb
Actually, there are a couple threads from times past about this subject. You might try the site search utility at the top of the page.
As I remember, most webmasters felt that Google did its own redirecting with default pages, and that it really didn't matter whether the mark-up used relative (page.html) or full (mydomain.com/page.html) URLs, since once the robot is in your domain, it pulls all the files in the same manner.
However, I think a few webmasters preferred full URLs for various scenarios.
I know that I've used both methods in the past and I personally do not think it matters either way, so I opt for the shorter, more succinct method.
I have read those posts and I also opted for the shorter method last year. But when I saw gbot visiting the sub index pages a lot less frequently after changing to the short version (although there didn't seem to be a change in the frequency of the other subdirectory pages), I changed some of them back again to the full URL.
It's just been a little over a month, so I can't really judge if it is coincidence or a change in gbot behavior due to my changes - and, of course, I know I won't really know anyway :( but it is still nice to see gbot visiting those pages more often - I think.