I changed the index page and put it up as index.html where it used to be index.htm.
Previously there had never been an index.html.
The server defaults to index.html, but I'm wondering if I can keep the old index.htm up there anyway.
Would it be treated as simply another page?
You could see it like that.
The easiest way to think about this is that Googlebot is seeing these:
* <A href="http://www.example.com/index.html">
* <A href="http://www.example.com/index.htm">
* <A href="http://www.example.com">
and treating them all as <A href="http://www.example.com"> before following the links or assigning PageRank.
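A rough sketch of the kind of URL collapsing described above, purely for illustration. How Googlebot actually does this is not known from this thread; the function name and the list of default-document names are assumptions (the names are taken from examples mentioned in this discussion, with `index.asp` guessed from ".asp").

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed set of "default document" filenames. Real servers and crawlers
# may use a different list; this one is only illustrative.
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "index.php", "index.asp", "default.htm"}


def collapse_default_document(url: str) -> str:
    """Return the URL with a trailing default-document filename removed.

    Hypothetical helper: it maps /index.html, /index.htm and the bare
    host to the same directory URL, and leaves other pages untouched.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # treat "http://host" like "http://host/"
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for link in (
        "http://www.example.com/index.html",
        "http://www.example.com/index.htm",
        "http://www.example.com",
    ):
        print(link, "->", collapse_default_document(link))
```

Under these assumptions all three links come out as the same directory URL, while something like /page.html is left alone.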
> should never have a link to an index.html or index.htm file
I agree with that but as far as Google is concerned now, they just count as links to / so it doesn't matter. It certainly can matter in non-Google contexts.
So if /default.htm and / both have links to them it is normal for them to be listed separately, with different PageRank and backlinks. If Google then identifies them as being 100% duplicates they should be merged into one listing, which inherits the PR and backlinks of both. Note that Googlebot may visit the two different URLs at different times, so if there are frequent changes (such as a 'what's new' column or an automatically generated date) then they will not merge if the robot finds slightly different content.
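To illustrate why a changing date or 'what's new' column can block a merge, here is a minimal sketch that treats two fetches as duplicates only if their whitespace-normalised content hashes match. This is an assumed stand-in for a "100% duplicate" test, not Google's actual method; the function names are hypothetical.

```python
import hashlib


def content_fingerprint(body: str) -> str:
    """Hash the page body after collapsing runs of whitespace."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def are_exact_duplicates(body_a: str, body_b: str) -> bool:
    """True only when both bodies are identical apart from whitespace."""
    return content_fingerprint(body_a) == content_fingerprint(body_b)


if __name__ == "__main__":
    # Same page fetched via two URLs on different days, with a generated date.
    monday = "<html><body>Welcome. Updated: Monday</body></html>"
    tuesday = "<html><body>Welcome. Updated: Tuesday</body></html>"
    print(are_exact_duplicates(monday, monday))
    print(are_exact_duplicates(monday, tuesday))
```

With a strict test like this, a single changed word between the two visits is enough for the URLs to look like different pages.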
I think you'll find Google can recognise an index page (e.g. index.php, .asp, default.htm, etc.) and will just assign the same PR to all these pages that display homepage content. Perhaps I'm wrong...
> index.htm and index.html had different PRs last year
That was before last September though? I don't know when Google stopped crawling those URLs, but it was before 1 September 2004.
That's just stupid!
I can think of at least half a dozen ways to take advantage of this and be able to claim innocence. Hell, I could use both index.htm and index.html to double the effect.
My real concern is whether or not the old pages which were linked to from index.htm will disappear from the SERPs as there are no links to those pages from the new index page (new structure and new pages).
At this point they are still present in the SERPs, but that may be because, as far as I know, Google will not drop the files any time soon. That brings us to the question: how long does it take for an indexed file to disappear from the SERPs, assuming there are no more links to the page?
You should never have a link to an index.html or index.htm file. You should always link to the directory to avoid any problems.
I don't know about you, BigDave, but when I develop a site I link to the index page from all other pages. Locally, linking to the folder doesn't work: the browser shows the folder listing rather than the page. If I link to another site on the web I will link to the folder.
Ouch, one more thing for the link exchangers and link buyers to watch out for!
I'm sure there are other pitfalls to this behaviour.
> That was before last September though?
It was. I can't remember when I last saw different PRs but it was at least a year ago. Because they are showing up with the same TPR now I have no way of knowing.... wait! I do. There are no IBL to index.htm and index.html and they still went from PR5 to PR6. Hmmm.
I think I said something similar a while back, but not quite so succinctly. Remember the "using the removal tool to take out someone else's homepage" thing that was partly fixed some time ago? 1 + 1 = ...
> how Google reacts to the two files
Bobby, Google doesn't react to files on your webserver, but to URLs (and sometimes inappropriate assumptions of the underlying files). I know this seems obvious, but the difference between internal files and external URLs is crucial to this topic.
> whether or not the old pages which were linked to from index.htm will disappear from the SERPs
As long as / returns the same content as /index.htm, it doesn't matter. Where it is different, you can have a problem.
I have a question, and as yet I have no answer on how to fix the problem.
This topic was about duplicate content with .htm and .html. My question is this: for the last week or so, Google has had two copies of each page on my site in its index, first as wwwdotmysitedotcom and again as just mysitedotcom.
It does not appear that my PR was split, but I do know that wwwdotmysitedotcom comes up on certain keywords and plain mysitedotcom comes up on other keywords.
I have always had 1 site. My site is over 3 years old. I have always submitted my site as wwwdotmysitedotcom.
I had e-mailed Google a few days ago explaining what I have found with no response yet.
Does anyone know how I can fix this problem?
Look forward to all advice.
I have close to the same problem, and my site has nearly disappeared from the SERPs as of Feb 2. I'm looking for assistance too.
I used the removal tool to try to get rid of all occurrences of index.htm.
Now I am left with only the supplemental listing and can't seem to get rid of it. Emails to Google have been ignored.
Anyone have any suggestions?