| 6:18 pm on Feb 2, 2005 (gmt 0)|
I'm afraid not. Links to /index.htm and /index.html are treated as links to /.
index.shtml, index.pl, etc. are treated literally.
You could have index.htm and InDeX.HtM but people would think you're just trying to look kewl. :-)
| 9:18 pm on Feb 2, 2005 (gmt 0)|
So then the old index.htm passed its PR to the new one right?
I've got a link from the new one back to the old one, hoping that the spider will follow the links that were up on the old one. Do you think it will?
| 9:21 pm on Feb 2, 2005 (gmt 0)|
My default page is default.htm. mysite.com/ shows up as a PR6. I also have an index.htm and an index.html and they both show up as 6. Strangely, if I go to mysite.com/default.htm it shows up as a 0.
| 10:02 pm on Feb 2, 2005 (gmt 0)|
You should never have a link to an index.html or index.htm file. You should always link to the directory to avoid any problems.
| 7:09 am on Feb 3, 2005 (gmt 0)|
|I also have an index.htm and an index.html and they both show up as 6 |
Macro, do the 2 pages have different content and are then both indexed differently by Google?
| 9:27 am on Feb 3, 2005 (gmt 0)|
> So then the old index.htm passed its PR to the new one right?
You could see it like that.
The easiest way to think about this is that Googlebot is seeing these:
* <A href="http://www.example.com/index.html">
* <A href="http://www.example.com/index.htm">
* <A href="http://www.example.com">
and treating them all as <A href="http://www.example.com"> before following the links or assigning PageRank.
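The rewrite described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's actual code: the function name normalize_link and the set MERGED_INDEX_NAMES are made up for the example, and it assumes that only the exact lowercase names index.html and index.htm are folded into the directory URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: only these two exact filenames are merged.
MERGED_INDEX_NAMES = {"index.html", "index.htm"}


def normalize_link(url: str) -> str:
    """Rewrite a link to .../index.htm or .../index.html as a link to the directory."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename in MERGED_INDEX_NAMES:
        path = directory + "/"
    else:
        # Treat a bare hostname as a link to the root directory.
        path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for link in (
        "http://www.example.com/index.html",
        "http://www.example.com/index.htm",
        "http://www.example.com",
        "http://www.example.com/index.shtml",
    ):
        print(link, "->", normalize_link(link))
```

In this sketch the first three links all come out as the same URL, while index.shtml is left alone, which matches the "treated literally" behaviour mentioned earlier in the thread.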
> should never have a link to an index.html or index.htm file
I agree with that but as far as Google is concerned now, they just count as links to / so it doesn't matter. It certainly can matter in non-Google contexts.
| 10:04 am on Feb 3, 2005 (gmt 0)|
Bobby, the site is #1 for the keyword on the default.htm page. However, searching for text exclusive to index.html doesn't show that page in the SERPs.
Ouch, one more thing for the link exchangers and link buyers to watch out for!
| 1:36 pm on Feb 3, 2005 (gmt 0)|
Macro, /default.htm is not merged with /. As far as I know, only /index.htm and /index.html are.
So if /default.htm and / both have links to them it is normal for them to be listed separately, with different PageRank and backlinks. If Google then identifies them as being 100% duplicates they should be merged into one listing, which inherits the PR and backlinks of both. Note that Googlebot may visit the two different URLs at different times, so if there are frequent changes (such as a 'what's new' column or an automatically generated date) then they will not merge if the robot finds slightly different content.
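To illustrate why a changing date can prevent the merge, here is a rough Python sketch of exact-duplicate grouping. It assumes "100% duplicate" means byte-for-byte identical page bodies, which is a simplification; the function name group_duplicates and the sample pages are invented for the example and say nothing about how Google really compares pages.

```python
import hashlib
from collections import defaultdict


def group_duplicates(snapshots: dict[str, str]) -> list[list[str]]:
    """Group URLs whose fetched bodies are exactly identical.

    `snapshots` maps each URL to the body seen when that URL was crawled,
    so two URLs for the same file can differ if the page changed between visits.
    """
    groups = defaultdict(list)
    for url, body in snapshots.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values()]


if __name__ == "__main__":
    stable = {"/": "<p>Welcome</p>", "/default.htm": "<p>Welcome</p>"}
    dated = {"/": "<p>Updated 2 Feb</p>", "/default.htm": "<p>Updated 3 Feb</p>"}
    print(group_duplicates(stable))  # one group holding both URLs
    print(group_duplicates(dated))   # two separate groups
```

With the stable page both URLs land in one group; with the auto-generated date, the two visits see different bodies and the URLs stay separate.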
| 6:25 pm on Feb 3, 2005 (gmt 0)|
I have noticed that Google will also merge index.php, but since php is dynamic by definition, they seem to take MUCH longer, and it has to be a very stable page with several crawls between updates.
| 8:41 pm on Feb 3, 2005 (gmt 0)|
I have a site PR 5. Nowhere in the site (or external to the site) are there links to /default.asp - but it has a PR 5.
I think you'll find Google can recognise an index page (e.g. index.php, .asp, default.htm, etc.) and will just assign the same PR to all the pages that display homepage content. Perhaps I'm wrong...
| 8:55 pm on Feb 3, 2005 (gmt 0)|
My index.htm and index.html had different PRs last year. (site was defaulting to default.htm)
| 9:31 pm on Feb 3, 2005 (gmt 0)|
Google is a strange beast
| 11:12 am on Feb 4, 2005 (gmt 0)|
Google will merge index.php and default.asp and foo.bar if it finds the same content. This is a different process from not crawling index.html and index.htm.
> index.htm and index.html had different PRs last year
That was before last September though? I don't know when Google stopped crawling those URLs, but it was before 1 September 2004.
| 5:03 pm on Feb 4, 2005 (gmt 0)|
Really? They don't even crawl */index.html or */index.htm anymore and just crawl */?!?!?
That's just stupid!
I can think of at least half a dozen ways to take advantage of this and be able to claim innocence. Hell, I could use both index.htm and index.html to double the effect.
| 7:18 am on Feb 5, 2005 (gmt 0)|
I'm keeping an eye on how Google reacts to the two files (index.html and index.htm); apparently it only recognizes the default one.
My real concern is whether or not the old pages that were linked to from index.htm will disappear from the SERPs, as there are no links to those pages from the new index page (new structure and new pages).
At this point they are still present in the SERPs, but that may be because, as far as I know, Google will not eliminate the files any time soon. That brings us to the question of how long it takes for an indexed file to disappear from the SERPs, assuming there are no more links to the page.
|You should never have a link to an index.html or index.htm file. You should always link to the directory to avoid any problems. |
I don't know about you, BigDave, but when I develop a site I link to the index page from all other pages. Locally, a link to the folder doesn't work: the folder opens rather than the page. If I link to another site on the web, I will link to the folder.
| 10:29 am on Feb 5, 2005 (gmt 0)|
BigDave, as I said:
|Ouch, one more thing for the link exchangers and link buyers to watch out for! |
I'm sure there are other pitfalls to this behaviour.
|That was before last September though? |
It was. I can't remember when I last saw different PRs but it was at least a year ago. Because they are showing up with the same TPR now I have no way of knowing.... wait! I do. There are no IBL to index.htm and index.html and they still went from PR5 to PR6. Hmmm.
| 10:35 am on Feb 5, 2005 (gmt 0)|
> That's just stupid!
I think I said something similar a while back, but not quite so succinctly. Remember the "using the removal tool to take out someone else's homepage" thing that was partly fixed some time ago? 1 + 1 = ...
> how Google reacts to the two files
Bobby, Google doesn't react to files on your webserver, but to URLs (and sometimes makes inappropriate assumptions about the underlying files). I know this seems obvious, but the difference between internal files and external URLs is crucial to this topic.
> whether or not the old pages which were linked to from index.htm will disappear from the SERPs
As long as / returns the same content as /index.htm, it doesn't matter. Where it is different, you can have a problem.
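One way to spot where the two versions differ is to compare the links each one carries. The Python sketch below assumes you already have the HTML of the old index.htm and of the page now served at /; the names LinkCollector, extract_links and orphaned_links are invented for this example. It lists the hrefs that appear only in the old page, which are the pages that would lose their link from the homepage.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html: str) -> set[str]:
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def orphaned_links(old_html: str, new_html: str) -> set[str]:
    """Return hrefs present in the old index page but missing from the new one."""
    return extract_links(old_html) - extract_links(new_html)


if __name__ == "__main__":
    old_page = '<A href="/old-page.htm">Old</A> <a href="/contact.htm">Contact</a>'
    new_page = '<a href="/contact.htm">Contact</a> <a href="/new-page.htm">New</a>'
    print(orphaned_links(old_page, new_page))
```

An empty result means every page linked from the old index is still linked from the new one.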
| 11:53 pm on Feb 5, 2005 (gmt 0)|
I have a question about a problem that, as yet, I have no answer on how to fix.
This topic is about duplicate content with htm and html. My problem is that for the last week or so, Google has had 2 copies of each page on my site in its index: first as wwwdotmysitedotcom and again as just mysitedotcom.
It does not appear that my PR was split, but I do know that wwwdotmysitedotcom comes up for certain keywords and plain mysitedotcom comes up for other keywords.
I have always had 1 site. My site is over 3 years old. I have always submitted my site as wwwdotmysitedotcom.
I e-mailed Google a few days ago explaining what I had found, with no response yet.
Does anyone know how I can fix this problem?
Look forward to all advice.
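For the www versus non-www split described above, a commonly suggested approach is a permanent (301) redirect from one hostname to the other, so that only one version of each URL answers with content. The Python sketch below only shows the decision logic, not a server configuration. It assumes www.example.com is the preferred host and example.com is the alias; both hostnames and the function name canonical_redirect are placeholders for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred hostname and the alias to redirect.
PREFERRED_HOST = "www.example.com"
ALIAS_HOSTS = {"example.com"}


def canonical_redirect(url: str):
    """Return (301, target) if the URL uses an alias host, otherwise None."""
    parts = urlsplit(url)
    if parts.netloc.lower() in ALIAS_HOSTS:
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, "")
        )
        return 301, target
    return None


if __name__ == "__main__":
    print(canonical_redirect("http://example.com/page.htm?x=1"))
    print(canonical_redirect("http://www.example.com/page.htm"))
```

Whether this clears up the duplicate listings, and how quickly, is not something the sketch can promise; it only shows the one-hostname idea.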
| 1:22 pm on Feb 6, 2005 (gmt 0)|
I have close to the same problem, and my site has nearly disappeared from the SERPs as of Feb 2. I'm looking for assistance too.
I used the removal tool to try to get rid of all occurrences of index.htm.
Now I am left with only the supplemental listing and can't seem to get rid of it. Emails to Google have been ignored.
Anyone have any suggestions?