I currently run a Cobalt RAQ4 and have a question about html file extensions and Google:
Say I create an html file in the webroot (for example, www.domain.com/test.html). When I go to a browser and type in "www.domain.com/test" the page comes up. OK, so for some reason the server checks and decides that since there is no directory called "test," I must be looking for "test.html". Great.
Now, here is my question...
Google comes along and indexes my site and follows (and indexes) test.html. Then someone else comes along and links to my site via "www.domain.com/test". Does Google follow THEIR link and then index this, thinking it's a subdirectory, so that I now (potentially) have a duplicate content problem? Does Google know that it's *really* test.html even though my server is serving the page when "www.domain.com/test/" is requested?
I hope I made sense... I'm not a technical person!
Thanks,
Matt
Obviously, things would be so much better if your server didn't convert test to test.html automatically, and since this is a slightly unusual thing to do, there must be a way to stop it.
As regards your question about links, though, well, no one SHOULD link to you in the way you describe if they have followed your own internal linking structure, since the URL on display at the top of their browser should be syntactically correct, with the .html extension too.
As regards the duplicate content penalty, well, it depends who you ask as to how severe the problem is. My biggest UK news website has hi-graphics and text-only versions of each story, and both are in the Google index, and the cached copies of each (being devoid of the CSS) look identical to me.
And I ended up with a duplicate content page - the home page of one of my sites - when a meta-refresh from elsewhere ended up with Google getting the same content from two apparently different URLs. One page simply got removed from the index and nothing else was affected. When the redirection was removed, the page reappeared. It was quite harmless and well-contained, and actually did what I expected.
OK - that's got you back onto the front page of the forum - now let's hope someone else will puff up my reply a bit!
DerekH
You could verify using the server header checker at: [searchengineworld.com...]
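If you'd rather check the headers yourself, here is a rough Python sketch of the same idea: ask the server for the headers of both URLs without following redirects, then compare. The names `HeaderSummary`, `fetch_headers` and `describe` are made up for this example, and the URLs are just the placeholders from Matt's post. It also assumes the server answers HEAD requests, which yours may or may not do.

```python
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class HeaderSummary(NamedTuple):
    """The few response details that matter for this check (hypothetical helper)."""
    url: str
    status: int
    location: Optional[str]          # target URL when the server redirects
    content_location: Optional[str]  # sometimes names the file actually served


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following redirects so we can see them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # a 3xx response then surfaces as an HTTPError


def fetch_headers(url: str, timeout: float = 10.0) -> HeaderSummary:
    """Send a HEAD request and record the status and relevant headers."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        response = opener.open(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        response = err  # HTTPError also carries .code and .headers
    summary = HeaderSummary(
        url,
        response.code,
        response.headers.get("Location"),
        response.headers.get("Content-Location"),
    )
    response.close()
    return summary


def describe(short: HeaderSummary, full: HeaderSummary) -> str:
    """Classify how the extensionless URL behaves relative to the .html one."""
    if short.status in (301, 302, 303, 307, 308):
        return "redirect"   # the short URL points elsewhere; one address wins
    if short.status == 200 and full.status == 200:
        return "duplicate"  # both URLs answer directly with a page
    return "other"          # e.g. 404 for the short URL; nothing to worry about


if __name__ == "__main__":
    # Placeholder URLs from the original post; substitute your own.
    short = fetch_headers("http://www.domain.com/test")
    full = fetch_headers("http://www.domain.com/test.html")
    print(short)
    print(full)
    print(describe(short, full))
```

A "duplicate" result only means that both addresses return a page directly; it says nothing about how Google will treat them. A "redirect" result means the server itself is telling visitors (and spiders) which URL is the real one.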