|Linking to Root: / vs. /index.htm|
There is still much confusion after all these years
| 11:19 am on Dec 13, 2008 (gmt 0)|
In a recent Google thread it was recommended (but not explained or commented upon) that it's best to link to the root using mysite.com/ rather than mysite.com/index.htm. After not being on these boards much for the past several years, I find this very old topic evidently has not died. My view has always been that the surfer gets where they want to go no matter what. Therefore, it is just an SE problem if they somehow can't or won't understand and spider it correctly. So I never bothered to make any changes. I searched for relevant posts, and the one below from Googleguy seems the most relevant. He indicates it really is not a big issue, and that for consistency and simplicity the link might better be to mysite.com/index.htm. Have things changed? Thanks.
<Googleguy June 2, 2005>
Sometimes a tangent isn't such a bad thing though. For example, partway into the Bourbon discussion, wattsnew asked if there was a technical guide on how to handle www vs. non-www, relative vs. absolute linking, and links to different pages such as / vs. index.html vs. default.asp. My rule of thumb is to pick a root page and be as consistent as possible. I lean toward choosing [yourdomain.com...] but that's just me; [yourdomain.com...] would work as well. Then I recommend that you make things as simple as possible for spiders. I recommend absolute links instead of relative links, because there's less chance for a spider (not just Google, but any spider) to get confused. In the same fashion, I would try to be consistent on your internal linking. Once you've picked a root page and decided on www vs. non-www, make sure that all your links follow the same convention and point to the root page that you picked. Also, I would use a 301 redirect or rewrite so that your root page doesn't appear twice. For example, if you select [yourdomain.com...] as your root page, then if a spider tries to fetch [yourdomain.com...] (without the www), your web server should do a permanent (301) redirect to your root page at [yourdomain.com...]
So the high-order bits to bear in mind are
- make it as easy as possible for search engines and spiders; save calculation by giving absolute instead of relative links.
- be consistent. Make a decision on www vs. non-www and follow the same convention consistently for all the links on your site. Use permanent redirects to keep spiders fetching the correct page.
Those rules of thumb will serve you well no matter what with every search engine, not just with Google. Of course, the vast majority of the time a search engine will handle a situation correctly, but anything that you can do to reduce the chance of a problem is a good idea. If you don't see any problems with your existing site, I wouldn't bother going back and changing or rewriting links. But it's something good to bear in mind when making new sites, for example.
</Googleguy June 2, 2005>
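As a rough illustration of the "pick a root page and 301 everything else to it" rule quoted above, here is a minimal Python sketch that works out where a request should be redirected. The host names (`www.example.com`, `example.com`) and the list of index file names are assumptions chosen for the example, not anything stated in the thread, and a real site would normally do this in the web server's redirect or rewrite configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices for illustration only: the thread does not name a host.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"
INDEX_NAMES = ("index.htm", "index.html", "default.asp")


def canonical_redirect(url):
    """Return the URL to 301-redirect to, or None if url is already canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path or "/"

    # Collapse www vs. non-www onto the one host that was picked.
    if host in (CANONICAL_HOST, BARE_HOST):
        host = CANONICAL_HOST

    # Strip a trailing index file name so "/index.htm" becomes "/".
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        path = head + "/"

    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target


if __name__ == "__main__":
    for candidate in (
        "http://example.com/index.htm",
        "http://www.example.com/index.html",
        "http://www.example.com/",
    ):
        print(candidate, "->", canonical_redirect(candidate))
```

A server using this would answer with status 301 and a `Location` header set to the returned URL whenever the result is not `None`, and serve the page normally otherwise.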
| 3:09 pm on Dec 13, 2008 (gmt 0)|
I've never seen any post that recommends domain.com/index.html in preference to domain.com/, either for incoming links or internal links, and I'd be grateful for a link to any such discussion (nothing in your post supports this).
The reason for the advice to link to '/' internally, or domain.com/ in all cases, is quite simple and - so far as I've been aware for years until now - quite uncontroversial.
SEs deal with pages, not sites. Further, they in fact deal in URLs - not even pages. That is why many CMS problems occur, as the database churns out multiple URLs for identical pages, and why using both www and non-www can (still!) lead to duplicate content issues.
It is also why using both '/' and '/index.html' can lead to both URLs being indexed, with potential duplicate content issues, potential dividing of PageRank, etc.
At the very best, keeping both URLs is neutral, but plenty of evidence suggests it's an advantage to eliminate one URL from SE calculations. And as 'default' incoming links will be to domain.com/, it's sensible to emulate that internally with '/'.
Of course it's an SE problem, and one they've tried hard to eliminate. But SE problems can bounce back and become yours. And as it's so simple to avoid this one, why take the risk?
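To make the "dividing" point concrete, here is a small Python sketch that tallies links per URL, first as written and then with '/index.htm' folded into '/'. The link list is made-up data on a hypothetical `www.example.com`, and a plain count is only a stand-in: it is not how any search engine actually weighs links.

```python
from collections import Counter

# Hypothetical index file names, for illustration only.
INDEX_NAMES = ("index.htm", "index.html")


def normalize(url):
    """Treat '.../index.htm' and '.../' as the same address."""
    head, _, tail = url.rpartition("/")
    return head + "/" if tail.lower() in INDEX_NAMES else url


def link_counts(links, collapse=False):
    """Count links per URL, optionally after folding index files into '/'."""
    return Counter(normalize(u) if collapse else u for u in links)


# Made-up sample of links pointing at the same home page.
links = [
    "http://www.example.com/",
    "http://www.example.com/index.htm",
    "http://www.example.com/",
    "http://www.example.com/index.htm",
    "http://www.example.com/",
]

if __name__ == "__main__":
    print(link_counts(links))                 # two URLs share the five links
    print(link_counts(links, collapse=True))  # one URL receives all five
```

With the sample data, the uncollapsed tally splits the five links 3 and 2 across two URLs, while the collapsed tally credits all five to 'http://www.example.com/'.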
| 12:58 am on Dec 14, 2008 (gmt 0)|
It's not an SE problem. It's your problem. SEs try to sort your problem out for you, with reasonable success.
It's not so much that it is better to link to / than /index.htm as that index.htm should not exist as a page.
As Quadrille says, if two URLs return the same content, then that content will get linked to under both URLs, splitting the value that the content should be getting.
For a good discussion, see