Forum Moderators: Robert Charlton & goodroi
Yes, "duplicate content" will kill you. If you have any, get rid of it NOW!
One link for every page! Plus, make sure you don't have other variations on "duplicate content" such as indexed versions of both mysite.com and www.mysite.com.
There is much to read on this forum on the HUGE topic of duplicate content!
page.php?id=4&cat=5 and page.php?cat=5&id=4 look like different URLs, but they ultimately lead to the same page content.
Someone else brought up a similar example:
/products/lamps/gardenlamps.html and /products/garden/gardenlamps.html both lead to pages describing the same garden lamp, but have different URLs because they are in different categories.
I believe there have been a number of posts listing how content management systems, forum software, and blog software can accidentally use different URLs to point to the same page.
[edited by: Beachboy at 5:56 am (utc) on Nov. 17, 2006]
Having two links on one page that both go to the same page can be a matter of navigation convenience for site visitors. For instance, a logo at the top can be linked to the home page, and a text link to the home page can also be provided at the bottom with other links. This is putting your visitor first, so they don't have to scroll back to the top of the page to navigate.
I see lots of sites that do this, and they don't appear to be hurt by it.
Of course, it could be a problem in the future...who knows...
Never forget that Google says design your website for readers - not robots.
My website is thriving.
I can think of a situation where having two URLs point to the same page is proper and legitimate. Let's say there's a page: example.com/foldera/index.html that is linked internally as it should be: example.com/foldera/ .
That page is divided into sections: red widgets, blue widgets, green widgets, and yellow widgets. There are named anchors on that page for each of the different colors (for example, <a name="yellow">) for the convenience of site visitors, so they don't have to scroll through the page. If another page on your site mentions yellow widgets and links to this page using the anchor for yellow widgets, the URL would be: example.com/foldera/index.html#yellow
So, would that not be two legitimate and proper uses of having two different URLs going to the same page? Is there another way to link to an anchor on a page besides the way in my example?
I believe he was talking about two different URLs leading to the same actual content. For example:
page.php?id=4&cat=5 and page.php?cat=5&id=4 look like different URLs, but they ultimately lead to the same page content.
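To make the parameter-order case concrete, here is a minimal Python sketch that sorts the query parameters so both spellings collapse to one URL. The function name normalize_query_order is made up for illustration, and this is only one possible way to pick a single form; it says nothing about how any search engine actually compares URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def normalize_query_order(url: str) -> str:
    """Return the URL with its query parameters sorted by name, then value.

    Hypothetical helper: the sort order is an arbitrary but stable choice.
    """
    parts = urlsplit(url)
    # parse_qsl keeps each name/value pair; sorting makes the order deterministic.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


a = normalize_query_order("page.php?id=4&cat=5")
b = normalize_query_order("page.php?cat=5&id=4")
print(a)       # page.php?cat=5&id=4
print(a == b)  # True
```

If a script always builds its internal links through something like this, only one of the two spellings should ever appear on the site.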
> two different URLs leading to the same actual content.
Reading the earlier posts in this thread may save some embarrassment to those who wish to assume the worst and pile on Google...
Having two or more different URLs (whether they differ in domain/subdomain, in URL-path, or both) lead to the same content is bad practice, and that is without any doubt what Adam meant.
[added] Oliver beat me to it while I was on the phone. [/added]
Jim
[edited by: jdMorgan at 6:45 pm (utc) on Nov. 17, 2006]
The example of using an internal page anchor (the #yellow) would normally be fine, i.e.:
example.com/#*$!.html#yellow and example.com/#*$!.html are the same URL (as near as I can tell, Google removes the internal page anchor; we have several pages that use these).
However the example you show of:
example.com/
and
example.com/index.html#yellow
might be (I don't know for certain)
treated as example.com/
and
example.com/index.html
and thus lead to duplicate issues.
Once again we are not building sites for real people, we are building them for Google. And that supposedly goes against everything Google stands for, and it says so in their guidelines.
I have to change internal links and inconvenience my site visitors in order to have my site ranked well in Google, yet Google tells me to build my site for my visitors, not for them. Seems somewhat hypocritical to me.
In the example I used above, the index.html should be dropped entirely, using just the folder and anchor, i.e., /foldera/#yellow and that works.
The thought is that Google drops the anchor, and just follows the URL up to the point the anchor starts.
Given these circumstances, I have a lot of duplicate URLs that I never considered duplicate. I look forward to being able to build pages for my visitors again at some point instead of for Google...but what good does it do if the visitors can't find you? Few will find your page at #250 in the index.
The named anchor "#yellow" is of no concern whatsoever -- Google undoubtedly understands that this is for the use of the browser. A named anchor is not considered to be part of a page URL, but rather a sub-navigation element for use by the browser display function, and only within the page.
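As a small illustration of the point about named anchors, the sketch below uses Python's standard urllib.parse.urldefrag to split a URL into the part that is requested from the server and the fragment the browser keeps for itself. The helper name strip_fragment is hypothetical, and the example only shows how the URL is split; it is not a claim about Googlebot's internals.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL without its '#fragment' part (hypothetical helper)."""
    return urldefrag(url).url


with_anchor = "http://example.com/foldera/index.html#yellow"
result = urldefrag(with_anchor)

print(result.url)       # http://example.com/foldera/index.html
print(result.fragment)  # yellow

# Both links resolve to the same requested URL once the fragment is removed.
print(strip_fragment(with_anchor) == "http://example.com/foldera/index.html")  # True
```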
The duplicate-content problem arises with URLs like this:
www.example.com/
example.com/
www.example.com/index.html
example.com/index.html
Those are four URLs that (on many if not most sites) will point to the same content, splitting the PR/LinkPop across those four URLs, and thereby diluting it. Ideally, all but one should be 301-Permanently redirected to the one canonical URL for the page, and all links (on-site and off-site) should be to that one canonical URL. As a matter of fact, the redirect should be in place before the site even goes live. This also prevents the search engines from "just picking one" URL --and probably not the one you prefer-- for each page.
That's best practice as it stands today: One page, one URL.
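The redirect logic for those four URLs can be sketched in Python as below. Everything named here is an assumption for illustration: the choice of www.example.com/ as the canonical form, the index.html filename, and the helper names canonical_url and redirect_target. On a real site this would normally live in the server configuration rather than in a script.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: which host is canonical and which
# filename the server treats as the directory index.
CANONICAL_HOST = "www.example.com"
ALIAS_HOSTS = {"example.com", "www.example.com"}
INDEX_NAMES = ("index.html",)


def canonical_url(url: str) -> str:
    """Map any alias of a page to the single URL we want indexed."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    host = parts.netloc.lower()
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_target(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if url is already canonical."""
    target = canonical_url(url)
    return None if target == url else target


for u in (
    "http://www.example.com/",
    "http://example.com/",
    "http://www.example.com/index.html",
    "http://example.com/index.html",
):
    print(u, "->", redirect_target(u))
```

Only the first URL comes back as None; the other three all point at it, which is the "one page, one URL" outcome.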
Note that I didn't mention the phrase 'duplicate-content penalty' because until you start actively promoting dozens of alias URLs that all point to the same content, I doubt that there is any penalty involved -- It's simply a self-inflicted wound by dilution of incoming links if only a few alternate URLs are involved. But there's always a lot of FUD involved with this subject, so believe what you will...
Jim
Check your log files. Do bots even request the # part?
I suspect not.
I believe that it is only the browser that makes use of the named anchor in knowing where to "jump to" within the page.
The # part has no bearing on how the page is served.
Does a browser even send the # part of the URL to the server?
At any rate, I've made the fix; I found more than a few duplicate URLs because of this. Hopefully Googlebot will know that my error was not intentional, but simply a matter of not knowing that it would be an issue.
Thanks again to all for your advice on this.
[edited by: AndyA at 3:50 pm (utc) on Nov. 20, 2006]