Forum Moderators: Robert Charlton & goodroi
This makes sense to me. RSS syndication should make this happen automatically because the syndicated content always has a link back to the site as part of the RSS specification.
I have an issue with cobranded sites. Each cobranded site is a complete copy of the original, rebranded as a "section" of the partner site. Currently we block googlebot from the cobranded sites with robots.txt, even though they contain a small amount of original content (along with a TON of duplicated pages). Our contract prevents us from putting links back to the original that users can follow. I'm wondering whether links in the head, like <link rel="original" href="http://example.com/original.html">, might help the search engines resolve the duplicate content, so that we could unblock the sites in robots.txt and let googlebot find the small amount of original content.
A more standard approach you might experiment with is rel="alternate":
<link title="Original source for this document" rel="alternate" href="http://www.example.com/page.html">
Still, even your non-standard attribute might be enough. I'm pretty sure that Google would see the URL, even if the relationship would not be 100% clear.
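As a sketch, the head of a cobranded page might carry such a link (the URLs and title text here are placeholders, not anything the search engines have documented support for):

```html
<head>
  <title>Cobranded copy of the article</title>
  <!-- Hypothetical example: point back at the original copy.
       rel="alternate" is a standard link relationship; the title
       attribute is free-form and only hints at the intent. -->
  <link title="Original source for this document"
        rel="alternate"
        href="http://www.example.com/page.html">
</head>
```

Because the link sits in the head, ordinary users never see or follow it, which keeps the contract terms intact while still exposing the original URL to crawlers.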
[edited by: tedster at 3:02 pm (utc) on Nov. 21, 2006]
I wrote several articles on my site years ago that were related to just one section of my site. Another site, the authority in its field, requested to copy the articles I wrote on their site, and link back to my site identifying me as the author. The original articles had been online on my site for months, and were listed in Google.
Then, when my site dropped in November 2004, all of a sudden the pages on the authority site were credited with those articles instead of my site. And they rank pretty high, too. My site is nowhere to be found.
This is an insult, I am not credited by Google in any way for my original copyrighted work.
If Google had it together on this, they'd know my site is the original, since the articles were online on my site for months and months prior to appearing on the authority site, and that site does link to my site on every page where there's an article. Because that site is an authority, I do link to them as well, but only on my links page.
However, for scrapers borrowing content without permission, and none of the sites linking to each other, how is Google going to decide which one is the original?
And here is a quandary... site A publishes an article. Site B scrapes it, and gets sites C, D, and E to also publish it, and C, D, and E also link back to site B. According to this, site B is the authority. Site A just looks like some random site that didn't bother linking back to the original.
How would they react if sites X, Y, and Z linked back to A, in the face of sites C, D, and E linking back to B? Do you think that they would still realise that A is the "real" site? Here we have two sites, both claiming to be the authority and both with the incoming links to "prove" it.
[edited by: g1smd at 8:45 pm (utc) on Nov. 21, 2006]
And once done, apparently, it's done. The other site has had those articles listed exclusively in Google for the past 2 years. I did the work, I published them first, and the other site gets the benefit.
Any way you slice it, it's just wrong.
What do you suppose we're doing TODAY that will get us in trouble in 4 or 5 years? Could it be all those reciprocal links will get you penalized in the future? Who knows what Google will pull out of their bag of tricks in the coming years.
The only thing any of us can depend on is change. And something that is perfectly fine today will no doubt be a problem in the future, based on past history with Google.
Google does not have a crystal ball either. Even 3 or 5 years ago, the way to avoid this was to get links directly to your copy of the work. What if the original was on paper, you published it on the web first, then the copyright holder published it at a later date?
It seems obvious to me that if you start licensing your works, you will risk having the search engine consider a different copy of your work to be more important.
Google's goal, first and foremost, is to make sure their SERPs are not loaded up with duplicates. Most searchers (Google's search customers) don't care whether they get the copy from the original source.
Of course Google would like to have the most authoritative version, which is often the original, but they don't have that crystal ball either. If you want your original version to be the most authoritative, don't license it to sites that kick your butt, and do things that will show authority to the search engines, like getting links directly to your articles.
Don't expect Google to just assume that you are the best source, simply because you consider yourself to be the best source.
What do you suppose we're doing TODAY that will get us in trouble in 4 or 5 years
Google's goal is to find the best results for its customers.
The way to accomplish that will always change, but the goal will remain the same.
=> Content and Semantics are the Kings
None of Brett's 26 Steps [webmasterworld.com...] will ever get us in trouble
When a person searches for Blue Widgets, they might like a bit more info than a picture and a price. And they can find pictures and prices on my site as well, along with a lot of other information at spot #155 on Google.
Yes indeed, Google certainly is maintaining the best search results... (NOT)