This 176 message thread spans 6 pages.
|Adam Lasnik on Duplicate Content|
Google's Adam Lasnik has made a clarifying post about duplicate content on the official Google Webmaster blog [googlewebmastercentral.blogspot.com].
He zeroes in on a few specific areas that may be very helpful for those who suspect they have muddied the waters a bit for Google. Two of them caught my eye as being more clearly expressed than I'd ever seen in a Google communication before: boilerplate repetition, and stubs.
|Minimize boilerplate repetition: |
For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
If you think about this a bit, you may find that it applies to other areas of your site well beyond copyright notices. How about legal disclaimers, taglines, standard size/color/etc information about many products, and so on. I can see how "boilerplate repetition" might easily soften the kind of sharp, distinct relevance signals that you might prefer to show about different URLs.
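To make the idea concrete, here is a rough sketch of how a site owner might script a check for how much of each page is shared boilerplate. This is purely illustrative; the function and any thresholds you'd apply are invented here, not anything Google has published.

```python
# Hypothetical sketch: estimate what fraction of a page's text is
# made up of known shared blocks (copyright notices, disclaimers,
# repeated taglines, etc.). Page contents are plain-text strings
# you would extract from your own templates or CMS.

def boilerplate_ratio(page_text: str, shared_blocks: list[str]) -> float:
    """Fraction of a page's characters taken up by known shared blocks."""
    if not page_text:
        return 0.0
    # Count characters belonging to each shared block that appears on the page.
    shared_chars = sum(len(block) for block in shared_blocks if block in page_text)
    return min(shared_chars / len(page_text), 1.0)

footer = "Copyright 2006 Example Corp. All rights reserved. " * 5
page = footer + "A short product description with little unique text."
print(round(boilerplate_ratio(page, [footer]), 2))
```

Pages that score high on a check like this are exactly the ones where trimming the repeated text down to a brief summary plus a link, as Adam suggests, would sharpen the page's distinct relevance signals.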
|Avoid publishing stubs: |
Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.
This is the bane of the large dynamic site, especially one that has frequent updates. I know that as a user, I hate it when I click through to find one of these stub pages. Some cases might take a bit more work than others to fix, but a fix usually can be scripted. The extra work will not only help you show good things to Google, it will also make the web a better place altogether.
[edited by: tedster at 9:12 am (utc) on Dec. 19, 2006]
Thank you Adam.
Nice reference to happy fun ball btw. You're dating yourself...and me.
Thanks Adam - good to see you back and Happy New Year.
I mentioned in an earlier post that we have repeated text drop-down navigation tables [ from 2 to 4 ] on each page of our sites, and between pages, in different positions, sometimes in the centre of the page. Is my worry justified that this may act like boilerplate for duplicate-content purposes, or should I relax?
"Nor is de.example.com. example.de matters."
I'm afraid this is demonstrably not true.
Both rank for pages from France and pages from Italy respectively (the third radio button), even though somedomain.com is hosted in the United States. This deliberate and inaccurate tilting of the results in favor of subdomains over subdirectories continues to be one of Google's illogical problems.
Well, it's good to have some comment, so thanks to Adam for that.
But (there always is one, isn't there?) in our neck of the woods, the sites where a reasonable person looking at them would cry "crikey! it's all basically the same junk on every page!" are doing much better than the sites with real value.
[edited for speling!]
Adam still needs to clarify whether drop-down navigation menus constitute boilerplate repetition, as this obviously will affect millions of sites. If it is duplication, then Google must inform us of the threshold so navigation menus can be redesigned accordingly.
Adam still needs to clarify ...
google must inform us ...
They don't need to do anything...
|Does Google just ignore Stubs, or is there a penalty? |
Answer from Adam:
|I think, to have 42,000 pages all with language like: "Looking for real estate in Walla Walla Washington [or Wiki Wiki Washington or Wissie Wiggie Washington or 4817482371 other combinations]? We have just the real estate listings below you're looking for!" and no real content below. That is a problem...unhappy ranking adjustments |
Adam, does this mean sites with thousands of stub pages will get an "unhappy ranking adjustment" of the SITE or will Google just IGNORE THE STUB PAGES?
Thanks for all of the previous clarification.
I have a .co.uk site and a .com site with identical content (the contact information is different). However, recently my .co.uk site began ranking higher than my .com site on Google.com. Even when I do a search for "mysite.com", my .co.uk site appears before my .com site. This happens for all relevant keywords as well.
I want my .co.uk to continue ranking well in the UK, but not so much in the US. Any tips or ideas?
If this is an issue of duplicate content, why would Google choose the UK site over the US one when searching from the US?
IMO it really has to do with your US hosting. My English-language .com hosted in Germany is seen as German, while the .de hosted at the same ISP is unaffected by the recent changes.
I also have backlinks since 1997 from South America, somehow indicating an interest there.
As it is at the moment, it seems you need to double or triple your costs and, despite expensive UK hosting, move that UK site to a British ISP.
|if you think your site's been penalized, do a thorough check/sweep to make sure your site is now squeaky clean, and then file a reinclusion request. |
This sounds to me like reinclusion requests have become an automated procedure?
In G webmaster tools, on the opening page there is a "Tools" link, click on it and one of the options is to Submit a reinclusion request.
TC, yes I am aware of that. That is not what I was commenting about. Basically what I was saying is that it now sounds as if just submitting a reinclusion request triggers an automated check / reinclusion as opposed to a manual check.
|RonnieG, you bring up an interesting point about the use of iframed IDX databases. But if everyone's pulling from the same database, that in itself sounds a bit like duplicate content in a broad sense. If that's the only "real" content on a page, why would surfers want to visit that site over the bazillion others that are pulling from the same database, or at least why would we want to include it in our search results with a ton of other sites that offer exactly the same database? That's an honest, not a rhetorical question, by the way. I am open to hearing more about why this content (and other iframed or otherwise included content that is syndicated, essentially) should be valued as "unique and compelling" content for users. |
Adam, I think you missed the real point of my original post, which was that many top-placing RE web sites avoid the "stub" penalty, and present the appearance of gobs of original content, by merely calling out, programmatically, every record in the IDX databases we all use, creating what appears to be a page of unique listings, formatting it a little differently, and calling it their own original content, which it is NOT. It is simply a database dump of the same content that is available in the IFramed IDX database lookup. So why should those sites get credit for the content, when the sites that have IFramed IDX lookups cannot do the same? All I am suggesting is that G discount, and send to supplementals, URLs/pages that are artificially created IDX listings and are not true original content. That would even the playing field for all RE sites. PS: It would also eliminate "bazillions" of artificially created index pages of duplicate IDX database listings from thousands of sites that G now holds in its primary indexes, mistakenly thinking they are original content. Feel free to contact me offline for additional discussion and specific references.
[edited by: RonnieG at 10:27 pm (utc) on Jan. 5, 2007]
|TC, yes I am aware of that. That is not what I was commenting about. Basically what I was saying is that it now sounds as if just submitting a reinclusion request triggers an automated check / reinclusion as opposed to a manual check. |
Could be partly, but don't forget that the explanation box indicates that the reinclusion request is read by someone manually. My guess is that there is a combination of checks that can be used at the discretion of the operator on Google's side. IMO
|So, why should those sites get credit for the content, when the sites that have IFramed IDX lookups cannot do the same. |
Maybe because Google isn't perfect and hasn't yet figured out how to plug that loophole?
I haven't weighed in for a while. However, my small site has the dup. / supp'l problems and has for quite a while. More alarming though is the drop in number of pages indexed. Overnight in mid-December from about 3,800 now down to 1,370. And of the remaining only 9 are non-supplemental.
Adam, use the fictional country of Biddleonia!
Not only is it useful, but its fun and will get you bonus points with the (older) Aussies! ;)
(Biddleonia: a mythical country invented by an old Aussie variety show to allow the telling of racist jokes without actually offending anyone.)
How did the Biddleonian terrorist get hurt? Burnt his lips on the bus tail pipe ;)
Ah, I miss Hey Hey It's Saturday and Jackie MacDonald :))
It's too bad what Adam said about photo pages. Since Google is removing pages it knows are not duplicates, that is very unfortunate, particularly since their priorities are so backwards, with the index overrun by stolen content and googlebot fanatically crawling and indexing pages due to their linking from PR0 blog comment pages.
Poor indexing priorities leads to a poor index. Go figure.
|However, my small site has the dup. / supp'l problems and has for quite a while. More alarming though is the drop in number of pages indexed. Overnight in mid-December from about 3,800 now down to 1,370. And of the remaining only 9 are non-supplemental. |
nickied - Don't be too alarmed [ provided you've fixed the problems ] by the time it takes to get back into the SERPs. It's taken 5 months for one of our sites to start to kick in following fixes [ i.e. 2 days ago ] - we had similar patterns of observation, which caused a lot of anxiety.
In total 7 sites [ 4 already kicking ] - [ 3 to go ]
Have you completed the fixes, and if so, when?
[edited by: Whitey at 12:47 am (utc) on Jan. 7, 2007]
If I link from my homepage to an interior page that Google considers to be duplicate content, will this have a negative effect on my homepage?
I know Google doesn't like us linking to bad neighbors (other sites), but what about linking to bad family members (interior pages)?
Sometimes having content on your site that can also be found on other sites is unavoidable.
|re: wanting specific percentages (re: duplicate content, boilerplate stuff, etc.) |
There aren't any. Again, too many variables.
Would be nice to merge all the factors onto a one-dimensional scale and add a second three-colored (green-yellow-red) gif to the toolbar. Name it Brin-Rank; Sergey definitely deserves equal honor ;)
> Back in my younger days, when t-rexes still roamed the earth...
Nice to hear Google employs the "children of the revolution" and puts some emphasis on "not to get fooled". Thanks for that paragraph; an eye-opener.
Seriously: Google's scalable approach and its attempts to free the SERPs from MfA sites are very much appreciated; but there were quite a number of postings in here from people who have been wiped out though doing no evil, which surely is a pity not only for them but also for the search results in general. I'm sure you and your colleagues have carefully followed the feedback, visited many, many pages in detail, and that the insights gained there continuously help to fine-tune parameters of the indexing and evaluation process. Please go on, and try to put as many as possible back into the bathwater. As quickly as you can: for some of them it's quite cold out there.
Really, I don't understand the Google guys! Or in this case Adam... I'm tired! I work around 13 hours a day adding new content, using different titles on each page, different meta descriptions, learning and learning and learning, etc.
Now two pages with UNIQUE CONTENT of around 300 to 400 words (I have contracted a writer), with a menu, have gone supplemental. What happened, Google?
Many webmasters are having the same problem! I hope it is Yahoo and MSN's time now.
Google says to think of your visitors, not of the Google search engine, so I have added a menu for my clients to navigate easily around my site, BUT this is not good for Google.
Shortly I will write up my theory about Google and these changes…
If the template thing refers to sites that do something like the following, I would be all for removing them.
I hit a deer on New Years Eve after picking my children up from their Grandparents house (they always stay a week after Christmas). Did a lot of damage.
I decided I'd look to see about getting my wife a new tahoe.
I searched for "buy new Chevy Tahoe" or "new chevy tahoes for sale" and similar terms.
What I found was a bunch of sites with the words "click here for more information on buy new chevy tahoe" or "this is the best place for info on new chevy tahoes for sale" or something similar.
Some of these sites are well respected sites, or large sites at least. Maybe that's how they got so big.
The thing is, you don't see any of those "turn off" words in the description so you click. When you get there, then you realize it's basically a damn doorway page. I mean, they've perfected the art of creating doorway pages that aren't doorway pages...or whatever.
I had the same experience while looking for a new plasma tv for my wife. It's frustrating.
I am glad you found my "Duplicate Content Meter Threshold" to be a good idea and something for the webmaster team to consider for your roadmap.
Surely there are programmatic methods that could be employed to present a green, yellow and/or red indicator while maintaining a proverbial grey area. A high-level, graphical red flag could be used to at least communicate "a problem LIKELY exists" while not disclosing the specifics.
Anyway, we seem to have a penalty imposed as a result of using an archaic CMS to render and manage our pages. We've since had our site audited by 3 independent parties who agree our site conforms to your guidelines. Maybe as a favor for the good idea, sticky me to help get this penalty lifted (if indeed it's one that is not managed by the very "auto red flag system" I have proposed above).
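For what it's worth, the proposed meter boils down to something like the following toy sketch. The similarity score and the cutoffs are entirely made up here, since, as noted elsewhere in the thread, Google publishes no such percentages or thresholds:

```python
# Toy version of the "Duplicate Content Meter" idea: map a duplicate-
# content similarity score to a coarse traffic-light flag without
# exposing the underlying thresholds. Cutoffs are invented.

def duplicate_content_meter(similarity: float) -> str:
    """similarity: 0.0 (all unique) .. 1.0 (fully duplicated)."""
    if similarity < 0.3:
        return "green"   # likely fine
    if similarity < 0.7:
        return "yellow"  # grey area: worth reviewing
    return "red"         # "a problem LIKELY exists"
```

The design point of the proposal is that the tool would surface only the coarse bucket, never the exact score or cutoffs, preserving the "grey area" Google wants to keep.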
I'm not yet brave enough to turn my stickies on here, but if you make a post on our Webmaster Help Group (linked from www.google.com/webmasters), there's a good chance folks (including likely one of us Googlers) can offer insights and/or take appropriate actions in the background given what you've described.
As always, though, I'd caution you (and others) against so often assuming that significant shifts in Google traffic are the result of a penalty per se. Sometimes our algorithmic changes result in pretty stark ranking changes for some sites, and there's really no scalable way for us to make site-by-site adjustments in this context.
Makes sense, Adam. I'll explore that route. We're not *banned*, as we have all pages in the main index (we just now maintain a 0, or off-the-radar, rank for nearly all pages), so a reinclusion request didn't seem to be the right path.
Either way, thanks for all the insight and for providing us all with such a thorough response in order to cool the heat and provide clarification on the subject matter.