Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Why Does G Show BOTH Wikipedia and Duplicated Answers.com Content?

 1:42 am on Sep 28, 2007 (gmt 0)

Hey Guys...

I may have missed this one but for a KW search that I do... I get both Wiki and Answers on the first page of the SERPS.

Both pages contain the same info and Answers.com refers to Wiki...

Granted... Answers has only 30% of the content - but why is this permitted?

Thanks in advance.


[edited by: tedster at 2:22 am (utc) on Sep. 28, 2007]


Robert Charlton

 2:23 am on Sep 28, 2007 (gmt 0)

In the past, I've seen the exact same articles... 100% worth... rank on different sites if there are sufficient high quality independent inbound links. Google seems to have tightened up on this quite a bit over the past year or so, but it may be that it's still happening in your example. Also, 30% isn't the same as 100%.


 2:41 am on Sep 28, 2007 (gmt 0)

Also, check the page titles and snippets; those are the first indicators Google's duplicate-content filter uses to test whether pages are substantially the same.
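Google's actual duplicate filter has never been published, but the title-and-snippet comparison described above can be sketched with a simple word-shingle similarity test. Everything below (the helpers, the 0.5 threshold, the sample titles and snippets) is a hypothetical illustration, not Google's method:

```python
# Hypothetical sketch of a title/snippet near-duplicate check.
# The helpers, threshold, and sample data are illustrative only.

def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased string."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def likely_duplicates(page1, page2, threshold=0.5):
    """Flag two (title, snippet) pairs as probable duplicates."""
    t = jaccard(shingles(page1[0]), shingles(page2[0]))
    s = jaccard(shingles(page1[1]), shingles(page2[1]))
    return (t + s) / 2 >= threshold

wiki = ("Widget - Wikipedia",
        "A widget is a small gadget or mechanical device.")
answers = ("Widget: Information from Answers.com",
           "A widget is a small gadget or mechanical device.")
```

Here identical snippets plus differing titles average out to exactly the threshold, so `likely_duplicates(wiki, answers)` comes back true; a real filter would of course go on to compare the page bodies.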


 2:42 am on Sep 28, 2007 (gmt 0)

Ted... thanks for the cleanup...

Honestly... I'm not necessarily certain that 30% is a hard and fast number... but Wikipedia is at No. 3 for a search - I see pictures and text...

I drop down to No. 6, and here is Answers with the exact same pictures and content... offering a definition (just in case I missed the DEFINITION at the top of the search) and the same section of the Wikipedia article...

Is it 100% - no

Is it redundant - yes

I wonder how many webmasters may have had their pages go supplemental because of similar titles... but unique content.

I just find it odd that, while plenty of good sites have reported penalties and others "believe" (who knows) that they were manually edited off the first page, those two sites occupy such valuable real estate on page 1 of certain results while G penalizes others.



 8:40 am on Sep 28, 2007 (gmt 0)

The answer is that Google isn't that great a search engine. Listing answers.com is pathetic in basically all cases because all of its content is copied, and Google isn't bright enough to penalize the domain's rankings yet.


 3:41 pm on Sep 28, 2007 (gmt 0)

Answers.com is basically Wikipedia with ads.

This is scraping/spam of the highest order, yet they can somehow get away with it. Anyone here trying to start a similar site would get nowhere fast.

I think this is basically a question of money talking.

Do they have some arrangement with Google? They clearly have BIG financial backing - just look at their domain name, wonder how much that cost.


 2:39 am on Sep 29, 2007 (gmt 0)

Answers.com seems like the classic case of a site that needs a hand-administered penalty.


 2:45 am on Sep 29, 2007 (gmt 0)

Not likely while Google uses answers.com to provide their definition links at the top right.


 3:44 am on Sep 29, 2007 (gmt 0)

Good point. Google must make a fortune off of them.


 2:27 pm on Sep 29, 2007 (gmt 0)

Which is obviously why they are allowed to break Google's webmaster guidelines.


I wish I had the cash to burn to fly to the States and attend every single Google convention from now on, just to ask them that question.


 3:05 pm on Sep 29, 2007 (gmt 0)

Conspiracy scenarios can be satisfying, but they're likely to shed more heat than light.

Robert Charlton's post makes a lot of sense. We also need to remember that Google prefers algorithmic solutions to hand editing, since the latter isn't readily scalable and ultimately leads to a situation where there are so many layers of Band-Aids that the wound never heals. It seems to me that, if there's a problem, Google needs to find the sweet spot between recognizing authority status (conferred by quality inbound links) and identifying duplicate content that weakens the SERPs. If Google were to simply apply a manual filter to Answers.com, it would be impossible to fine-tune its algorithms to find that sweet spot, and the underlying problem wouldn't be solved.


 3:27 pm on Sep 29, 2007 (gmt 0)

Well, answers.com operates on the same concept as all the winners on the net: pay nothing for content. Google, YouTube, MySpace.

The money is in presenting the devalued content in different ways. Wikipedia is the worst offender in content devaluation; it effectively charges its users by asking for donations. And now they cash in via German Telekom, which displays the German Wikipedia.

They pay and have the illusion that they're doing something worthwhile. It's especially idiotic for all those ill-advised students who will soon be in a working world where knowledge counts for nothing, daddy doesn't pay anymore, and their degree counts for nothing, since even 14-year-old #*$!le-faced Sally from North Carolina can weigh in on astrophysics.

Answers.com just makes money in a world that Google created. It's not only WP that Answers has copied; it's also some free dictionary, and so on.

See it as remixing the web; it's been the most successful strategy since Google entered the arena. Why do you think there are so many spammers out there copying other people's content? It makes enough money to actually live off, unlike spending endless hours writing content for a ridiculous return.

It's just a massively inflated web version of old-style publishers making the money while most authors live on a pittance, with the occasional big winner serving as the figurehead for x million unsuccessful people trying the same.

Why do you think Google is involved in scanning in out-of-copyright books? Because they think republishing old pap makes them money. Should you do the same and they don't profit from you, you're competition. Look at YouTube: thousands and thousands of republished telly shows, loads of money... in copyright theft, republishing what film and television companies paid millions to produce. Why do you think DVDs and music became cheaper? Why does the publisher in my field, which has less traffic than us, run web pages explaining that the service costs them 500,000 euros and that's why they charge for some of their articles? They operate at a massive loss just to save face.

Cloning WP is about the most harmless form; it's legal.


 3:30 pm on Sep 29, 2007 (gmt 0)

Conspiracy scenarios can be satisfying, but they're likely to shed more heat than light.


Sorry that's a lot of rubbish since they hard link to it...

Why don't you try to type in define:widget into your favorite SE and look in the top right where it links to.

Personally I am glad someone makes money from WP.

[edited by: mattg3 at 3:40 pm (utc) on Sep. 29, 2007]


 3:39 pm on Sep 29, 2007 (gmt 0)

To earn, or not to earn - that is the answer.com - another business opportunity. Open source and free content sound more and more... exploitable. I don't know, perhaps I'm just growing up :) Supply the system, let others make the content, lean back, and laugh while people look at you as "successful". Oh well, another lesson for the day - thanks.

Robert Charlton

 5:28 pm on Sep 29, 2007 (gmt 0)

Sorry that's a lot of rubbish since they hard link to it...

I've been talking about the organic rankings, which was what the original poster was asking about.

Why don't you try to type in define:widget into your favorite SE and look in the top right where it links to.

Actually, that's not the way the Answers.com links appear. The Google define: operator displays a whole page of Google Definitions, which are definitions obtained from spidered results. They've been discussed here numerous times.

The hard links to Answers.com definitions appear in the upper right of the blue Web bar, alongside the approximate number of pages and the retrieval time. They're a fairly recent addition, clearly part of Google's ongoing experimentation with its SERP displays. The blue bar has also included News, Images, Products, etc. when applicable. To get the "definition" link for widgets, you merely need to do a search for that term.

I'm assuming that Google is licensing the "definition" content from Answers.com. Not sure how that plays out if Answers.com is using Wikipedia content, but that's another topic.

The dictionary definitions that Answers.com returns are clearly marked with copyrights, and I assume that Answers.com is paying for them, or is exchanging links for them, or has a legal staff working on the problem.

It's not unusual these days to see content being pulled from many sources. Google Local, eg, is pulling reviews from sites that I know are subscription sites, and Google is undoubtedly paying for this material. Often, those reviews were contributed by users of the review site for free. This is known as user-generated content... and if people here have big problems with it, they should probably stop posting on forums and review sites in general.

In spite of the Google hard link, based on searches I've tried, I don't see any apparent organic favoritism being given to Answers.com. In fact, when I've searched for, say, a sentence in quotes from an Answers.com definition, the original source or sources are there, but the Answers.com page generally doesn't appear unless you click "show more results" or add "&filter=0" to the Google search URL. Hardly seems like favoritism or quid pro quo to me.
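For anyone wanting to reproduce that check, here's a minimal sketch of appending the `filter=0` parameter (which at the time told Google not to collapse duplicate/similar results) to a search URL. The URL and query string are just examples:

```python
# Sketch: add Google's circa-2007 "filter=0" parameter, which disabled
# the duplicate/similar-results filter, to an existing search URL.
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_filter_off(search_url):
    """Return search_url with filter=0 added (or overwritten)."""
    parts = urlparse(search_url)
    query = dict(parse_qsl(parts.query))
    query["filter"] = "0"  # 0 = show results the dupe filter would hide
    return urlunparse(parts._replace(query=urlencode(query)))

url = with_filter_off("http://www.google.com/search?q=widget+definition")
# url is the same search with "&filter=0" appended
```

Using `parse_qsl`/`urlencode` rather than plain string concatenation means an existing `filter` parameter is overwritten instead of duplicated.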

Duplication is tricky in Google, though, and it's very much query dependent, so I can't say these definitions won't rank on other searches.

What surprises me a bit is why Google isn't returning its own define: operator results rather than the Answers.com results when you click the "definition" link. Might be a concession to the credibility of a hand edited dictionary, or it might be that Google is running various tests to measure user satisfaction.

[edited by: Robert_Charlton at 5:36 pm (utc) on Sep. 29, 2007]


 5:47 pm on Sep 29, 2007 (gmt 0)

First of all, EFV, in my usual fashion I disagree with you on the need for algorithmic solutions. What you're saying is true for the bottom of the web pyramid. However, sites in the top tier can and probably should be manually adjusted. By the way, I think Google realizes that and seems to be doing quite a bit of manual work - I see those corp.google.com referrers in my log regularly and my understanding is that these are manual reviews in progress. And I am nowhere near the traffic levels of the likes of answers.com.

Second of all, on the issue of content scrapers (or should I say "automated content scrapers", for we all rehash other people's knowledge in some ways), do you think it presents a strategic opportunity for engines like ask.com who could simply design their algorithms to avoid those sites? If people really cared, they'd prefer clean results from the source, no? Or could it be that only WE care as we actually create original content and then watch those guys come out of every corner scraping, indexing, re-mixing and reassembling our stuff... and making all the money?


 6:18 pm on Sep 29, 2007 (gmt 0)

Loudspeaker, I have no personal bias for or against algorithmic solutions. Whatever works is fine with me. I'm simply saying that Google clearly sees a need for algorithmic solutions. (Indeed, Google employees have stated in the past that spam results are sometimes left in the SERPs to provide a means of checking the performance of automated filters.)

In light of Google's bias toward algorithms and "scalability," it's hardly surprising that duplicate content from Answers.com, stub pages from Wikipedia, "contribute a review" pages from keyword-driven, computer-generated megasites, etc. would be left for computers to deal with instead of being searched for and destroyed manually by a brigade of whack-a-mole troops. With luck, Google will eventually do a better job of distinguishing content pages from junk pages on domains that (presumably because of inbound links, age, and overall usefulness) have earned more trust or authority status than they deserve.


 6:47 pm on Sep 29, 2007 (gmt 0)

As it happens, I think that answers.com is a well-presented compilation of data from many sources, and thus adds real value.

There is often the complaint that Wikipedia content may be made up and/or wrong and there is no easy way to verify it (or that users should verify it but don't). Well, when that Wikipedia content is presented alongside info on the same topic from branded and trusted providers it is a very handy way to cross-check and verify and get the best value from each.

I'm glad that Google has that 'definition' link: I use it a lot. It significantly boosts my confidence in all the sources presented.




 8:44 pm on Sep 29, 2007 (gmt 0)

Stapling together three copies of somebody else's work doesn't create value, as any high school teacher will tell you.

There is no excuse for answers.com results. They are duplicate content by definition, similar to the worst spam on the internet (content thieves) and dissimilar to the best results to serve up (unique, quality content).


 11:16 pm on Sep 29, 2007 (gmt 0)

Perhaps we're all just talking from our personal interests? - let's take a step back - or i'll try anyways and talk about the business models and my experience of it in the past - you can do whatever you want:

User-generated content is worth approximately a dime a million - but if you have enough of it, it's worth money. I've yet to come across anything remotely interesting on those "free services", which eventually put advertising there and call it "maintenance and costs". I'm sorry to say it, since in my young years on the internet I was basically as optimistic and idealistic as the people using those services are now: "People can, like, see me here - and I can, like, communicate with people and, you know, show how obsessed I am with myself and what I do." Looking back, I'm happy none of it survived, since it was just as bad as what I read and see today on those services that seem to pop up here, there, and everywhere.

But this will go on - it's called trafficking - it's nothing new, it has just been given another new, trendy, smart name. Eventually the people using it leave and forget about it, and new people arrive to start down the road the others have just finished. The content seems to be scalable - so does the number of profiles - but when it comes down to it, it's static: people don't delete their profiles when they go away, they just leave them there. So each remains a number in the system that can be shown to potential buyers or investors: "we have this many profiles! and this much content! and so on!" Of course they do; they're just not active profiles, and usually none of them resembles anything worth remembering or thinking twice about. Yet for some strange reason, it's sold at a value which doesn't exist.

With regards to Wikipedia, it has its problems - especially with the rips of other people's work - but then again, it also has its benefits. Before Wikipedia, there were those "college essay exchanges" with those strange ToS which nobody ever reads, where people would think it was a free service and an exchange of information for college students...

Suddenly those exchanges would vanish with the content, and then a new product came along built around "Buy College Essays - Subscription Fees" - and funnily enough, they suddenly held the copyrights on the 100,000+ original essays on all kinds of topics. It was in the ToS of the previous "exchange" (which sometimes just remained there to gather more content): with any information submitted went the copyright and every other right there ever was to the content. MySpace also did this in their ToS, before they were bought up by that nice company. They just changed it very fast, silently, and without any noise.

At least Wikipedia, as a more or less not-for-profit but very much for-popularity service (especially the founder - he's a success!), has the benefit that people can contribute and begin where other people left off. Not that many people actually do that: as far as I could dig down, there are around 1,000-1,500 people doing actual work there; the rest are just hit-and-run edits, or lurkers and users, nothing else. In my eyes, it's a better way to do it than the monetary exploit-and-flip business this is usually about. Perhaps they'll create something with it and make it a more centralized place for information for their age group (yes, 15-25 is actually the market - they're young, idealistic, and don't think too much about what they're doing or care much about it anyway) - which would also make it a little easier for teachers and so on to check whether something might have come from there.

With regards to MySpace, there's truly unique and valuable content there - a collection of diaries, and... I'm just like, wow - because the MySpace concept of a profile where, you know, you can put stuff, was the first thing I came across when I first got online back in 1998. It was a hit back then among the young and smart who somehow managed to get online on expensive 56k modems which made an unbelievable noise - even after the dot-com crash. They also had a lot of forums where you could, like, participate, and some other fun services... like, you remember, music which played automatically when you got to the personal profile... (MIDIs, anybody? remember the plague?)

Why the people who bought MySpace could pay what they did for it is beyond my imagination. But I guess it falls under the category "Art", since that's the only serious business I've ever located where you can sell compilations of trash and really bad content for millions. I will have to add the "Internet" to that short list of lucrative business opportunities for the sufficiently crazy, I guess.

Answers.com just scrambles different sources together, pretty much automatically, and then puts advertising around the content... it's like Google, just with expanded content snippets, some added resources, and fewer results. Don't fool yourself... it's the same business model - nothing new and nothing else.

The lesson is: if you create something unique - you know, something with your brain, not robot-generated or user-generated - then you are left out, and you will get a dime every time somebody makes a million on your content. A computer/bot can read it, scan it, compare it, rank it, present it, decompile it, restructure it, reorganize it, and then republish it, even in snippets or bits and parts, on a million other websites - with solid advertising and revenue models alongside it - before you've gotten to the signup screen of the first advertiser who is going to rip you off. And all this happens faster than you can think of what to do next after those few years of work on that original content. But it's nothing personal; it's just business.


 6:39 am on Sep 30, 2007 (gmt 0)

I agree with the paragraph above - take a look at what Google is doing with local restaurant reviews (admittedly without blatant monetization, but give it some time!). Type in a name of a restaurant in a major metro area (on maps.google.com), then click "more info" next to the restaurant name and see all those snippets, photos, quotes and reviews pulled from different sites.

Honestly, I feel sorry for all those suckers who actually went there, ate, wrote reviews, took photos. In short, worked. The result of their work is all there for the taking (as snippets and small images, sure) and I don't even see that much of a need to click onto the sites to see complete reviews. Snippets are probably going to work for 75% of people.

I am not blaming Google for taking advantage of this (if they didn't do it, somebody else would), but sometimes I feel the content authors should rebel against this practice in general. No, robots.txt is not the way to rebel (you'll only hurt yourself by losing traffic). Something else must be invented, but I don't know what.


 8:27 am on Sep 30, 2007 (gmt 0)

Maybe when all the free web content is dumbed-down, re-written, snippet(ed), aggregated, compiled, wiki-ed, web 2.0'ed... then we can get back to the (real) bookstores and newspapers.

just kidding of course... :)


 9:31 am on Sep 30, 2007 (gmt 0)

Type in a name of a restaurant in a major metro area (on maps.google.com), then click "more info" next to the restaurant name and see all those snippets, photos, quotes and reviews pulled from different sites.

When I do that, I see a series of annotated links that drive traffic to the featured search results. I'd imagine that most publishers of reviews, etc. are happy to get the additional traffic, and that they'd ban Googlebot with robots.txt if they weren't. (When was the last time you saw a post here by a Webmaster who was angry because Google insisted on providing search referrals?)


 4:09 pm on Sep 30, 2007 (gmt 0)

Again - it's not about how good or bad any of the above is - they DO provide something of value for some people at some point - no matter how they do it. That goes for google, answers, wikipedia, everybody - it's just different business models used on a large scale - it's the "big" examples - most of the smaller sites and such are also using these methods - sometimes variations over them and such - but it basically remains the same.

But to refer to the original post again, and give my opinion on why duplicated content and such shows up in Google: they earn money on it, and so do the people with the duplicated content. Google is nobody's friend - they're a business, not the ideological movement they sometimes pretend to be - and they will give some other businesses (people with a lot of resources) something of value (visitors) so those businesses don't leave the scene. Google depends on its own success and access to information, and you don't want too many people angry at you, especially not in the marketing sector - you want them to depend on you, because if they do, they will try to protect you. You've just turned a potential enemy into another soldier in your army.

Again, don't ever make the mistake of thinking that the internet or a business is democratic - it's not - it's a mix of feudalism, oligarchy and meritocracy - which is put into a dynamic trio: Royalty/Aristocracy/Farmers - and everybody starts as a farmer..

Oh, and by the way - last night I blocked all robots from my website, and all referrals unless they were approved by me as "trusted sources" - I don't need garbage visitors from SERPs - so somebody does do it.

[edited by: RandomDot at 4:25 pm (utc) on Sep. 30, 2007]


 4:55 pm on Sep 30, 2007 (gmt 0)

I'd imagine that most publishers of reviews, etc. are happy to get the additional traffic, and that they'd ban Googlebot with robots.txt if they weren't.

What if they are happy to be included in results from Google the search engine, but not so happy to see their snippets and photos in Google the restaurant-review aggregator? Is there any way for them to allow one and disallow the other? (The way it stands, I don't think so.) Do you think there should be a way?

Again, I don't necessarily think Google is the worst offender here - as you pointed out, at least they give you links. But don't you think there's a substitution of models going on? The old model worked fine with robots.txt. The new one seems to need another config file. Simply referring content authors back to robots.txt is essentially pushing a "packaged deal" on them: either no remixing but also no search-engine traffic, or search-engine traffic but your content may be used any way the search engine likes.


 5:25 pm on Sep 30, 2007 (gmt 0)

The basic model of using robots.txt and .htaccess to control access to, and use of, the different content on a website has a flaw: it's either/or - there is no if/and - especially not with robots.
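That either/or limitation is easy to see with Python's standard `urllib.robotparser`: a robots.txt rule grants or denies fetching per user-agent token, with no way to say "crawl me for search but don't reuse my content in other products." The domain, paths, and bot names below are made up for illustration:

```python
# Sketch: robots.txt permissions are binary per user-agent token.
# Blocking Googlebot blocks every feature that crawler feeds; there
# is no "index for search, but don't aggregate" directive.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /reviews/

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

blocked = rp.can_fetch("Googlebot", "http://example.com/reviews/pizza-place")
open_page = rp.can_fetch("Googlebot", "http://example.com/menu")
other_bot = rp.can_fetch("SomeOtherBot", "http://example.com/reviews/pizza-place")
```

Here `blocked` comes out false while the other two are true: the only lever available is whole-crawler, whole-path access, which is exactly the "packaged deal" complaint above.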

With regards to restaurant reviews, Google has a conflict of interest, since they're building on a global scale the same business the other website runs on a local basis. Google is nice enough to provide links back because it's their pipeline: their system stays updated from the other person's content. Again, who is going to earn the most from it (whether in service, revenue, or anything else)? Guess a few times.

Keep in mind that Google is smart enough to give that link back and give them something in return... I don't know anybody else who is doing it, so they're the best option currently available - you wouldn't want to cut the pipeline before something better shows up. But then again, Google might just be a little Roman who knows that infrastructure and resources are the key to any empire, and that in general you should make sure your population is happy, entertained, and well fed.

Robert Charlton

 6:07 pm on Sep 30, 2007 (gmt 0)

...reviews of restaurants...

My point about reviews of restaurants is that Google appears to be paying for them, as they're material that's behind a log-in (we're not just talking about robots.txt), if they're even that accessible. The Zagat material may be from user reviews that never get published in their entirety... I'm not sure, as I'm not currently a subscriber. But user-generated content may be subject to other uses.

Similarly, Google may well be paying Answers.com for the use of its content, and Answers.com may be paying those dictionaries for their content.

I think this is completely separate from the algo. Whatever the usage considerations, the Google Local content shouldn't create dupe problems for anyone, and I'm guessing that Answers.com pages out in the wild are subject to the vicissitudes of Google's normal dupe filtering, just like any other pages.

Those "definition" links are clearly subject to a lot of redirects and aren't likely to be transmitting PageRank (which would be a problem for Answers.com if they were, since they go to a page with a different tracking string than the corresponding Answers.com page on the site).


 9:42 pm on Sep 30, 2007 (gmt 0)

"they DO provide something of value for some people at some point"

And again, they plainly do not.

To put it most simply, if you were to make an exact copy of the answers.com page and put it on another site, this adds nothing of value to web searchers. If Google ranked the answers.com page at #7 for a search, and your copy of that page at #9, this would add NOTHING of value to the SERPs, or the knowledge of the universe for that matter.

Now suppose the top 10 for a term were ALL copies of the same content on the answers.com page. Saying they provide value is silly. What if every result in the top 1000 had the exact same copied content as the answers.com page?

Copying or stealing content adds no value, and has no value in being shown in the results. The original sources should be listed, but any copies are strictly ANTI-value.

The answers.com results aren't just clutter; they degrade the results, as would any additional copies of the same content.


 2:52 pm on Oct 1, 2007 (gmt 0)

That really is a very costive and narrow view, and it is not true for me. Repeating your statement does not make it any more true.

The careful and accurate collation of results from separate sources *is* a valuable service in this case, just as a search engine makes web content accessible. Different modes of access to the same data are valuable for different things; otherwise, why would the following be valuable in the 'real world'?

1) Meta-studies combining existing research results?

2) Non-direct-mapped cache memory.

Your "1000 copies of the same page" straw man is *not* what answers.com is doing.



[edited by: DamonHD at 2:53 pm (utc) on Oct. 1, 2007]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved