
Computational Linguistics and quality content

8:33 am on Dec 2, 2010 (gmt 0)

My site
I run a legitimate 2-year-old site that has earned some trust and authority within my niche. My site is a content aggregator, and I currently get most of my content via direct input from some bigger content creators.

My site offers additional value by optimising the content and presenting it in a unique way.

A section of my site is aggregating news:
I started to post and archive feeds on my site, currently a few thousand articles.

My idea
Now I want to get my site to the next level:
I have the opportunity to get 200k news articles from a topic-related news archive, with roughly 200 words each and without the need to link to the archive.

My plan is to create a new page for each news item and put some links to my existing content in them (wiki style).

I also want to optimise the content by replacing words with synonyms, deleting useless words, deleting useless sentences like "please come to our event" from this old news. To do this I will have to hire a Computational Linguist.

I want to build an "encyclopaedia" by identifying relevant words and word combinations. Quotes from my optimised articles and relevant links to my existing content and the news articles will be the only content.

My question
Only one of my competitors is publishing all the 200k articles. He ranks for 200k+ keywords (searchmetrics) and has more than 10 times as many backlinks. My competitor's site is 10+ years old and has had at least 10% server errors (5xx) for years(!) because of performance problems.
1) Is it possible to outrank such a site with very similar content (original news + links)?
2) If the answer to 1) is yes: would you advise showing both versions of the article (on the same page or on different pages?) or just one version (which one)?

3) What are your experiences with using Computational Linguist(ic)s to change tons of high quality content Google already knows, and with Google's reaction?

4) Which mode of publishing should I use (1 post per x minutes or bigger packages / starting with the oldest or the newest news)?
12:26 pm on Dec 2, 2010 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

It appears you are basically asking if it is a good idea to republish a large amount of duplicate content. This is generally a bad idea.

I don't mean to be harsh but I see many flaws in this idea.

a) 200 words is not significant content (especially when it's duplicate content)
b) computational linguistics tools often aren't good enough
c) google has been dealing with this trick for over 5 years
d) managing 200k pages is very different from managing 1k pages

If you think this is a good idea then do a test run of 500 articles. This will help you identify how much unique content per page google likes and how the internal navigation should be set up. Plus it will help you determine just how good your computational linguist program is (quality can vary widely).
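
One rough way to put a number on "how much unique content per page" during such a test run is to compare each rewritten article with its source. The sketch below is only an illustration and not anything google has published: it assumes 3-word shingles and Jaccard similarity as the measure, and the function names and the two sample sentences are made up for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample: an archive item and a lightly rewritten version of it.
original = ("The annual trade fair opens on Monday in the city hall. "
            "Please come to our event.")
rewritten = "The yearly trade fair opens on Monday in the city hall."

# A score close to 1.0 means the rewrite is still nearly the same text.
print(round(jaccard(original, rewritten), 2))
```

Running this over the 500 test articles would at least show how far the rewriting tool actually moves the text; what score is "unique enough" is something only the test itself can suggest.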

I played around with this several years ago. I don't do it anymore because imho it's more profitable to hire writers or develop UGC (user generated content).
2:07 pm on Dec 2, 2010 (gmt 0)

Thanks for your advice.

a) Pages with more than 200 words are rare on my site, but it works. I provide unique content without using large amounts of handwritten text. On most pages I use a sentence like "these are the results for (keyword):" and show the data in a list. The unique value is in the presentation and the logical interconnection of my data.

b) Maybe they are not good enough to make Google believe that I wrote the content manually. My intention is to improve the quality and the SEO metrics/semantics of the text.

c) Why must it be some kind of trick to post a large amount of duplicate content? My biggest competitor has been doing this for years and ranks better than the source. It is an old and slow site with a huge error rate because they are not able to handle their success. It would be a better user experience to read the same news on my site because it doesn't crash all the time.

d) My plan is to get to 600k-800k pages in the next 2 years, with roughly 250k-300k pages generated from this duplicate content in different ways. This is the size I need to draw level with my competitors. Currently I manage more than 120k pages, more than 50k of them indexed.

Why shouldn't I release duplicate content if I can offer a significant amount of additional value by offering links to related articles and related content, and maybe by rewriting the content?
3:52 pm on Dec 2, 2010 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month


If you honestly feel that you can add enough value to the 200k pages of duplicate content that you will then pass google's current and future algorithm, then go for it.

Please reread my comments. I did not say you should not do it. I simply suggested running a small test version to prove your concept. If you are confident in your idea and ability then go forth and good luck.
8:11 pm on Dec 2, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

What this sounds like is commonly called "article spinning". I appreciate that you want to do this manually rather than using software, but it's still not really a value add.

I also think that the Google spotlight is on this practice, as it is on all approaches for inflating content only to rank better. That said, it is a technical trick that works for some people, for the moment. I personally stay away from it.
11:01 pm on Dec 2, 2010 (gmt 0)

Thanks for your replies, goodroi and tedster.

"In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved."
"When in doubt, I'd do what makes most sense from a user perspective (what is most helpful to human visitors). You could also block those pages using robots.txt if you're worried about how search engines will view them."
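
The robots.txt option in the second quote can be checked locally before anything goes live. This is a minimal sketch using Python's standard `urllib.robotparser`; the `/news-archive/` and `/encyclopaedia/` paths and the `example.com` host are assumptions for illustration, not the real site structure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of the republished archive.
robots_txt = """\
User-agent: *
Disallow: /news-archive/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() returns True when the given user agent may crawl the URL.
archive_ok = parser.can_fetch(
    "Googlebot", "https://example.com/news-archive/article-1")
other_ok = parser.can_fetch(
    "Googlebot", "https://example.com/encyclopaedia/some-term")

print("archive crawlable:", archive_ok)
print("other pages crawlable:", other_ok)
```

Blocking the archive this way also means those pages would not be crawled at all, so it only fits if they are meant for visitors rather than for search traffic.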

I can't see a negative effect for the searchers; no one is "deceived". It should be easy for google to filter my articles if they don't want to show the articles on my site and prefer to show them on a site which cannot reliably deliver the requested content because of an insane amount of server errors.

"with intent to manipulate our rankings" Google
"inflating content only to rank better" tedster

Although I know that there is no definite answer: Google doesn't know my intention. Is a well-rewritten article with relevant links to my unique content more dangerous for my site than an unedited version? On one hand I may seem to have some sort of manipulative intention, on the other hand I signal that I care about my content...

My current opinion: the most important factor is the amount of added value and the quality of the rewriting. Stupid keyword stuffing is not the solution. I will start with a test sample and gradually increase the amount of content. I suppose me and goodroi don't think that releasing 200k at once is a good idea.
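
To make "start with a test sample and gradually increase" concrete, here is a small sketch of a release plan. The first batch of 500 comes from goodroi's suggested test run; doubling each following batch is purely an assumption for illustration, and `rollout_batches` is a made-up helper name.

```python
def rollout_batches(total, first=500, growth=2.0):
    """Yield batch sizes that start at `first` and grow by `growth`
    until `total` articles have been scheduled."""
    if total < 0 or first < 1 or growth < 1:
        raise ValueError("need total >= 0, first >= 1 and growth >= 1")
    remaining, size = total, float(first)
    while remaining > 0:
        batch = min(int(size), remaining)  # the last batch takes what is left
        yield batch
        remaining -= batch
        size *= growth


batches = list(rollout_batches(200_000))
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch} articles")
```

How long to wait between batches is left open here; the point of the earlier test run is to watch indexing and rankings before releasing the next, larger batch.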
1:19 am on Dec 3, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

I suppose me and goodroi don't think that releasing 200k at once is a good idea.

LOL - make that three of us.
