Forum Moderators: Robert Charlton & goodroi


Automated content in Google top results - reminds me of AltaVista


enigma1

7:18 pm on Apr 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So I was searching for some computer accessories recently and came across some mysterious domains with blogs pretending to be highly relevant to the search terms I entered.

So I started reading the blog entries, only to realize they make no sense. The search returned over 10 million results, and the blog I was reading was right at the top. Just to quote a passage:


To adjust a computer is not same with adjust a watch. Maybe you accept a virus or your computer is active cool slow. You’ve fabricated the accommodation to alarm a computer adjustment expert, but you don’t apperceive area to start. The three questions beneath are advised to advice you feel assured with the computer able you choose. Plus, you will put him or her on apprehension that you accept top expectations for the job.


It looks like a bad translation, although I lean toward auto-generated content built from a dictionary of terms relevant to the subject. I know there are various scripts that can generate automated content, and that's not the issue, but in the past Google was quite good at filtering them out (at least from the top of the results).

Does anybody else have input on this? I don't think my particular search criteria are the problem; they were very broad, and the number of results is what I would expect.

Andem

11:46 pm on Apr 21, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



Before this Panda 2.0 mess, I would have thought the same thing as you. Over the past week, though, I've quickly educated myself on blackhat trickery, since these sites are taking so many spots, not just from my projects but from places where I usually find information. It appears the results you've found come from what's called "article spinning": they take full articles and, with a couple of mouse clicks, turn most words or phrases into synonyms.

These so-called article spinners have been eroding quality sites for a long time, but Panda and Panda 2 seem to have accelerated that.

edit: On closer inspection, it looks like that stuff also could have been auto-translated from Chinese.

miozio

12:22 am on Apr 22, 2011 (gmt 0)

10+ Year Member



I was seeing these types of sites before Panda too, but back then there were actually quality sites at the top and sprinkled throughout. Now the real sites have disappeared, vanished, as if they were picked out by hand to purposely leave only the automated content... Looks like Google was hacked by the Chinese or something. Remember the issues with Google China?

Pandas live in China you know!

tedster

1:25 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the past it was common to see this kind of automated junk float to the top for a short period and then get totally zapped. Let's hope that happens again. It's not doing anyone any good except the lousy scraping, spinning parasites.

Planet13

2:15 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



edit: On closer inspection, it looks like that stuff also could have been auto-translated from Chinese.


While the example you gave looks like a basic spin job, it's not uncommon for black hatters to translate someone's original article from language A into language B, then into language C, then back into language A. An example might be:

English -> Italian -> Chinese -> English

But this all seems quite strange, because one would think the scraper and Panda updates would have pretty much eliminated this kind of spam, yet many people say there has been a resurgence of it post-Panda, no?

tedster

2:27 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think it often depends on the domain where the spun content appears, particularly its age and whether it ever ranked well enough in the past to get real scrutiny. This is where I think Google's impulse toward freshness sometimes gets in the way of quality.

If I'm right, that junk page (and domain) should be gone in a week or so. I would assume the page's owner even plans on it being burned quickly. I hope that's what we see, and that a stronger algo against "churn and burn" tactics arrives soon.

walkman

4:41 am on Apr 22, 2011 (gmt 0)



To me this means that the SERPs are up in the air, as they always are before major updates. Something big is happening, with filters switching on and off and data moving, so let's wait and see. There are very different versions of the SERPs going around.

Dan01

4:45 am on Apr 22, 2011 (gmt 0)

10+ Year Member



enigma1, that is the stuff that is trashing up the Internet. I bet it was auto-generated, but who knows? Figures it ranked after Panda. LOL I was surprised the scrapers didn't outrank it. jk

I agree with ted - hopefully they get zapped back down. Perhaps it was "freshness".

enigma1

10:40 am on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It seems there are many sources and several methods of artificial content generation behind what's coming up for many generic searches. The content can be translated, automatically created, or retrieved from databases and then adjusted in different ways before public exposure.

As far as I can see, on the database side there are sites, article directories for instance, that hold tons of "content", presumably of poor quality or scraped from original sources, which in turn circulates to various new domains. I guess that with some tools in between they can "fine-tune" the content or layer translations on top. And so far it seems the end result is not really being filtered by Google.

Sometimes it is obvious the content is translated; other times you need to read a posted article carefully before realizing that some details you would expect from someone who put thought into writing it are just not there. That is, if you know something about the subject; to a newcomer, the vocabulary used may look attractive.

I find it much easier to detect these sites from the domain name. With spam sites, if the domain name is short, chances are it will consist of random characters; otherwise it's way too long and stuffed with spammy keywords.
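A rough sketch of the domain-name heuristic described above, in Python. Everything here is an assumption for illustration: the length thresholds, the vowel-ratio test for "random characters", and the keyword list are hypothetical values, not anything Google or a registrar is known to use.

```python
from typing import Optional

# Hypothetical thresholds and keyword list, chosen only for illustration.
SHORT_LEN = 8   # labels this short are checked for "random characters"
LONG_LEN = 20   # labels this long are checked for keyword stuffing
SPAMMY_KEYWORDS = {"best", "cheap", "buy", "discount", "computer",
                   "accessories", "review"}
VOWELS = set("aeiou")

def first_label(domain: str) -> str:
    """'www.Best-Cheap-Laptops.com' -> 'best-cheap-laptops'."""
    return domain.lower().removeprefix("www.").split(".")[0]

def looks_random(label: str) -> bool:
    """Crude randomness test: very few vowels, or letters mixed with digits."""
    letters = [c for c in label if c.isalpha()]
    if not letters:
        return True
    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters)
    has_digits = any(c.isdigit() for c in label)
    return vowel_ratio < 0.2 or has_digits

def keyword_hits(label: str) -> int:
    """Count how many listed keywords appear anywhere in the label."""
    return sum(1 for kw in SPAMMY_KEYWORDS if kw in label)

def domain_suspicion(domain: str) -> Optional[str]:
    """Return a reason string if the name matches either pattern, else None."""
    label = first_label(domain)
    if len(label) <= SHORT_LEN and looks_random(label):
        return "random-chars"
    if len(label) >= LONG_LEN and keyword_hits(label) >= 2:
        return "keyword-stuffed"
    return None

if __name__ == "__main__":
    for name in ["xkqzt.com", "google.com", "bbc.com",
                 "best-cheap-computer-accessories-review.com"]:
        print(name, domain_suspicion(name))
```

Note that bbc.com gets flagged as random characters, which shows how easily a rule like this misfires on legitimate short brands; at best it would be one weak signal among many.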

In my view, there is a simple way to solve this problem without Google needing to do anything or even getting involved: the cost of renting the domain name.

Instead of $10 per year for a .com, it should cost $100 or more. Or it could stay at $10 per year, but you would need to pay 10 or 20 years in advance. Right now the cost of a domain, something critical for a site, is so low that nobody needs to make a real long-term investment, and anything goes, as we see.

Dan01

11:14 am on Apr 22, 2011 (gmt 0)

10+ Year Member



My wife and I bought a spam machine (program) once. We played with it but never used it. Basically, you write one article and it creates dozens, each with a few changes. For instance, it might change the word "various" to "several", or "sometimes" to "often". It came with some synonyms, but you could program more. You could also choose how many iterations to make: the more iterations, the lower the quality of each article.
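The synonym-swapping part of such a program can be sketched in a few lines of Python. This is a minimal illustration assuming a small hand-built synonym table; `SYNONYMS`, `spin` and `spin_many` are hypothetical names, not the product described above, and real spinners use much larger dictionaries that, as the post says, you can extend yourself.

```python
import random
import re

# Hypothetical synonym table; entries can be added by hand.
SYNONYMS = {
    "various": ["several", "numerous"],
    "sometimes": ["often", "occasionally"],
    "quick": ["fast", "speedy"],
}

WORD_RE = re.compile(r"[A-Za-z]+")

def spin(text: str, rng: random.Random, rate: float = 1.0) -> str:
    """Return one variant of text with table words swapped for synonyms.

    rate is the probability that a word found in SYNONYMS gets replaced.
    """
    def swap(match: re.Match) -> str:
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            choice = rng.choice(options)
            # Preserve a leading capital letter.
            return choice.capitalize() if word[0].isupper() else choice
        return word
    return WORD_RE.sub(swap, text)

def spin_many(text: str, copies: int, seed: int = 0) -> list[str]:
    """Generate `copies` variants of one source article (seeded for repeatability)."""
    rng = random.Random(seed)
    return [spin(text, rng) for _ in range(copies)]

if __name__ == "__main__":
    source = "Various users sometimes want a quick fix."
    for variant in spin_many(source, copies=3):
        print(variant)
```

Every copy keeps the original sentence structure and changes only words found in the table, which is roughly how text like the passage quoted at the top of this thread comes about. The sketch does not model the quality drop with more iterations.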

The next thing, which we didn't buy, was a program to broadcast those copies throughout the Internet.

Looking back, I am glad we never used it. Some have mentioned that Google has de-listed them for "unnatural" links. I don't know if Google can detect those auto-generated spam (trash) articles, but if they haven't done so in the past, they will in the future.

deadsea

1:23 pm on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



About five or six years ago Google started its whole push to detect and penalize duplicate content. I stood up at PubCon in Vegas and asked the Google engineers what they would do about auto-translated and auto-un-translated content. They didn't have a good answer then, but I hope it got them thinking about it.

I would think it is hard to spot with semantic analysis algorithms. The phrases look plausible to me as far as topic and sentence structure go. It usually takes me a few sentences, as a human, to determine the content is crap.
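For context, one textbook approach to duplicate detection is w-shingling: break each document into overlapping runs of w words and compare the sets with Jaccard similarity. The Python sketch below assumes that approach (it is not a description of what Google actually runs), and both sentences are made-up examples; the spun one borrows substitutions from the passage quoted at the top of the thread.

```python
import re

def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping w-word sequences in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity(doc1: str, doc2: str, w: int = 3) -> float:
    """Jaccard similarity of the two documents' w-shingle sets."""
    return jaccard(shingles(doc1, w), shingles(doc2, w))

# Illustrative sentences only; not taken from any real page.
original = ("You need to call a computer repair expert "
            "but you do not know where to start")
spun = ("You need to alarm a computer adjustment expert "
        "but you do not apperceive area to start")

if __name__ == "__main__":
    print(similarity(original, original))             # 1.0
    print(round(similarity(original, spun), 3))       # 0.167
    print(round(similarity(original, spun, w=1), 3))  # 0.556
```

With 3-word shingles an exact copy scores 1.0, but the spun copy scores only 4/24 (about 0.17), because each swapped word breaks every shingle it appears in. Dropping to single words (w=1) raises the spun copy to 10/18 (about 0.56), but such short shingles also match unrelated pages that merely share common words, which fits the point that spun text is hard to catch mechanically.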

If I were Google, I would probably depend on user engagement metrics to detect content like this. If users don't like it, drop it from the search results.

enigma1

1:29 pm on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If users don't like it, drop it from the search results.

This reminds me of the WOT argument. The user experience can be gamed in many ways, so user interaction can perhaps be manipulated more easily than content.

On the other hand, why are domains so undervalued? They're the primary platform for content.

indyank

4:44 pm on Apr 22, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, these look like they were set up only last month. I also wondered: why are domains so undervalued after Panda?

It probably ranks because it looks clean, without any clutter or ads, and has no typos (do these programs ensure there are no typos?).

But I am sure it will soon be thrown out.