Here's the article (subscription required): [online.wsj.com...]
Nothing that's new, but it's intriguing that it's being covered in mainstream newspapers.
[post-gazette.com...]
Thus, a kind of schizophrenia exists at search-engine companies. Half their engineering staff is busy trying to keep useless pages out of search results; the other half is busy coming up with tools that make it easier for people to create and profit from the useless pages in the first place. -- From the article.
He goes on to hold out hope that developments like TrustRank will help.
Infospace, by the way, owns dogpile.com and other lesser known meta search engines.
I like the author's comparison of today's web to an outlaw, anything-goes period.
Google has no one to blame but itself.
The article was nothing new, same old story. Then they cover the page in advertising. How is that any different than most web pages?
Not nearly as bad as when I was at MSN (front page) the other day, following an article to the "read more" page. The "read more" page was 90% advertisements. There were literally fewer than 20 words about the topic. Then you had to click "read more" (again) to get to page 3 to finish the article and be bombarded with half the page in advertisements.
Listen, I clearly see the pages that are the true culprits, but these major websites are no different. They simply make the garbage look a bit more professional.
Characterizing WSJ as a scraper because it runs some ads is an inappropriate comparison. The article itself is important because it draws more attention to the Catch-22 Google has put itself in.
Sure, some of them are ad whores, see Yahoo, but I suspect they lose more than they gain. Time and market forces will tell if their flea market look will prevail.
True, some of the news services and mainstream portals make you run through 5 pages to read their entire article, etc., but at least they have something to offer.
Scrapers have NOTHING to offer.
Potemkin Village websites have NOTHING to offer.
Although one could place the blame squarely on lazy, greedy, uninsightful webmasters, Google is also to blame.
The article was nothing new, same old story. Then they cover the page in advertising. How is that any different than most web pages?
What's "new" is that this sort of information is getting published in high-profile media. And it's the big companies with huge marketing budgets that read this sort of stuff, and they tend to believe it when they read it in the WSJ rather than on WebmasterWorld.
A solution to this frenzy is to allow advertisers to track conversions on a per-referring-site basis and (automatically) switch off those sites that fall below a limit acceptable to them. This way non-converting scraper sites won't earn a dime and will have to switch to something else.
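To make the suggestion concrete, here is a rough sketch of what such an automatic cutoff might look like. Everything in it is hypothetical: the `ReferrerStats` record, the `sites_to_switch_off` helper, the `.example` site names, and the thresholds are all made up for illustration, and none of this is an existing ad-network feature or API.

```python
from dataclasses import dataclass


@dataclass
class ReferrerStats:
    """Click and conversion counts for one referring site (hypothetical record)."""
    site: str
    clicks: int
    conversions: int

    @property
    def conversion_rate(self) -> float:
        # Avoid division by zero for sites with no clicks yet.
        return self.conversions / self.clicks if self.clicks else 0.0


def sites_to_switch_off(stats, min_rate, min_clicks=100):
    """Return referring sites whose conversion rate falls below min_rate.

    Sites with fewer than min_clicks clicks are left alone, since their
    rate is too noisy to judge. Both thresholds are the advertiser's choice.
    """
    return [
        s.site
        for s in stats
        if s.clicks >= min_clicks and s.conversion_rate < min_rate
    ]


# Made-up numbers for three referring sites.
stats = [
    ReferrerStats("widget-reviews.example", clicks=500, conversions=20),  # 4.0%
    ReferrerStats("scraped-widgets.example", clicks=800, conversions=2),  # 0.25%
    ReferrerStats("new-site.example", clicks=12, conversions=0),  # too few clicks
]

blocked = sites_to_switch_off(stats, min_rate=0.01)
print(blocked)  # ['scraped-widgets.example']
```

The `min_clicks` floor is there so a brand-new site is not cut off after a dozen clicks. Note that a cutoff like this only penalizes sites that do not convert; it says nothing about whether the content on the page is original.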
Adam Smith's Invisible Hand is just as useful in web search as in economics. If undue regulation is removed then sites about widgets which are titled "widgets" and mention widgets on the page will rise to the top because it is more profitable for someone producing real widget information to work hard to rank for widgets than anyone else.
G's current so-called spam filtering eliminates the most relevant pages, leaving only the wasteland of "vaguely optimized for a wide variety of topics" type sites in the top SERPs.
G has outsmarted itself and, due to its misapplied human intervention and spam filtering, has created a less relevant product.
TrustRank is no solution, as it sounds like G's "democratic" method of PageRank with the even playing field removed. TrustRank will be an excuse to kill off scrapers and most small publishers as well.
First, scraper sites are growing... and growing fast. With a host account and articlebot you can create a 10,000-page scraper site in a few days. Then it is a simple matter of rinse and repeat.
I think the only way a solution will work is with human intervention in the process. Either TrustRank or Lord Magestic's suggestion or both.
It took the WSJ just a few lines to realize a site was a scraper site, but even the latest version of googlebot doesn't know the difference between that and high-value content.
The obvious solution is for G to stop trying to filter sites. In 2001, when I could walk up to the top of the SERPs for virtually any term I chose, was also the time in which G delivered the best results.
Wow, that's a dumb statement. You're saying that when you were the only one packing keywords, Google provided the best results? So therefore YOUR results were the best for every keyword? Gimme a break! If Google had the same system now as in 2001, you would be screaming bloody murder because some Indonesian would be blackhatting better than you could. It would be one giant battle of blackhat SEO... I'm sure that would produce some interesting results.
Google delivered the best results in 2001 because there were far fewer people playing the system. That was the reason Google's system worked: it relies on honest people. If people link to me with the word widget, and I use the word widget a lot, I will probably be about widgets. But once the masses started catching on, everyone was putting "widget" all over the place, and the system is breaking down.
I'll admit I'm dumb if you'll admit you're ugly.
There is no great difference between the number of scammers and spammers in 2001 and now. Did you even have an Internet connection in 2001?
Here's a thought, stop taking your jackass pills for a couple of days and reconsider this:
It is more profitable for a producer of real information about widgets to work hard to rank for the term widgets than it is for a spammer.
Think real hard about that.
If I sell widgets - actual on topic widgets - I have got to make more money from visitors looking for widgets than a scam site which doesn't really have any widget info. Therefore, as a widget seller, I am much more highly motivated to compete with other real widget makers for the term 'widgets' than a spammer.
Only when G penalizes sites for being 'too relevant' does this process become impaired.
I would like to see google take a strong public stand on such sites - and not look for an automated solution to an automated problem. But alas...
A solution to this frenzy is to allow advertisers to track conversions on a per-referring-site* basis and (automatically) switch off those sites that fall below a limit acceptable to them. This way non-converting scraper sites won't earn a dime and will have to switch to something else.
- scraper sites probably convert better than some quality sites, which lines the publisher's, Google's and the advertiser's pockets - the losers are the searching public not looking to buy, and the publishers of proper sites.
* as a publisher & advertiser I would also like to see this added, but not as a smartpricing feature - just to show which sites are displaying my ads, and to give me a heads up on click fraud. I know that I could scour my logs for this information - but it's already data Google has - please display it.
I'm dumb. But despite my vocal impairment, insights flow uninterrupted from my fingertips through my keyboard only to be wasted on the CRTs of fools.
More and more people (I'm sure helped by sites such as WebmasterWorld) are learning about making money online and for many of these people scraper sites appear to be the easiest way to get that money.
Why write your own content when you can scrape (either through copy and paste or a content rewriting script) someone else's content?
I don't know if you've ever bothered to check out how bad it is, but some of my sites have literally been scraped into oblivion. There are hundreds of sites that have captured parts of my content -- and most of them are ahead of me in the rankings.
The major failure of Google is that a person who really has useful content is forced to play a SEO ranking game to stay listed. This is wrong. Just building good content will not get you a high ranking. The scraper sites just come along and take your content and use it against you.
It's so disappointing to search in Google only to find endless scraper sites, as well as sites in Chinese that are ranked higher than you only because the search term is the only English text on their page.
Clearly, there are people here who are already familiar with programs like "articlebot", although I never heard of it until today.
The major failure of Google is that a person who really has useful content is forced to play a SEO ranking game to stay listed. This is wrong. Just building good content will not get you a high ranking. The scraper sites just come along and take your content and use it against you.
I completely agree.
Although I hate, hate, hate to see it, I believe in 5 years most of the quality content on the net will be hidden behind micropayments (but will still be searchable by Google). I can see no other way for publishers not to be screwed over every time they put lots of research into a topic.
It is far too easy to scrape, and as people in the third world join the net some find no downside to scraping, since nobody can touch them.
It is sad because when I write content, I know dozens of people copy it and put it on their sites either through straight copy and paste or through something like articlebot.
It is not just professional scrapers, but teenagers looking for a quick buck.
The theory that a lot more spammers have gotten websites since 2001, resulting in worsening SERPs, is incorrect. The number of responsible publishers coming online has also increased since 2001, which should be a statistical 'wash' -- leaving the SERPs as useful as they were in 2001.
I don't buy zulufox's theory that WebmasterWorld has helped to create an increase in the percentage of spammers vs the number of responsible publishers. If anything, WebmasterWorld readers are encouraged to avoid spam techniques because although they may provide short term gains, they just as surely lead to long term losses.
The SERPs are full of crap because G doesn't work as well as it did in the past. And if they don't fix the problem, the Invisible Hand will simply place some other SE on top.
A solution to this frenzy is to allow advertisers to track conversions on a per-referring-site basis and (automatically) switch off those sites that fall below a limit acceptable to them. This way non-converting scraper sites won't earn a dime and will have to switch to something else.
So scraper sites that convert are ok?
Rather than focus on conversions, adsense applications should be required for every site not just the first one belonging to a publisher.
It is a combination of G's mishandled attempt to remove spam by penalizing pages that are 'too relevant,' plus the ability of scrapers to post thousands of pages that are optimized for 'everything at once but really nothing at all' (allowing them to escape the 'too relevant' filter on a variety of search terms).
Scrapers can and will be stopped when G stops penalizing sites for being too relevant or when a new SE comes along with a good algo and technicians smart enough to know when to stop tweaking the dials.
[edited by: Atticus at 6:52 pm (utc) on May 3, 2005]
I don't buy zulufox's theory that WebmasterWorld has helped to create an increase in the percentage of spammers vs the number of responsible publishers. If anything, WebmasterWorld readers are encouraged to avoid spam techniques because although they may provide short term gains, they just as surely lead to long term losses.
This very thread is a perfect example. Everyone on this thread is complaining that scraper sites are taking over Google and making money. Furthermore, we have discussed the methods (articlebot and offshore hosting) to make a "good" scraper site.
If some amoral personality visits this thread and reads it, they have all the information they need for a scraper site.
If you never taught anyone how to drive, there would be no drunk drivers. As you teach more people how to drive, there are more drunk drivers.
[edited by: zulufox at 6:53 pm (utc) on May 3, 2005]