
Forum Moderators: phranque


Article about scraper sites in today's WSJ

Even Wall Street Journal columnists can't get away from them

     
1:36 am on May 3, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Dec 15, 2003
posts:282
votes: 0


In today's Wall Street Journal, tech columnist Lee Gomes wrote a one-column article about how, in his search for information, he came across these made-for-AdSense "scraper sites" that had no useful content.

Here's the article (subscription required): [online.wsj.com...]

Nothing that's new, but it's intriguing that it's being covered in mainstream newspapers.

6:16 am on May 3, 2005 (gmt 0)

New User from US 

joined:Oct 16, 2014
posts:
votes: 0


You forgot to give us a username and password after the URL.
6:38 am on May 3, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Dec 15, 2003
posts:282
votes: 0


Reprinted version of the article for those of you who don't subscribe:

[post-gazette.com...]

hunderdown

1:34 pm on May 3, 2005 (gmt 0)

Inactive Member
Account Expired

 
 


Thus, a kind of schizophrenia exists at search-engine companies. Half their engineering staff is busy trying to keep useless pages out of search results; the other half is busy coming up with tools that make it easier for people to create and profit from the useless pages in the first place.
-- From the article.

He goes on to hold out hope that developments like TrustRank will help.

1:56 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 11, 2004
posts:1062
votes: 0


Scrapers are also going mainstream. Infospace's yellow pages are scraping the SERPs and hotbot.co.uk is indexing theirs.

Infospace, by the way, owns dogpile.com and other lesser known meta search engines.

I like the author's comparison of today's web to an outlaw, anything-goes period.

Google has no one to blame but itself.

2:50 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 7, 2002
posts:906
votes: 0


The hypocrite has a Google ad at the bottom of his article, lol.

What he said was nothing new, nothing innovative. It was a common topic rehashed to sell the WSJ and profit from the Google ad and other ads on the page.

Looks like a scraper site to me.

3:24 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 11, 2004
posts:1062
votes: 0


nothing new, nothing innovative...

Really? That's a great definition of a scraper site.

3:35 pm on May 3, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 21, 2005
posts:32
votes: 0


Wouldn't that make most sites scraper sites?

Please, that site is far from a scraper site.

3:35 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 7, 2002
posts:906
votes: 0


Well, it's like you said, Freedom: scrapers are going mainstream.

The article was nothing new, same old story, and then they cover the page in advertising. How is that any different than most web pages?

It's not nearly as bad as when I was at MSN (front page) the other day, following an article to the "read more" page. The "read more" page was 90% advertisements; there were literally fewer than 20 words about the topic. Then you had to click "read more" (again) to get to page 3 to finish the article and be bombarded with half a page of advertisements.

Listen, I clearly see the pages that are the true culprits, but these major websites are no different. They simply make the garbage look a bit more professional.

3:49 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 11, 2004
posts:1062
votes: 0


You're simplifying it too much. Just because a website runs an ad doesn't mean it's a scraper.

Characterizing WSJ as a scraper because it runs some ads is an inappropriate comparison. The article itself is important because it draws more attention to the Catch-22 Google has put itself in.

Sure, some of them are ad whores, see Yahoo, but I suspect they lose more than they gain. Time and market forces will tell if their flea market look will prevail.

True, some of the news services and mainstream portals make you run through 5 pages to read their entire article, etc., but at least they have something to offer.

Scrapers have NOTHING to offer.

Potemkin Village websites have NOTHING to offer.

Although one could place the blame squarely on lazy, greedy, uninsightful webmasters, Google is also to blame.

3:52 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


The article was nothing new, same old story, and then they cover the page in advertising. How is that any different than most web pages?

What's "new" is that this sort of information is getting published in high-profile media. And it's the big companies with huge marketing budgets that read this sort of stuff, and they tend to believe it when they read it in the WSJ rather than on WebmasterWorld.

A solution to this frenzy is to allow advertisers to track conversions on a per-referring-site basis and automatically switch off those sites that fall below a limit acceptable to them. This way non-converting scraper sites won't earn a dime and will have to switch to something else.
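
The per-site cutoff suggested above might look roughly like the sketch below. It is only an illustration: the `SiteStats` record, the `sites_to_switch_off` function, the example domains and the `min_clicks` floor are all made up here, and nothing in it reflects an actual AdSense or AdWords feature.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-referring-site totals an advertiser might see."""
    site: str
    clicks: int
    conversions: int

    @property
    def conversion_rate(self) -> float:
        # A site with no clicks has no measurable rate; treat it as 0.0.
        return self.conversions / self.clicks if self.clicks else 0.0


def sites_to_switch_off(stats, min_rate, min_clicks=100):
    """Return the sites whose conversion rate falls below min_rate.

    Sites with fewer than min_clicks clicks are left alone, on the
    assumption (made for this sketch) that their rate is too noisy to judge.
    """
    return [
        s.site
        for s in stats
        if s.clicks >= min_clicks and s.conversion_rate < min_rate
    ]


# Made-up numbers for three referring sites.
stats = [
    SiteStats("widget-reviews.example", clicks=400, conversions=20),
    SiteStats("scraped-widgets.example", clicks=900, conversions=2),
    SiteStats("new-site.example", clicks=12, conversions=0),
]
blocked = sites_to_switch_off(stats, min_rate=0.01)
print(blocked)
```

With a 1% limit, only the second site in this made-up data is switched off; the third is skipped because it has too few clicks to judge either way.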

4:13 pm on May 3, 2005 (gmt 0)

Full Member

joined:Mar 17, 2005
posts:296
votes: 0


The obvious solution is for G to stop trying to filter sites. In 2001, when I could walk up to the top of the SERPs for virtually any term I chose, G also delivered the best results.

Adam Smith's Invisible Hand is just as useful in web search as in economics. If undue regulation is removed then sites about widgets which are titled "widgets" and mention widgets on the page will rise to the top because it is more profitable for someone producing real widget information to work hard to rank for widgets than anyone else.

G's current so-called spam filtering eliminates the most relevant pages, leaving only the wasteland of "vaguely optimized for a wide variety of topics" type sites in the top SERPs.

G has outsmarted itself and, due to its misapplied human intervention and spam filtering, has created a less relevant product.

Trust Rank is no solution as it sounds like G's "democratic" method of Page Rank with the even playing field removed. Trust Rank will be an excuse to kill off scrapers and most small publishers as well.

4:16 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


I agree with Lord Majestic and the WSJ author.

First, scraper sites are growing... and growing fast. With a host account and articlebot you can create a 10,000-page scraper site in a few days. Then it is a simple matter of rinse and repeat.

I think the only way a solution will work is with human intervention in the process. Either TrustRank or Lord Majestic's suggestion or both.

It took the WSJ just a few lines to realize a site was a scraper site, but even the latest version of Googlebot doesn't know the difference between that and high-value content.

4:21 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


The obvious solution is for G to stop trying to filter sites. In 2001, when I could walk up to the top of the SERPs for virtually any term I chose, G also delivered the best results.

Wow, that's a dumb statement. You're saying that when you were the only one packing keywords, Google provided the best results? So therefore YOUR results were the best for every keyword? Gimme a break! If Google had the same system now as in 2001, you would be screaming bloody murder because some Indonesian would be blackhatting better than you could. It would be one giant battle of blackhat SEO... I'm sure that would produce some interesting results.

Google delivered the best results in 2001 because there were far fewer people playing the system. Google's system worked because it relies on honest people. If people link to me with the word widget, and I use the word widget a lot, I will probably be about widgets. But once the masses started catching on, everyone was putting "widget" all over the place, and the system is breaking down.

4:44 pm on May 3, 2005 (gmt 0)

Full Member

joined:Mar 17, 2005
posts:296
votes: 0


zulufox,

I'll admit I'm dumb if you'll admit you're ugly.

There is no great difference between the number of scammers and spammers in 2001 and now. Did you even have an Internet connection in 2001?

Here's a thought, stop taking your jackass pills for a couple of days and reconsider this:

It is more profitable for a producer of real information about widgets to work hard to rank for the term widgets than it is for a spammer.

Think real hard about that.

If I sell widgets - actual on topic widgets - I have got to make more money from visitors looking for widgets than a scam site which doesn't really have any widget info. Therefore, as a widget seller, I am much more highly motivated to compete with other real widget makers for the term 'widgets' than a spammer.

Only when G penalizes sites for being 'too relevant' does this process become impaired.

4:46 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2001
posts:926
votes: 0


The site in question may or may not have been a scraper - but it definitely sounds like AdSense spam. That it gets attention in the WSJ is great news - these sites should be dropped from the SERPs, dropped from the AdSense program and RIP.

I would like to see google take a strong public stand on such sites - and not look for an automated solution to an automated problem. But alas...

A solution to this frenzy is to allow advertisers to track conversions on a per-referring-site* basis and automatically switch off those sites that fall below a limit acceptable to them. This way non-converting scraper sites won't earn a dime and will have to switch to something else.

Scraper sites probably convert better than some quality sites, which lines the publisher's, Google's and the advertiser's pockets. The losers are the searching public not looking to buy, and the publishers of proper sites.

* As a publisher & advertiser I would also like to see this added, not as a smartpricing feature, but just to show which sites are displaying my ads and to give me a heads-up about click fraud. I know that I could scour my logs for this information, but it's data Google already has. Please display it.

5:25 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


I'm ugly.

(Waits...)

5:30 pm on May 3, 2005 (gmt 0)

Full Member

joined:Mar 17, 2005
posts:296
votes: 0


zulufox,

I'm dumb. But despite my vocal impairment, insights flow uninterrupted from my fingertips through my keyboard only to be wasted on the CRTs of fools.

5:37 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


I'm dumb.

That was totally worth it.

Anyway, quit being so defensive. Your theory is stupid but you're probably a great guy. Calm down, take a deep breath, and rejoin the discussion.

5:39 pm on May 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 5, 2005
posts:153
votes: 0


To say there is no difference in the number of scammers and spammers in 2001 and now is pure silliness.
6:01 pm on May 3, 2005 (gmt 0)

New User

10+ Year Member

joined:Mar 21, 2005
posts:17
votes: 0


To say there is no difference in the number of scammers and spammers in 2001 and now is pure silliness.

Ya, I don't know what internet he has been surfing on.

6:14 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


I agree that there is no way there are the same number of spammers and scrapers today as in 2001.

More and more people (I'm sure helped by sites such as WebmasterWorld) are learning about making money online and for many of these people scraper sites appear to be the easiest way to get that money.

Why write your own content when you can scrape (either through copy and paste or a content rewriting script) someone else's content?

6:19 pm on May 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 22, 2004
posts:1082
votes: 0


Only Google and their AdSense program are responsible for this.

They are not checking websites before adding them to AdSense; that's the big problem.

6:34 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


They are not checking websites before adding them to AdSense; that's the big problem.

I completely agree.

I know I got into adsense well before I should have been let in.

It is quite ironic that Google's AdSense is promoting the scraping that is ruining Google's search.

6:35 pm on May 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2003
posts:198
votes: 0


The sad part is that we all may have to convert to scraper sites or die. I've watched the scraper sites invading Google for a long time. The bottom line: they can't be stopped. Even if Google takes down one site, the scrapers replace it with 100 others by the next day.

I don't know if you've ever bothered to check out how bad it is, but some of my sites have literally been scraped into oblivion. There are hundreds of sites that have captured parts of my content -- and most of them are ahead of me in the rankings.

The major failure of Google is that a person who really has useful content is forced to play a SEO ranking game to stay listed. This is wrong. Just building good content will not get you a high ranking. The scraper sites just come along and take your content and use it against you.

It's so disappointing to search in Google only to find endless scraper sites, as well as sites in Chinese that are ranked higher than you only because the search term is the only English text on their page.

Clearly, there are people here who are already familiar with programs like "articlebot", although I never heard of it until today.

6:40 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


The major failure of Google is that a person who really has useful content is forced to play a SEO ranking game to stay listed. This is wrong. Just building good content will not get you a high ranking. The scraper sites just come along and take your content and use it against you.

I completely agree.

Although I hate hate hate to see it, I believe in 5 years most of the quality content on the net will be hidden behind micropayments (but will still be searchable by Google). I can see no other way for publishers not to be screwed over every time they put lots of research into a topic.

It is far too easy to scrape, and as people in the third world join the net, some find no downside to scraping, since nobody can touch them.

It is sad because when I write content, I know dozens of people copy it and put it on their sites either through straight copy and paste or through something like articlebot.

It is not just professional scrapers, but teenagers looking for a quick buck.

6:43 pm on May 3, 2005 (gmt 0)

Full Member

joined:Mar 17, 2005
posts:296
votes: 0


Just because G is listing vastly more spam now than they did in the past doesn't mean that there has been a huge increase in the percentage of spammers as opposed to responsible publishers. G is listing more spammers because they broke their engine by ignoring some basic rules about what constitutes relevancy.

The theory that a lot more spammers have gotten websites since 2001, resulting in worsening SERPs, is incorrect. The number of responsible publishers coming online has also increased since 2001, which should be a statistical 'wash' -- leaving the SERPs as useful as they were in 2001.

I don't buy zulufox's theory that WebmasterWorld has helped to create an increase in the percentage of spammers vs the number of responsible publishers. If anything, WebmasterWorld readers are encouraged to avoid spam techniques because although they may provide short term gains, they just as surely lead to long term losses.

The SERPs are full of crap because G doesn't work as well as it did in the past. And if they don't fix the problem, the Invisible Hand will simply place some other SE on top.

6:44 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 16, 2005
posts:456
votes: 0


A solution to this frenzy is to allow advertisers to track conversions on a per-referring-site basis and automatically switch off those sites that fall below a limit acceptable to them. This way non-converting scraper sites won't earn a dime and will have to switch to something else.

So scraper sites that convert are ok?

Rather than focus on conversions, AdSense applications should be required for every site, not just the first one belonging to a publisher.

6:49 pm on May 3, 2005 (gmt 0)

Full Member

joined:Mar 17, 2005
posts:296
votes: 0


It is not merely the existence of the scraper sites that is killing responsible publishers.

It is a combination of G's mishandled attempt to remove spam by penalizing pages that are 'too relevant,' plus the ability of scrapers to post thousands of pages that are optimized for 'everything at once but really nothing at all' (allowing them to escape the 'too relevant' filter on a variety of search terms).

Scrapers can and will be stopped when G stops penalizing sites for being too relevant or when a new SE comes along with a good algo and technicians smart enough to know when to stop tweaking the dials.

[edited by: Atticus at 6:52 pm (utc) on May 3, 2005]

6:51 pm on May 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 16, 2003
posts:593
votes: 0


I don't buy zulufox's theory that WebmasterWorld has helped to create an increase in the percentage of spammers vs the number of responsible publishers. If anything, WebmasterWorld readers are encouraged to avoid spam techniques because although they may provide short term gains, they just as surely lead to long term losses.

This very thread is a perfect example. Everyone on this thread is complaining that scraper sites are taking over Google and making money. Furthermore, we have discussed the methods (articlebot and offshore hosting) to make a "good" scraper site.

If some amoral personality visits this thread and reads it, they have all the information they need for a scraper site.

If you never taught anyone how to drive, there would be no drunk drivers. As you teach more people how to drive, there are more drunk drivers.

[edited by: zulufox at 6:53 pm (utc) on May 3, 2005]
