Forum Library, Charter, Moderators: incrediBILL & jatar k & martinibuster

Google AdSense Forum

This 60 message thread spans 2 pages.
Will Google eventually be dumping scraper sites?
interesting email from Google i received in response
javahava




msg:1445604
 8:24 pm on May 16, 2005 (gmt 0)

So I, like many others, am quite frustrated about adsense scraper sites for a variety of reasons: e.g., they're diluting the revenue pool using other people's content (and are perhaps partly responsible for large dips in EPC), encouraging site owners to spend time creating junk instead of user-oriented content, and spamming up google's index. i sent all these concerns to google, with the note that i'd probably start making such sites myself if such sites were allowable under the TOS. i asked out of frustration, point blank, if such sites were ok if i started producing them. this is the response i got:

<paraphrase> We understand the concern regarding sites that appear to be scraper sites.

As the content owner, you may file a DMCA complaint with Google.

Publishers also must adhere to the webmaster guidelines [google.com...]

I highly suggest that you do not participate in these practices as they are violations of our policies.

We will take steps against other sites not adhering to our policies, but because we respect the confidentiality of all publishers, we cannot disclose additional details about them.
</paraphrase>
------------

Do you guys think some kind of tech or manual screen will be applied at some point? is it worth filing a dmca complaint? here's to hoping the situation improves.

[edited by: Jenstar at 8:28 pm (utc) on May 16, 2005]
[edit reason] paraphrased email quote; actual quotes not allowed as per TOS [/edit]

 

incrediBILL




msg:1445605
 8:34 pm on May 16, 2005 (gmt 0)

If the other sites are not adhering to our policies, we will take the necessary steps with that publisher.

Based on their current behavior, lies and lip service.

I reported someone modifying the search box to include search terms over 3 weeks ago and it still does.

Heck, I'd love permission to include the search term for Google when someone doesn't find what they want when running a search on my site - seems like a natural progression for 1-click to check elsewhere.

However, it's against the T&Cs and nothing gets done, so the scrapers will keep scraping, it's paying the bills.

On a related topic, I tried searching for something I actually needed this weekend and everywhere you go the SERPs are full of bottom feeding affiliates and AdSense sites, whether scraping or legit content sites, you just can't expect to see what you really want in the top 10 or 20 anymore.

Sad, truly sad.

diamondgrl




msg:1445606
 10:15 pm on May 16, 2005 (gmt 0)

I find it encouraging that you're actually getting that email. While I wonder how much is "lip service" - I certainly wouldn't use the term "lies" - the first step in solving a problem is to recognize that you have one. And this is the first I've seen of Google specifically saying that scrapers are not legit.

Hopefully it's not an email from a rogue Google intern.

joeduck




msg:1445607
 10:33 pm on May 16, 2005 (gmt 0)

Hopefully it's not an email from a rogue Google intern

Diamondgrl - maybe we need more of those rogue google interns! We've been through many of the canned responses and a few suggesting actual fixes but I'm still not even sure what hit us back on Feb 2, though like many others we've noticed so many junk pages appearing higher than quality content, both in our travel niche and in other sectors.

For us the problem appears to be 302 and canonical page issues and we did a lot in early April with 301 redirection. We remain at about 10% of our Feb 1 G traffic. I sure hope GG comes back into this or any discussion relating to 302 problems. I think Google is risking a lot by keeping things quiet about hijacking and related issues such as our problems. Google has enough money to throw 10,000 editors at this problem and remove junk content manually, but I assume they don't because of their faith in automated solutions that are not working well.

I'd suggest a lot of people are losing faith in Google over this.

stuartmcdonald




msg:1445608
 10:35 pm on May 16, 2005 (gmt 0)

Vaguely encouraging.

Has anyone personally reported a scraper and seen an actual response from Google? ie., seen the adverts removed?

incrediBILL




msg:1445609
 10:40 pm on May 16, 2005 (gmt 0)

Are you kidding?

I had a valid DMCA complaint and couldn't get AdSense yanked.

The ISP shut them down and Google dropped them from the listings temporarily, but they came back after compliance with AdSense intact, so I don't hold much faith in the T&C compliance.

TheDonster




msg:1445610
 11:49 pm on May 16, 2005 (gmt 0)

It certainly seems like a no win for Google. On one hand, for sure they're pulling in millions with these awful scraper sites. But like Bill just pointed out, the top 10 or 20 results now all seem to be these useless pages of content which is almost irrelevant to your search. The question is when will Google get back to producing quality search results which is what got them where they are today?

rover




msg:1445611
 12:12 am on May 17, 2005 (gmt 0)

the top 10 or 20 results now all seem to be these useless pages of content which is almost irrelevant to your search.

I notice that snippets from my site are all over dozens and dozens of scraper sites, but I have never seen a scraper site that actually ranks very high on google except for searches on very obscure terms.

Maybe it's just the field I'm in, or the types of searches I do. Does anyone have an example of a search phrase for google (that isn't really, really obscure) that will actually turn up a scraper site within the top 10 or 20 results?

guitaristinus




msg:1445612
 12:26 am on May 17, 2005 (gmt 0)

It would be an exciting time when/if Google takes "...steps against sites not adhering to policies." Could happen anytime and all at once.

I like Google's search results. Yesterday I wanted to know what hospital in Yuma, Arizona, USA (a small city) my mother is staying at. Searched Google for "yuma hospital" (no quotes in search) and there was her hospital right at the top. Within 5 minutes I was on the phone with my Dad who was in her hospital room.

I had a question about taxes. Typed a few words in Google and a page from the IRS' website that answered my question was right there. I am impressed.

If I want to find bad results, I'm sure I can find them. I don't bother.

Rover,
I agree. I don't come across scraper sites much. Just did a couple of searches of where they might be. Couldn't find them.

[edited by: guitaristinus at 12:35 am (utc) on May 17, 2005]

sailorjwd




msg:1445613
 12:34 am on May 17, 2005 (gmt 0)

I think I've had about 1 out of 10 sites removed from adsense.

As I mentioned last week, I asked for permission to use the 5-star rating next to my adsense ads and gave them the example site. They said absolutely not allowed to do that. The 5-star rating site is still running (with my adwords ads).

I complained to Adwords about pages with no content displaying my ads. They said 'use the negative site list'. 25 scraper sites added and now where do I put the other 200?

The more competitive the keywords the worse it is. And it is harder to make a few bucks on any of them with the dilution. I can only guess but I think some companies must be spending 10's of thousands of dollars a day largely on scraper sites.

wyweb




msg:1445614
 12:36 am on May 17, 2005 (gmt 0)

I have reported maybe half a dozen sites that were in clear violation of adsense terms. They pissed me off because I'm trying my best to do the right thing. To date these sites are all still online.

Google dances to their own band. You can slip the band a twenty but they won't change the beat.

Rodney




msg:1445615
 2:17 am on May 17, 2005 (gmt 0)

I think the theme for google has been to create automated ways to detect fraud rather than to manually remove every site reported (they seem to do the same with spam reports to Google search).

So just because action has not yet been taken against reported sites, it could be possible that Google is collecting these reports to find things in common with these sites while they work out a way to automate the process.

trader




msg:1445616
 3:10 am on May 17, 2005 (gmt 0)

Can someone please explain exactly what a scraper site is?

larryhatch




msg:1445617
 4:55 am on May 17, 2005 (gmt 0)

Trader: Opinions may differ, this is MUCH discussed in these forums.

For me, a 'scraper' is a webmaster that copies content from other sites and puts it on his own
pages without permission. There are different ways to scrape content.

Among other things, scraping raises the possibility of duplicate content penalties from the
search engines, and is a disincentive to the authors of original content.
If done for the sake of advertising income, I call it theft. -Larry

david_uk




msg:1445618
 6:16 am on May 17, 2005 (gmt 0)

I agree that they need to remove scrapers from the indexes, as on any search the top couple of pages are often scrapers. Also, most of the ads on the pages are scrapers! Ultimately people won't use their search engine if it doesn't pull up relevant results. One would hope they realise this.

I have had success in reporting violators to Google though. They don't exactly move fast, and you need to remind them sometimes. One site that was framing mine, showing their adsense ads and leaving mine as psa's no longer carries adsense. I'm guessing that G had a string of complaints about that one, as it had a lot of framed links showing psa's. I did report a string of sites that were asking visitors to click the ads, and all of them now comply.

As regards other violations, I can only suggest nagging them if you have reported a particular site and they still haven't done anything. Also, try contacting them at adsense-abuse@google.com rather than the "contact us" form.

Teshka




msg:1445619
 6:23 am on May 17, 2005 (gmt 0)

For the first time today, I ran across something about a site having been removed from the SERPs as a result of a DMCA complaint. (I just tried the search again and didn't get the message.) So, maybe they are trying to do something... sporadically ;)

totter




msg:1445620
 6:25 am on May 17, 2005 (gmt 0)

For me, a 'scraper' is a webmaster that copies content from other sites and puts it on his own
pages without permission. There are different ways to scrape content.

By your definition Google is a 'scraper' site.

I would add to your definition that they build sites around certain keywords and the content they scrape is put on static pages so that they can rank highly for those keywords.

To me it would be hard to distinguish between a scraper site and a directory.

david_uk




msg:1445621
 6:33 am on May 17, 2005 (gmt 0)

To me it would be hard to distinguish between a scraper site and a directory.

I tend to lump them in the same bracket. Both only exist to profit from advertisers without having much (if any) relevant content. There are some good directory sites out there, but the majority of directories tend to be made for adsense. Unfortunately the directory format is one that is popular for the reasons you quote.

photo200




msg:1445622
 6:43 am on May 17, 2005 (gmt 0)

Google exists also ONLY because of profit.

stuartmcdonald




msg:1445623
 6:53 am on May 17, 2005 (gmt 0)

I think one diff is that with Google, you can disallow it with robots.txt - you can't do that with a run of the mill scraper
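
A minimal sketch of the robots.txt point above, using Python's standard `urllib.robotparser`. The robots.txt text, the `allowed` helper, and the example.com URL are all made up for illustration; nothing here comes from Google's or any scraper's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts out Googlebot, leaves every other crawler alone.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"
    print("Googlebot allowed:", allowed("Googlebot", page))
    print("SomeOtherBot allowed:", allowed("SomeOtherBot", page))
```

The catch is the one made in the post: this check only binds a crawler that bothers to run it. A well-behaved search engine does; a run of the mill scraper can simply skip it and fetch the page anyway.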

larryhatch




msg:1445624
 6:55 am on May 17, 2005 (gmt 0)

Trader:

" By your definition Google is a 'scraper' site."

Strictly speaking, yes. I should have specifically excluded legitimate Search Engines.

I'm sure the vast majority of us are thinking of sites that just copy other people's work, and
republish it on their (usually) ad-filled pages. -LH

"I would add to your definition that they build sites around certain keywords and the content
they scrape is put on static pages so that they can rank highly for those keywords."

I don't think that is necessary for a simple definition of 'scraper', just that they scraped
content, and are not a legitimate SE. -LH

"To me it would be hard to distinguish between a scraper site and a directory. "

Agreed. If a directory shows short sections of text from the originator (much like a Search
Engine does), or comes up with their own snippet, -and- provides honest (non BS) links back,
I would not call it scraping.

If the 'directory' is nothing but wholesale theft of content for the sake of its own ratings etc,
then I would have to call it scraping.

Common sense and common decency should dictate the difference. All too often, those asking
for precise limits, are simply looking for ways to legitimize their practices.
Calling Google a scraper when we WANT to get listed gives me the same uneasy feeling.

I don't consider SE cached pages scraping either. You can easily opt-out of that.
Try and get and keep your honest materials out of the REAL scraper sites.
You'll see the difference. -Larry

totter




msg:1445625
 7:32 am on May 17, 2005 (gmt 0)

Common sense and common decency should dictate the difference.

Don't get me wrong, I'm not condoning what they do, but I still think that it will be impossible for Google to define what a scraper site is, without some sort of human judgement.

Because of this I think that if the content on a site isn't good enough to produce its own advertising revenue then scraper sites are either something that you will have to deal with or learn from.

larryhatch




msg:1445626
 8:11 am on May 17, 2005 (gmt 0)

Sorry Totter, I called you 'trader'.

I don't have any shoot-from-the-hip algorithms to filter out scrapers, but good programming guided
by common sense could do some good.

1) Credit content to the page it is found on, and NOT to any redirect from a different site.

2) Intelligent scraper detection: For example, an honest author will only put up his own stuff,
maybe with other people's work by permission, and usually not too much of that.
In contrast, scrapers tend to suck in anything and everything indiscriminately.

Let's say my site has content A, and yours content B. You don't scrape me or vice versa but ..
Suppose Google finds A + B on site S (scraper) along with D,E,F,G and the dog and the cat
and loads of ads. That's a lot of computing yes, but so is PR and other things they already do.
A relatively simple algorithm should pinpoint the scraper in such a case.

3) Google could put up a Scraper Report Form. I hesitate to suggest this as it could lead
to abuses. -Larry
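
Point 2 above (A + B turning up together on site S) could be sketched roughly as follows. This is only an illustration under loud assumptions: each site is reduced to a set of passage labels, the site names and the `min_sources` threshold are hypothetical, and it is in no way a description of what Google does. It also breaks down once several scrapers copy the same originals, since the originals then overlap with many sites too.

```python
from collections import defaultdict

# Hypothetical crawl result: site -> set of passages found on it.
EXAMPLE_SITES = {
    "larry.example": {"A"},
    "totter.example": {"B"},
    "other.example": {"D", "E"},
    "scraper.example": {"A", "B", "D", "E", "ads"},
}


def suspected_scrapers(site_passages, min_sources=2):
    """Flag sites that share passages with at least min_sources other sites."""
    # Map each passage to the set of sites that carry it.
    carriers = defaultdict(set)
    for site, passages in site_passages.items():
        for passage in passages:
            carriers[passage].add(site)

    # For each site, collect the other sites it shares a passage with.
    overlaps = defaultdict(set)
    for sites in carriers.values():
        for site in sites:
            overlaps[site] |= sites - {site}

    return sorted(s for s, others in overlaps.items() if len(others) >= min_sources)


if __name__ == "__main__":
    print(suspected_scrapers(EXAMPLE_SITES))
```

In the example data each honest site overlaps with one other site only (the scraper), while the scraper overlaps with three, so a threshold of two singles it out. A real system would need fuzzy matching of passages and some way to tell permitted reuse from theft, which is where the human judgement discussed in this thread comes back in.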

Dayo_UK




msg:1445627
 8:17 am on May 17, 2005 (gmt 0)

>>>>>3) Google could put up a Scraper Report Form. I hesitate to suggest this as it could lead
to abuses.

Yes - far too much abuse - people could report sites like Kelkoo, Amazon etc if they are competing in their niche as they carry the same product details etc. (Some may say it is valid but where do you draw the line?)

For the record I have reported 4 scraper sites in the last week and they have all been removed (well done G)

cat5




msg:1445628
 8:23 am on May 17, 2005 (gmt 0)

"I would add to your definition that they build sites around certain keywords and the content they scrape is put on static pages so that they can rank highly for those keywords."
Ask Jeeves is doin it..LOL

larryhatch




msg:1445629
 8:42 am on May 17, 2005 (gmt 0)

Hi Cat5:

"I would add to your definition that they build sites around certain keywords and the content they scrape is put on static pages so that they can rank highly for those keywords."

The only reason I don't add that to my definition is that not all scrapers do so, yet they remain
scrapers. Heck if I know why, but there are sites that just copy other sites with no rhyme or reason.

"Ask Jeeves is doin it..LOL "

AJ does lots of things I don't like. They FRAME their found sites as if they owned them.
I'd be more pissed if they were in the major leagues.

I haven't tried this yet. Suppose I search AJ for "Google Search". Clicking on Google,
would I be able to run a G search within an AJ frame? What does G think of that? -Larry

larryhatch




msg:1445630
 9:12 am on May 17, 2005 (gmt 0)

OK, now I tried it. I googled up AJ and opened the AJ search page. From there, I searched for
"google search". Found Google [framed of course] and used Framed Google to look for Jeeves again.
I went in circles, trying to get frames within frames within frames.

That didn't work. AJ is content to cage Google up in their first frame. Best wishes - Larry

flyerguy




msg:1445631
 9:57 am on May 17, 2005 (gmt 0)

"The more competitive the keywords the worse it is. And it is harder to make a few bucks on any of them with the dilution. I can only guess but I think some companies must be spending 10's of thousands of dollars a day largely on scraper sites."

I think we hit the nail on the head right here. People are really complaining because they don't like competition, not because their content is being jacked.

People are calling 'scraper' like they call 'noob' in shoot em up games. In the end, any motivated person can make a junk site with thematic content, and NOT use copyrighted material. So who are you going to file your complaints with then?

And for what reason, because you don't like their tacky design, poor coding, bad spelling? There are a million legit mom and pop websites that exhibit these traits.. what makes you the site police?

guitaristinus




msg:1445632
 10:22 am on May 17, 2005 (gmt 0)

Come to think of it, I scraped most of my sites.

sailorjwd




msg:1445633
 10:25 am on May 17, 2005 (gmt 0)

I'm the site police because my adwords money is being spent on these sites and I now have the ability to exclude them, so I do. I also report each one that is breaking TOS. 5-star ratings, zero content, suspect click patterns, etc.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved