Forum Moderators: martinibuster
<paraphrase> We understand the concern regarding sites that appear to be scraper sites.
As the content owner, you may file a DMCA complaint with Google.
Publishers also must adhere to the webmaster guidelines [google.com...]
I highly suggest that you do not participate in these practices as they are violations of our policies.
We will take steps against other sites not adhering to our policies, but because we respect the confidentiality of all publishers, we cannot disclose additional details about them.
</paraphrase>
------------
Do you guys think some kind of technical or manual screening will be applied at some point? Is it worth filing a DMCA complaint? Here's hoping the situation improves.
[edited by: Jenstar at 8:28 pm (utc) on May 16, 2005]
[edit reason] paraphrased email quote; actual quotes not allowed as per TOS [/edit]
If the other sites are not adhering to our policies, we will take the necessary steps with that publisher.
Based on their current behavior, lies and lip service.
Over 3 weeks ago I reported a site that modifies the search box to include search terms, and it still does.
Heck, I'd love permission to pass the search term on to Google when someone doesn't find what they want in a search on my site; offering one click to check elsewhere seems like a natural progression.
However, it's against the T&Cs, yet nothing gets done about the sites that do it, so the scrapers will keep scraping. It's paying the bills.
On a related topic, I tried searching for something I actually needed this weekend, and everywhere I went the SERPs were full of bottom-feeding affiliates and AdSense sites. Whether they are scrapers or legit content sites, you just can't expect to see what you really want in the top 10 or 20 anymore.
Sad, truly sad.
Hopefully it's not an email from a rogue Google intern.
Diamondgrl, maybe we need more of those rogue Google interns! We've been through many of the canned responses, and a few that suggested actual fixes, but I'm still not even sure what hit us back on Feb 2. Like many others, we've noticed lots of junk pages ranking above quality content, both in our travel niche and in other sectors.
For us, the problem appears to be 302 redirects and canonical page issues, and we did a lot of 301 redirection work in early April. We remain at about 10% of our Feb 1 Google traffic. I sure hope GG comes back into this or any discussion relating to 302 problems. I think Google is risking a lot by keeping quiet about hijacking and related issues such as ours. Google has enough money to throw 10,000 editors at this problem and remove junk content manually, but I assume they don't because of their faith in automated solutions that are not working well.
I'd suggest a lot of people are losing faith in Google over this.
The top 10 or 20 results now all seem to be useless pages whose content is almost irrelevant to your search.
I notice that snippets from my site are all over dozens and dozens of scraper sites, but I have never seen a scraper site that actually ranks very high on Google except for searches on very obscure terms.
Maybe it's just the field I'm in, or the types of searches I do. Does anyone have an example of a search phrase for Google (that isn't really, really obscure) that will actually turn up a scraper site within the top 10 or 20 results?
I like Google's search results. Yesterday I wanted to know what hospital in Yuma, Arizona, USA (a small city) my mother is staying at. Searched Google for "yuma hospital" (no quotes in search) and there was her hospital right at the top. Within 5 minutes I was on the phone with my Dad who was in her hospital room.
I had a question about taxes. Typed a few words in Google and a page from the IRS' website that answered my question was right there. I am impressed.
If I want to find bad results, I'm sure I can find them. I don't bother.
Rover,
I agree. I don't come across scraper sites much. Just did a couple of searches of where they might be. Couldn't find them.
[edited by: guitaristinus at 12:35 am (utc) on May 17, 2005]
As I mentioned last week, I asked for permission to use the 5-star rating next to my AdSense ads and gave them the example site. They said it was absolutely not allowed. The 5-star rating site is still running (with my AdWords ads).
I complained to AdWords about pages with no content displaying my ads. They said to 'use the negative site list'. I've added 25 scraper sites; now where do I put the other 200?
The more competitive the keywords, the worse it is, and the dilution makes it harder to make a few bucks on any of them. I can only guess, but I think some companies must be spending tens of thousands of dollars a day, largely on scraper sites.
Google dances to their own band. You can slip the band a twenty but they won't change the beat.
Just because action has not yet been taken against reported sites doesn't mean nothing is happening. Google could be collecting these reports to find what these sites have in common while they work out a way to automate the process.
For me, a 'scraper' is a webmaster that copies content from other sites and puts it on his own pages without permission. There are different ways to scrape content.
Among other things, scraping raises the possibility of duplicate content penalties from the search engines, and is a disincentive to the authors of original content.
If done for the sake of advertising income, I call it theft. -Larry
I have had success in reporting violators to Google, though. They don't exactly move fast, and you need to remind them sometimes. One site that was framing mine, showing their AdSense ads and leaving mine showing PSAs, no longer carries AdSense. I'm guessing that G had a string of complaints about that one, as it had a lot of framed links showing PSAs. I also reported a string of sites that were asking visitors to click the ads, and all of them now comply.
As regards other violations, I can only suggest nagging them if you have reported a particular site and they still haven't done anything. Also, try contacting them at adsense-abuse@google.com rather than the "contact us" form.
"For me, a 'scraper' is a webmaster that copies content from other sites and puts it on his own pages without permission. There are different ways to scrape content."
By your definition, Google is a 'scraper' site.
I would add to your definition that they build sites around certain keywords and the content they scrape is put on static pages so that they can rank highly for those keywords.
To me it would be hard to distinguish between a scraper site and a directory.
"To me it would be hard to distinguish between a scraper site and a directory."
I tend to lump them in the same bracket. Both exist only to profit from advertisers without having much (if any) relevant content. There are some good directory sites out there, but the majority of directories tend to be made for AdSense. Unfortunately, the directory format is popular for the reasons you cite.
"By your definition Google is a 'scraper' site."
Strictly speaking, yes. I should have specifically excluded legitimate Search Engines.
I'm sure the vast majority of us are thinking of sites that just copy other people's work and republish it on their (usually) ad-filled pages. -LH
"I would add to your definition that they build sites around certain keywords and the content they scrape is put on static pages so that they can rank highly for those keywords."
I don't think that is necessary for a simple definition of 'scraper'; it's enough that they scraped content and are not a legitimate SE. -LH
"To me it would be hard to distinguish between a scraper site and a directory."
Agreed. If a directory shows short sections of text from the originator (much like a Search Engine does), or comes up with its own snippet, -and- provides honest (non-BS) links back, I would not call it scraping.
If the 'directory' is nothing but wholesale theft of content for the sake of its own ratings etc., then I would have to call it scraping.
Common sense and common decency should dictate the difference. All too often, those asking for precise limits are simply looking for ways to legitimize their practices.
Calling Google a scraper when we WANT to get listed gives me the same uneasy feeling.
I don't consider SE cached pages scraping either. You can easily opt out of that.
Try to get, and keep, your honest material out of the REAL scraper sites.
You'll see the difference. -Larry
"Common sense and common decency should dictate the difference."
Don't get me wrong, I'm not condoning what they do, but I still think it will be impossible for Google to define what a scraper site is without some sort of human judgement.
Because of this, I think that if the content on a site isn't good enough to produce its own advertising revenue, then scraper sites are something you will either have to deal with or learn from.
I don't have any shoot-from-the-hip algorithms to filter out scrapers, but good programming guided by common sense could do some good.
1) Credit content to the page it is found on, and NOT to any redirect from a different site.
2) Intelligent scraper detection: For example, an honest author will only put up his own stuff, maybe with other people's work by permission, and usually not too much of that. In contrast, scrapers tend to suck in anything and everything indiscriminately.
Let's say my site has content A, and yours content B. You don't scrape me or vice versa, but suppose Google finds A + B on site S (the scraper) along with D, E, F, G, the dog, the cat, and loads of ads. That's a lot of computing, yes, but so are PR and other things they already do.
A relatively simple algorithm should pinpoint the scraper in such a case.
3) Google could put up a Scraper Report Form. I hesitate to suggest this, as it could lead to abuses. -Larry
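Larry's point 2 could be sketched roughly as follows. This is a minimal, hypothetical illustration, not anything Google has described: it assumes each text block has already been normalized and tagged with the site it was found on and a crawl timestamp, treats the site where a block was seen earliest as its origin, and flags sites whose content mostly originated on several other sites. The `Sighting` type, the thresholds `min_foreign_sites` and `min_foreign_ratio`, and the sample hostnames are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    site: str        # host the text block was found on
    block: str       # normalized text block (e.g. a paragraph)
    crawled_at: int  # hypothetical crawl timestamp; smaller = seen earlier


def find_origins(sightings):
    """Assume each block originated on the site where it was seen earliest."""
    earliest = {}  # block -> (crawled_at, site)
    for s in sightings:
        if s.block not in earliest or s.crawled_at < earliest[s.block][0]:
            earliest[s.block] = (s.crawled_at, s.site)
    return {block: site for block, (_, site) in earliest.items()}


def flag_scrapers(sightings, min_foreign_sites=3, min_foreign_ratio=0.5):
    """Flag sites whose blocks mostly originated on several other sites.

    Returns {site: sorted list of the other sites its content came from}.
    """
    origin = find_origins(sightings)
    blocks_by_site = defaultdict(set)
    for s in sightings:
        blocks_by_site[s.site].add(s.block)

    flagged = {}
    for site, blocks in blocks_by_site.items():
        foreign = [b for b in blocks if origin[b] != site]
        foreign_sites = {origin[b] for b in foreign}
        ratio = len(foreign) / len(blocks)
        if len(foreign_sites) >= min_foreign_sites and ratio >= min_foreign_ratio:
            flagged[site] = sorted(foreign_sites)
    return flagged


# Made-up sample mirroring the post: A, B, D, E are original content on
# separate sites; S pulls them all in; Q quotes one block by permission.
SAMPLE = [
    Sighting("a.example", "content A", 1),
    Sighting("b.example", "content B", 2),
    Sighting("d.example", "content D", 3),
    Sighting("e.example", "content E", 4),
    Sighting("q.example", "content Q", 5),
    Sighting("q.example", "content A", 20),
    Sighting("s.example", "content A", 10),
    Sighting("s.example", "content B", 11),
    Sighting("s.example", "content D", 12),
    Sighting("s.example", "content E", 13),
    Sighting("s.example", "lots of ads", 14),
]

if __name__ == "__main__":
    print(flag_scrapers(SAMPLE))
    # {'s.example': ['a.example', 'b.example', 'd.example', 'e.example']}
```

In the sample, S is flagged because 4 of its 5 blocks came from 4 different sites, while Q, which borrows a single block, is not. The weak spot is using the earliest sighting as the origin: a scraper that gets crawled before the original author would be credited as the source, which is the same hijacking worry raised earlier in the thread.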
Yes, far too much abuse. People could report sites like Kelkoo, Amazon, etc. if they compete in their niche, since those sites carry the same product details. (Some may say that is valid, but where do you draw the line?)
For the record, I have reported 4 scraper sites in the last week and they have all been removed (well done, G).
"I would add to your definition that they build sites around certain keywords and the content they scrape is put on static pages so that they can rank highly for those keywords."
The only reason I don't add that to my definition is that not all scrapers do so, yet they remain scrapers. Heck if I know why, but there are sites that just copy other sites with no rhyme or reason.
"Ask Jeeves is doin it..LOL"
AJ does lots of things I don't like. They FRAME the sites they find as if they owned them.
I'd be more pissed if they were in the major leagues.
I haven't tried this yet. Suppose I search AJ for "Google Search". If I click through to Google, would I be able to run a G search within an AJ frame? What does G think of that? -Larry
That didn't work. AJ is content to cage Google up in their first frame. Best wishes - Larry
I think we hit the nail on the head right here. People are really complaining because they don't like competition, not because their content is being jacked.
People throw 'scraper' around the way they call 'noob' in shoot-'em-up games. In the end, any motivated person can make a junk site with thematic content and NOT use copyrighted material. So who are you going to file your complaints with then?
And for what reason? Because you don't like their tacky design, poor coding, or bad spelling? There are millions of legit mom-and-pop websites that exhibit these traits. What makes you the site police?