
Google SEO News and Discussion Forum

Report a Scraper Outranking You - Matt Cutts tweet
netmeg




msg:4649833
 9:19 pm on Feb 27, 2014 (gmt 0)

Matt Cutts tweeted out that they're collecting information on scrapers that outrank you for your own content. No answer yet on whether they're actually going to *do* anything about it, or just use the data in the algorithm.

Original tweet:
https://twitter.com/mattcutts/status/439122708157435904 [twitter.com]

If you see a scraper URL outranking
the original source of content in Google, please tell us about it:
http://bit.ly/scraperspamreport

Scraper report link:
https://docs.google.com/forms/d/1Pw1KVOVRyr4a7ezj_6SHghnX1Y6bp1SOVmy60QjkF0Y/viewform [docs.google.com]

So when is WebmasterWorld gonna process secure links? *My* calendar says it's 2014.

[edited by: Robert_Charlton at 10:05 pm (utc) on Feb 27, 2014]
[edit reason] added quote and sorta cheated to fix https links [/edit]

 

robster124




msg:4650464
 1:40 am on Mar 2, 2014 (gmt 0)

Why not submit Google products like local search, the Knowledge Graph and reviews? They seem to outrank the websites that are the original source and have put in the hard work.

jmccormac




msg:4650465
 1:46 am on Mar 2, 2014 (gmt 0)

Authority and all that. spammy.biz, registered 3 months ago, scraping and outranking authoritydomain.com, registered in 1998, is a typical situation. I'm not just highlighting age here; there are many other factors that Google strangely overlooks and ignores (or, more bizarrely, can't recognise).

There's a problem with domain names that a lot of people, including those at Google, miss. The registration date is no longer an accurate indication of when a domain was last registered. Domains are auctioned and transferred before they are deleted, which means they don't drop and effectively keep their original registration date.

Regards...jmcc

IanCP




msg:4650476
 3:03 am on Mar 2, 2014 (gmt 0)

Oh Duh! And here's us schmucks who have been around for a great many years - even before Google came along - and "somewhat" believed Mr. Google's "Authorship" would solve a great many of these problems.

Obviously not.

I'm an original technical author; I finished writing before Google and AdSense were ever invented. I NEVER developed my web sites with a view to monetising them. I know that's a difficult concept for some to grasp.

IanCP




msg:4650478
 3:05 am on Mar 2, 2014 (gmt 0)

One day I might begin to believe in Google! Well, at 72, I need someone to replace the "Tooth Fairy" and "Santa Claus".

EditorialGuy




msg:4650479
 3:05 am on Mar 2, 2014 (gmt 0)

Oh Duh! And here's us schmucks who have been around for a great many years - even before Google came along - and "somewhat" believed Mr. Google's "Authorship" would solve a great many of these problems.

Obviously not.


Google Authorship is in its infancy. Check back in a year or two or three.

jmccormac




msg:4650489
 3:32 am on Mar 2, 2014 (gmt 0)

Google Authorship is in its infancy. Check back in a year or two or three.

People still have amazing faith in the ability of the people at Google. :) They cannot even solve what I consider simple problems (detecting and dealing with hacked/compromised sites carrying dodgy links), so I guess I'm a bit more cynical about how they will deal with scrapers.

Regards...jmcc

brotherhood of LAN




msg:4650494
 4:07 am on Mar 2, 2014 (gmt 0)

jmcc, are you going to share your answer to their problem or perhaps monetise it and offer it that way? Perhaps if they're trying and failing they could solve the problem by giving you lots of cash in exchange for the answer.

What I'm trying to say is, tell us what the simple solution is that the people of Google are failing to grasp!

If you feel it's OT, feel free to start a new thread.

[edited by: brotherhood_of_LAN at 5:10 am (utc) on Mar 2, 2014]

MrSavage




msg:4650499
 5:06 am on Mar 2, 2014 (gmt 0)

Just to clarify further: a couple hundred ranking issues are going to be investigated and will result in some sort of tangible movement? I did in fact submit in the past, but to me this is a rehash thread. We've talked about and linked to this exact issue/form previously. In that sense, should I or anyone else feel differently about it? I'm not being cynical. Can we guess what sort of participation in this submission game will actually have an impact? Perhaps I'm just looking for acknowledgement. I'm really not in much of a mood for wasting my time these days.

ColourOfSpring




msg:4650505
 9:52 am on Mar 2, 2014 (gmt 0)

Rather than wait for Google to index your well-written article, why can't we ping Google with our most prized info the moment it's published? Google could then timestamp it and note the URL. It could be checked against existing content to verify its uniqueness. If it's unique, the timestamp/URL can show it's the canonical/original article. Surely that's better than the archaic "report a spammer" method?
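
A rough sketch of the kind of ping being described - the endpoint and payload below are purely hypothetical, not an existing Google API; the point is just that the canonical URL and a digest of the article travel to the search engine at publication time:

```python
import hashlib
import json
import urllib.request

# Hypothetical "publish claim": register the canonical URL plus a content
# digest the moment the article goes live. The endpoint is made up.
ARTICLE_URL = "https://example.com/original-article"
ARTICLE_TEXT = "Full text of the freshly published article..."  # placeholder content

claim = {
    "url": ARTICLE_URL,
    "sha256": hashlib.sha256(ARTICLE_TEXT.encode("utf-8")).hexdigest(),
}

request = urllib.request.Request(
    "https://ping.search.example/claim",  # hypothetical registration endpoint
    data=json.dumps(claim).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# The service would timestamp the claim and could later compare the digest
# against scraped copies to decide which URL published the content first.
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode("utf-8"))
```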

indyank




msg:4650513
 11:53 am on Mar 2, 2014 (gmt 0)

I am not sure why Danny Sullivan chose to write this post now, when Google has been doing this for what I'd guess is a couple of years. Is it because of Cutts' tweet?

tangor




msg:4650520
 1:24 pm on Mar 2, 2014 (gmt 0)

It is an attempt to look proactive. G's cred is in the tank these days, and they know it.

EditorialGuy




msg:4650526
 3:08 pm on Mar 2, 2014 (gmt 0)

People still have amazing faith in the ability of the people at Google. :)

That's pretty much what I was saying. You were complaining that Google Authorship hasn't solved the scraper problem. Apparently you have a lot more faith in beta products from Google than the realists among us do. :-)

londrum




msg:4650533
 5:40 pm on Mar 2, 2014 (gmt 0)

Sometimes Google might prefer the scraper site to come first, though, which confuses things.
A lot of stuff on Wikipedia is lifted from other sites, for example, but 9 times out of 10 the user would probably rather just visit Wikipedia, because they already know it well. As much as it pains us webmasters to say it, the public largely don't care what the originating site was, as long as they get the info.

I'm sure Google takes that into account. There's no way they're just going to stick the original piddly little site at the top every single time, even if it would be the fair thing to do.

buckworks




msg:4650535
 6:21 pm on Mar 2, 2014 (gmt 0)

even if it would be the fair thing to do


If fairness is tossed out the window, that sure undercuts "Don't Be Evil" IMHO.

There's also the problem that sites that rank well because they're well known also become known because they rank well. Round and round it goes ...

netmeg




msg:4650554
 8:09 pm on Mar 2, 2014 (gmt 0)

And in my case, most of my scrapers are well-established newspaper or television sites - much more overall authority and brand than I have, although not in my tiny little niche. So how's Google supposed to deal with that? (To be fair, for the most part they've dealt with it pretty well, though I have no idea how they're doing it. They're getting it massively wrong with the one-box, though.)

aristotle




msg:4650567
 10:21 pm on Mar 2, 2014 (gmt 0)

The submission form includes a checkbox for:
Confirm your site is following our Webmaster Guidelines and is not affected by manual actions.

Does this mean that if you create great original content, but don't follow the guidelines and/or get a manual penalty, then even if Google knows that it's your work, they still might feel justified in letting scrapers get the benefits?

jmccormac




msg:4650576
 11:59 pm on Mar 2, 2014 (gmt 0)

jmcc, are you going to share your answer to their problem or perhaps monetise it and offer it that way? Perhaps if they're trying and failing they could solve the problem by giving you lots of cash in exchange for the answer.

Now that would be a display of real genius by Google, but I fear that they suffer from "Not Developed Here" syndrome. :) This might sound a bit like Fermat's Last Theorem, but I think this is not the thread for the explanation.

Regards...jmcc

aakk9999




msg:4650577
 12:05 am on Mar 3, 2014 (gmt 0)

Does this mean that if you create great original content, but don't follow the guidelines and/or get a manual penalty, then even if Google knows that it's your work, they still might feel justified in letting scrapers get the benefits?

If a site is penalised, then a scraper may outrank it purely because the penalised site is demoted in the SERPs, and not necessarily because Google wants to rank the scraper higher.

EditorialGuy




msg:4650599
 1:27 am on Mar 3, 2014 (gmt 0)

The submission form includes a checkbox for:

Confirm your site is following our Webmaster Guidelines and is not affected by manual actions.


Makes sense. Consider it a form of triage.

MrSavage




msg:4650602
 1:49 am on Mar 3, 2014 (gmt 0)

To keep on point, did anyone here fill out the form? How many entries did you make? We are a small sample size, but the brightest stars are here amongst us. If you filled it in, then you believe. I need to believe too! Action in this instance would give me a better idea of whether I'm out to lunch.

netmeg




msg:4650703
 12:21 pm on Mar 3, 2014 (gmt 0)

No. I have a ton of scrapers, but none of them outrank me so far.

rish3




msg:4650720
 1:55 pm on Mar 3, 2014 (gmt 0)

There's a technical solution to this problem.

If you can isolate the part of the content that won't change in the future (eliminating, for example, dynamic content, ads, etc), you can create a provable, trusted timestamp using a few bits of cryptography.

Search Google for "trusted timestamp" or "RFC 3161".

It does require some trusted infrastructure. You could, however, leverage something that already exists. For example, there's an existing implementation that uses Bitcoin's blockchain for proof...you send a 5 cent transaction to get your timestamp in the public blockchain.

Perhaps too heavy for Google to proactively check for everything it indexes, but it could be leveraged as a backend for a complaint form.
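
As a rough sketch of the first half of that idea - the hashing step only; the RFC 3161 request itself and the Bitcoin variant are left out, and the clean-up regexes are illustrative rather than a robust HTML sanitiser:

```python
import hashlib
import re

def stable_digest(html: str) -> str:
    """Digest of the stable part of a page: script/style blocks and other
    markup stripped, whitespace normalised, so ads and dynamic content
    don't change the fingerprint."""
    body = re.sub(r"(?s)<(script|style)\b.*?</\1>", " ", html)  # drop script/style blocks
    body = re.sub(r"<[^>]+>", " ", body)                        # strip remaining tags
    body = " ".join(body.split())                               # normalise whitespace
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

page = "<html><body><p>Original article text.</p><script>ads();</script></body></html>"
print(stable_digest(page))

# That digest (not the article itself) is what would go into an RFC 3161
# timestamp request or a Bitcoin transaction, giving third-party proof that
# this exact content existed at a given moment.
```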

RedBar




msg:4650723
 2:14 pm on Mar 3, 2014 (gmt 0)

To keep on point, did anyone here fill out the form? How many entries did you make?


Yep, half a dozen. However, I've done this form before and no action was taken, even though the major scraper is at least 9 years younger than my site, i.e. 1998 vs. 2007.

netmeg




msg:4650730
 3:06 pm on Mar 3, 2014 (gmt 0)

I don't think it's likely that action will be taken - I think they're looking for examples so they can tweak an algorithm or two. Google wants stuff that scales.

Dymero




msg:4650790
 8:06 pm on Mar 3, 2014 (gmt 0)

I agree with netmeg. This is probably an attempt at feeding the algorithm material for a crack at using machine learning for this problem. Google has done this before for some of its services in order to improve voice recognition and the like.

IanCP




msg:4650791
 8:29 pm on Mar 3, 2014 (gmt 0)

Search Google for "trusted timestamp" or "RFC 3161"

Assuming it is feasible from a cost standpoint - how does that afford any protection against existing scrapers?

It might offer some legal validity as evidence against future scrapers, but not past ones.

rish3




msg:4650795
 9:11 pm on Mar 3, 2014 (gmt 0)

Assuming it is feasible from a cost stand point - how does that afford any protection against existing scrapers?

It might offer some legal validity as evidence against future scrapers, but not past ones.


Yes, it's just a way to prove that you were the "first" to timestamp a specific piece of data. No historical timestamps or other magic is supported.

carminejg3




msg:4664494
 3:30 am on Apr 20, 2014 (gmt 0)

I won't get into detail because people here will cry that I'm off topic, but the problem with Google is that they have boatloads of money and yet they want to automate everything with an algorithm. Some things need human eyes.

But you would think checking something like the location of the main server could tip them off. In my case the scraper hosts their site in Russia (it needs a court order to remove the site); another image-hosting scraper is in the Netherlands. All Google has to do is demote sites hosted in these countries, especially when served over Cloudflare or a similar CDN.

Another tip: as a site gets larger we move from shared to dedicated IPs. My scraper is on a shared IP with 30+ sites. My site is on one server with one IP for the site. Really? You can't tell who is stealing whose content?

In the end Google is Google's biggest enemy. They reward the scraper sites. And some of their DMCA specialists could get hit in the face with a wet fish and couldn't tell you what hit them. (A 655-word article copied verbatim, and they sent me 3 emails saying they didn't see the problem.)
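
For illustration only, this is the kind of hosting comparison being suggested; example.com and example.net merely stand in for the original site and the suspected scraper, and the geolocation / reverse-IP lookups are only sketched in comments:

```python
import socket

# Resolve the original site and the suspected scraper and compare where
# they are hosted. The two hostnames are placeholders, not real cases.
original_ip = socket.gethostbyname("example.com")
scraper_ip = socket.gethostbyname("example.net")

print("original site IP:", original_ip)
print("scraper site IP: ", scraper_ip)

# Next signals one would check (services assumed, not shown here): a GeoIP
# lookup for the hosting country, and a reverse-IP lookup to see how many
# other sites share the scraper's IP address.
```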

flatfile




msg:4664522
 7:42 am on Apr 20, 2014 (gmt 0)

@carminejg3, you are obviously oversimplifying this; you are basically viewing it from your own personal circumstances. So every site hosted in the Netherlands or Russia should have its rankings lowered now?

Another tip: as a site gets larger we move from shared to dedicated IPs. My scraper is on a shared IP with 30+ sites. My site is on one server with one IP for the site. Really? You can't tell who is stealing whose content?

There are big scrapers out there that scrape everyone, including sites on shared hosting. Do you see the impact that would have on those small sites? For example, a lot of image scrapers either start off on dedicated servers or move there within a few months of going live.
