Forum Moderators: Robert Charlton & goodroi


Google no longer knows who the owner of content is


chrisv1963

4:45 pm on Apr 6, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I guess we can no longer count on Google to protect our property.

I have been searching today with snippets of text from my website to find content thieves. Very disappointing. In many cases Google is ranking the thief higher than the original source.

One sample of stolen content (text + image) is really unbelievable. The offending website is absolutely low quality, with nothing but stolen text and images. Advertising all over (the area used for advertising is about twice the area used for text): three 300x250 AdSense blocks and one 300x250 Amazon block.

We have been working like crazy to improve the quality of our websites, because after Panda Google told us to do so. What we see, however, is that low-quality websites are running off with our content and getting good rankings for it. This is not the Google I used to know. Something is very wrong.

I'm sorry, but I have lost ALL trust in Google. Is Google simply broken, or do we need to use black-hat tactics to rank for our own content?

synthese

1:46 am on Apr 11, 2011 (gmt 0)

10+ Year Member



@ScubaAddict I always thought the date/time that G first crawled the page was what they used. Which means it's entirely possible that scraper sites with a higher crawl rate can get in first; then presumably it's down to authority. Anyway, this is a moot point. Pre-Panda they had it right -- but obviously not in all circumstances -- thus the need for the algo change.

I'm about to put meta name="syndication-source" content="[canonical url]" into each page head. Hey I've tried a hundred things, one more thing won't hurt...
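In case it helps anyone trying the same thing, here is a minimal sketch (in Python; `syndication_source_tag` is a hypothetical helper name of my own, and example.com is just a placeholder) of generating that tag for each page head from its canonical URL:

```python
import html

def syndication_source_tag(canonical_url):
    # Build the <meta name="syndication-source"> element for a page head,
    # escaping the URL so quotes and ampersands can't break the attribute.
    return ('<meta name="syndication-source" content="'
            + html.escape(canonical_url, quote=True) + '">')

print(syndication_source_tag("http://example.com/my-article"))
```

Each page would emit its own canonical URL here; whether Google actually honors the tag is, of course, exactly what this thread is questioning.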

luke175

1:51 am on Apr 11, 2011 (gmt 0)

10+ Year Member



luke175 - where would the original publication date come from? If this is a databased date of publication, it would be easily forged. If it is a server timestamp, what happens when you change the file, or change its name? What if you have a hard drive failure and have to re-upload an entire site?

I can't see how they could get a reliable 'publish' date from anything.

The only thing that could be of use (that I can see) would be an index date as recorded in Google's database, even though that still has its faults.


What I was referring to would have been more correctly called the first indexed date.

My point was that Google should have been able to see my content was indexed first. Additionally, the domain of the scraped content was not even registered or indexed when Google first indexed my content, so why would Google index them higher than me for the same content?

ScubaAddict

2:17 am on Apr 11, 2011 (gmt 0)

10+ Year Member



So, is it possible that we (the content source) are essentially being penalized for having the duplicate content?

I always thought Google made the assertion that a competitor couldn't hurt your ranking... but if they scrape my content and are seen as the content source, then Google sees me as the scraper, and I am due for the site-wide penalties applied to scraper sites.

Has anyone discredited this as an actual 'unintended' result of Panda? Because this would explain just about everything that has happened to my site.

CainIV

2:39 am on Apr 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Solution: Only allow the original content to be crawled by Google for 48hrs, e.g. cloak it, then release it to the general public after it shows in Google's index.


This doesn't stop Google from attributing content incorrectly. A significant percentage of the time, a new website can post the same content, even after that content has already been indexed on the originating website, and still trump it as if it had ownership, at least in terms of how Google ranks the document.

Sgt_Kickaxe

3:20 am on Apr 11, 2011 (gmt 0)



In many cases Google is ranking the thief higher than the original source.


You can't tell that just by searching for snippets. If you try, you'll just see which site ranks best for the snippet, and perhaps the scraper site has covered the basics of that snippet better than you have, so they should rank higher.

To see if they outrank you search for actual keywords instead and your article should outrank theirs. Snippets are useless, except in finding copycats.

rlange

2:24 pm on Apr 11, 2011 (gmt 0)

10+ Year Member



tedster wrote:
But it is not so trivial to spoof an IP address and as far as I know, it's impossible to spoof your way through this process. See How To Verify Googlebot [webmasterworld.com]

It is fairly easy to spoof an IP address, but you're right in that it would be absolutely useless in this situation. Thanks for pointing that out.

--
Ryan
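For reference, the verification process tedster links to is a double DNS lookup, which is why IP spoofing doesn't help here: reverse-resolve the connecting IP, check that the hostname falls under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A rough Python sketch of that logic (the resolver arguments are injectable only so the logic can be exercised without network access; the function name is my own):

```python
import socket

def is_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    # Step 1: reverse DNS on the claimed crawler IP.
    try:
        host = reverse(ip)[0]
    except OSError:
        return False
    # Step 2: the PTR hostname must belong to Google's crawler domains.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    # Step 3: forward-confirm that the hostname resolves back to the same
    # IP, since a PTR record alone can be set by whoever owns the IP block.
    try:
        return ip in forward(host)[2]
    except OSError:
        return False
```

With real resolvers this checks a live connection's source IP; a scraper spoofing Googlebot's user-agent string fails at step 1 or step 3.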

ScubaAddict

5:35 am on Apr 12, 2011 (gmt 0)

10+ Year Member



You can't tell that by just looking for snippets.

@Sgt_Kickaxe - are you saying that it would be incorrect to think that if I wrote this 100% original sentence:

ScubaAddict is a gobblygook nokypoke who loves to scuba dive on the fictional planet Aquatious.

And I put that on my website and allowed it to be indexed... then put the EXACT sentence into quotes and searched on Google, I should expect to be the #1 result (with all of those who scraped it verbatim behind me)?

I thought I should rank first for a nonsense sentence that I made up - not a scraper site.

hyperkik

6:58 am on Apr 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ScubaAddict, I agree with Sgt_Kickaxe to the degree that if you are looking at a specific snippet, even if (and perhaps particularly if) it falls in the range of "peculiar" to "gibberish", you can't read too much into how Google ranks pages that include the snippet. If you check two, three, four snippets from your page and you're always ranking below copies, especially if they're the same copies, I think that's indicative of a problem. If you're not even showing up for two or more of the unique snippets unless you click to "repeat the search with the omitted results included," I think that's indicative of a big problem.

TheMadScientist

8:15 am on Apr 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mod's note: Moved from another location... comments on the Matt Cutts video, cited earlier in this thread....

How can I make sure that Google knows my content is original?
[youtube.com...]

Mind Boggling Quote?
'At any time we can only go and fetch a certain finite number of pages, if we tried to fetch them all, and our architecture can almost support that, then the web might crash from all of those requests...'

Did he really say they could potentially crash the entire web?
Talk about a DoS attack. lol

The Basic Summary is:
If they don't get it right, file a DMCA complaint, and if the site copies more than a little bit, file a Spam Report too...

<BeingASmartAss>
I ran it back a bit a couple of times and didn't hear him say, 'Unless of course it's ehow doing the copying, then save us all some time and don't bother filing...', so maybe people need to file more complaints when they think their work has been 'borrowed' by the big guys?
</BeingASmartAss>

[edited by: Robert_Charlton at 7:37 pm (utc) on Apr 12, 2011]

Tallon

9:06 am on Apr 12, 2011 (gmt 0)

10+ Year Member



'Unless of course it's ehow doing the copying, then save us all some time and don't bother filing...', so maybe people need to file more complaints when they think their work has been 'borrowed' by the big guys?


Except the big guys are smart enough to remix it "just enough" (remember, they have a lot of fingers and toes touching all that data). I just love love love when I see ehow or wikihow linking to my pages in their writing dev area (behind locked doors). I just know another doozy is on its way. Sigh.

enigma1

9:37 am on Apr 23, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Before Panda scrapers did not outrank my original stories. Now they do. That is all that matters. Google lost a functionality.

In my case, which goes way back, it's not really the scrapers but how the search engines treat original content from sites versus site ranking.

About 7 years ago, when I published a couple of applications via a new domain I had acquired at the time, the search engines in general would list my domain right at the top, at least for the application name. And that despite the fact that I also distributed the software through various shareware sites that far outranked me. I was still at the top.

Then around 4 years ago, a lot of this changed in my experience. I published software in open source repositories, and although search engines had crawled the content and software from my domain before I released it to other sites, the repositories outranked me to the point where I wasn't even on the first 2 pages of search results.

That may give you an idea why scrapers can be so effective: the authorship factor is so marginalized in the search engines' algorithms.

danijelzi

1:21 pm on Apr 23, 2011 (gmt 0)

10+ Year Member



Additionally, the domain of the scraped content was not even a registered or indexed when Google first indexed my content- so why would Google index them higher than me for the same content?


Scraper sites with my own texts are indexed higher than me, even if they republish my articles a day after I was indexed. I initially thought that they rank higher because they have more backlinks, better optimized pages, etc. But after checking their backlinks, it appears that they are linked from only a few low-quality sites. Also, the usefulness of their "sites" is zero, since they just randomly scrape content from various sources and put it below huge ads. And I don't see how they are better optimized on-page than me.

Another interesting thing: I've filed spam reports for a couple of .blogspot.com blogs which copied my content and ranked higher than me. After a day, the blogs were disabled by the Blogspot team, but Google's SERPs still showed them above me five days after their removal from Blogspot. Now in my niche you have disabled .blogspot.com blogs all over the top SERP positions, showing a "Page not found" message.

P.S. By closing the scrapers' blogspot blogs, Google has at least indirectly admitted that spam/scraped content ranks highly in their search.

My_Media

1:09 am on May 2, 2011 (gmt 0)

10+ Year Member



Hi All,
Are you guys still seeing scrapers outranking us for our own articles? I have checked Yahoo and Bing; they credit only us, and no scraper is listed. Google is losing the battle by letting scrapers outrank the originals.
I filed a reconsideration request a few days ago; the scraper was gone for a few days, then came back with more scraped content.

Sad :(

[edited by: tedster at 1:24 am (utc) on May 2, 2011]
[edit reason] moved from another location [/edit]

tedster

1:34 am on May 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google's total index has a whole lot more in it than Bing/Yahoo - and that might be part of their problem with knowing the original publisher versus the scraper sites.

Every domain I check with the site: operator shows Google reporting a much higher number than Bing or Yahoo. Looks like they are biting off more than they can chew.

supercyberbob

6:16 am on May 2, 2011 (gmt 0)

10+ Year Member



Maybe quit biting on my content and swallow.