
Content, Writing and Copyright Forum

    
Google still ranks scrapers above original content
lzr0



 
Msg#: 4676573 posted 2:58 pm on Jun 1, 2014 (gmt 0)

Funny that Google has repeatedly claimed its algo knows whose content is original. Well, baloney! Here is a small example. In a search for 'atx pinout' (without quotes), one of the top results is the image shown on this page (removed).

Aside from the fact that the image has an imprinted URL (can't Google search use OCR?), the text next to it explicitly states which page this image was taken from. Couldn't Google's algo recognize this? Yet the original diagram is nowhere in the search results, while the copy is at the top.

[edited by: not2easy at 5:13 pm (utc) on Jun 1, 2014]
[edit reason] Removed URL per TOS [/edit]

 

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4676573 posted 5:36 pm on Jun 1, 2014 (gmt 0)

No one likes to see their images used on someone else's site, but the URL was removed because as the Forum Charter states, we want to stick to topics around content and NOT the specific content itself.

As for Google using OCR, no they pretty much stick to text. Is it possible that some formatting or settings on your site might prevent them from giving proper positioning for your image? You can now see your pages the same way Google sees them by using "Fetch as Google" in your GWT account. There is an active discussion of that here: [webmasterworld.com...]

The text explicitly stating where they scraped it from is to protect them from claims of DMCA because they are giving you proper credit for ownership of the image, considered to be "Fair Usage". Right or wrong, this is the standard set for content online.

lzr0



 
Msg#: 4676573 posted 9:31 pm on Jun 1, 2014 (gmt 0)

not2easy,

Sorry for posting URLs. I did not really read the TOS carefully.

The text explicitly stating where they scraped it from is to protect them from claims of DMCA because they are giving you proper credit for ownership of the image, considered to be "Fair Usage". Right or wrong, this is the standard set for content online.


I don't care about them, and this image is totally unimportant to me. My issue was with Google's algorithm. If there is text next to an image stating that the image was taken from such and such page, their algo should have understood this (even if using OCR on the images is too much for them). I don't think I did anything wrong with the formatting of my original image. It used to be at the top for the relevant search, but after one of Google's updates it just fell off the face of the earth.

incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4676573 posted 9:43 pm on Jun 1, 2014 (gmt 0)

@lzr0 - have you tried using Google's authorship to identify your content as being original and owned by you?

Supposedly that resolves the problem.

If you haven't tried it, please do and report back if it resolves this issue.

Thanks.

FYI, in many cases Google knows where the original author's site was because of the date and time on the post, where it first showed up in a sitemap, etc.

Why this issue continues baffles me to no end. For a company with so many supposedly highly educated people, it shows a truly surprising level of ignorance that Google can't figure out who authored content, or create simple methods to identify your content during the publishing process. It would seem to me that the first site submitting a sitemap linking to that content should be considered the author. How the hell else would some other site get the content before the author published it in the first place?

Then the only thing you can surmise, knowing all this, is that Google values their site above yours, and that simply being the author isn't valuable enough to rank at the top for your own content, which is truly idiotic.

I suggested years ago that simply using the sitemap PING was all that was needed as the person to initially publish could PING the new URL to Google. Once that initial PING occurs during publish Google knows all it needs to know, no crazy authorship markup nonsense, nothing more.

Simple and elegant but nooooo.....

A Google engineer said to me, "How do we know that the first PING is from the actual content owner?" OK, how would anyone else get your content before you released it to the world and sent that PING? I mean, seriously, duh?

Ridiculous Google.

Maybe Bing could step in here and set some web standards, but they're way too busy playing follow the leader.
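
For readers unfamiliar with the sitemap PING mentioned above: it was a plain HTTP GET that carried the sitemap's address as a query parameter. The sketch below only builds that URL and sends nothing. The endpoint is the one Google documented around the time of this thread and is an assumption here (it may no longer be supported), and the example.com sitemap address is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the ping endpoint as documented around 2014; check current
# search engine documentation before relying on it.
GOOGLE_PING_ENDPOINT = "http://www.google.com/ping"


def build_sitemap_ping_url(sitemap_url: str,
                           endpoint: str = GOOGLE_PING_ENDPOINT) -> str:
    """Return the GET URL that announces a sitemap to a search engine."""
    # urlencode percent-encodes the sitemap address so it survives as a
    # single query parameter.
    return f"{endpoint}?{urlencode({'sitemap': sitemap_url})}"


# Hypothetical sitemap address, for illustration only.
ping_url = build_sitemap_ping_url("http://www.example.com/sitemap.xml")
```

A publishing script could request `ping_url` right after a new page goes live, which is the "PING on publish" idea described in this post.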

lzr0



 
Msg#: 4676573 posted 9:55 pm on Jun 1, 2014 (gmt 0)

have you tried using Google's authorship to identify your content as being original and owned by you?


Yes, I do have verified authorship on my site. When my page appears in search results, it usually shows my face.

I am not sure images are affected by authorship. Technically they have their own URLs (different from the URL of the page they appear in), to which you can't add authorship markup. Likewise, images are not listed in the sitemap (it is probably doable by hand, but when you use an automatic sitemap generator, it picks only htmls).
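
On listing images in a sitemap: Google publishes an image extension for sitemaps that lets each page entry name the images it contains. Below is a minimal sketch of generating such a file. The function name and the example.com URLs are hypothetical, and whether listing an image this way affects how a copy ranks is not something this thread establishes.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages: dict) -> str:
    """Build sitemap XML from a mapping of page URL -> list of image URLs."""
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # One <image:image> entry per image that appears on the page.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page and image URLs, for illustration only.
sitemap_xml = build_image_sitemap(
    {"http://www.example.com/atx.html": ["http://www.example.com/img/atx.png"]}
)
```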

incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4676573 posted 10:02 pm on Jun 1, 2014 (gmt 0)

but when you use an automatic sitemap generator, it picks only htmls


True, but the unique image is referenced by that HTML, so that is the first time Google has indexed that image.

Does your image have all of the meta data in it asserting you own it, etc?

Perhaps you've simply found a flaw in authorship regarding images vs. pages that should be reported to them as a possible oversight. Wouldn't surprise me if authorship doesn't carry over to the image index whatsoever, which is probably the problem.

FYI, the best defense is a good offense. If you install the proper anti-hotlinking code in your .htaccess file (assuming you use Apache) you could stop anyone from downloading, or hotlinking, the image except search engines and visitors reading the page, which would stop the problem in the first place. An additional way to stop it would be to install a bot blocking script on your site to keep the scrapers ranking above you off your site so the problem never occurs.
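
The anti-hotlinking rule described above boils down to a decision on two request headers. On Apache it would normally be written as rewrite rules in .htaccess; the Python sketch below only illustrates the logic. The host names, the bot tokens, and the function name are assumptions for illustration, and a User-Agent string can be spoofed, so this is a deterrent rather than real protection.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values: your own host names and the crawlers you want to allow.
ALLOWED_HOSTS = {"www.example.com", "example.com"}
SEARCH_BOT_TOKENS = ("googlebot", "bingbot")


def allow_image_request(referer: Optional[str], user_agent: str,
                        allow_empty_referer: bool = True) -> bool:
    """Decide whether an image request should be served."""
    # Let search engine crawlers through (matched on User-Agent only).
    ua = user_agent.lower()
    if any(token in ua for token in SEARCH_BOT_TOKENS):
        return True
    # Some browsers and proxies send no Referer at all; allowing that case
    # is a design choice, since blocking it can break images for real visitors.
    if not referer:
        return allow_empty_referer
    # Otherwise serve the image only when the referring page is on our site.
    host = urlparse(referer).hostname or ""
    return host in ALLOWED_HOSTS
```

A request referred by a page on another site returns False here, which is the case where a server would typically answer 403 or serve a placeholder image.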

lzr0



 
Msg#: 4676573 posted 10:12 pm on Jun 1, 2014 (gmt 0)

Does your image have all of the meta data in it asserting you own it, etc?


Actually, no, I only added alt= tag that describes the image, but not that I am its author.

But regardless, since they have an option in their image search results such as "show similar images", they apparently can recognize that image X is similar to image Y, that image X appeared on the web before image Y, and that the text next to image Y states it was taken from the page that displays image X.

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4676573 posted 10:52 pm on Jun 1, 2014 (gmt 0)

You aren't totally defenseless; there is information in Google's Webmaster Guidelines. Read this page, or watch the video: [support.google.com...] and another section that deals with Scraped Content issues: [support.google.com...]
Not that they offer any quick fix or even promise to take action, but reports can sometimes tip the balance against sites that are walking too close to the penalty lines.

incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4676573 posted 1:44 am on Jun 2, 2014 (gmt 0)

Actually, no, I only added alt= tag that describes the image, but not that I am its author.


I meant image meta data, where you actually identify yourself in the image file itself.

You can do some of this via Windows Explorer: open the file's "Properties" and, under the "Details" tab, you'll find a place to specify the author and some other stuff.

Don't know if Google looks at any of that but you should claim your image from the start.
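
The same idea can be scripted. The sketch below swaps in a different mechanism from the Windows Explorer dialog: it assumes the image is a PNG and adds a standard tEXt chunk (the PNG specification defines keywords such as "Author" and "Copyright") using only the Python standard library. JPEG files carry this information in EXIF or XMP instead, which would need a third-party library. The function names are hypothetical, and nothing here says whether Google reads this data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def add_text_chunk(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of the PNG bytes with a tEXt chunk added after IHDR."""
    if not png.startswith(PNG_SIGNATURE) or png[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR is always the first chunk: 4 (length) + 4 (type) + 13 (data) + 4 (CRC).
    ihdr_end = len(PNG_SIGNATURE) + 25
    # tEXt payload is keyword, a NUL separator, then Latin-1 text.
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return png[:ihdr_end] + make_chunk(b"tEXt", payload) + png[ihdr_end:]
```

Usage would be along the lines of reading the file's bytes, calling `add_text_chunk(data, "Author", "Your Name")`, and writing the result back out before the image is first published.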


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved