Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Dictionary.com scraping my content?
smithaa02




msg:4325313
 12:31 am on Jun 13, 2011 (gmt 0)

So to test for scrapers, I entered a random sentence from my home page into Google in direct quotes (one that I absolutely wrote, about 12 words in length), and lo and behold, dictionary.com appears at 1, 2, and 3 in the Google SERPs... THEN my website... then 8 more different scrapers.

My question is this: how big of a deal is this? If dictionary.com (a titan at PageRank 8) is outranking me (PageRank 5) in the SERPs, does that mean Google not only thinks my content is duplicate and therefore useless, but might even penalize me for 'copying' the web darling dictionary.com?

The way dictionary.com appears to be doing this is by tracking(?) embedded ask.com searches from their reference section. So the three dictionary.com URLs that appeared ahead of me (for my unique sentence in quotes) looked something like this:

dictionary.reference.com/browse/keyworda+keywordb (another might be keywordc and so forth)

Dictionary.com then displayed (not in a frame) ask.com results with the header stating something like:

"You are seeing Ask web results for keywordx because there was not a match on Dictionary.com."...followed by a number of listings, including other scrapers that copied my content.

So is this a problem? Am I overreacting to the SERP order? Is this something authorship markup could fix?

As a general rule is googling your own content in unique quotes and checking your listing order a good measure of whether you're getting credit for your own content?
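For anyone who wants to repeat the quoted-sentence test, here is a minimal Python sketch. It only builds the exact-phrase search URL and reports where a given domain first shows up in a list of result URLs that you have copied down by hand. The function names and the example URLs are made up for illustration, and nothing here fetches results from Google.

```python
from urllib.parse import quote_plus, urlparse


def quoted_query_url(sentence):
    # Wrap the sentence in double quotes so the engine treats it as an exact phrase.
    return "https://www.google.com/search?q=" + quote_plus('"' + sentence + '"')


def rank_of_domain(result_urls, domain):
    # Return the 1-based position of the first result hosted on `domain`
    # (or one of its subdomains), or None if the domain never appears.
    for position, url in enumerate(result_urls, start=1):
        host = urlparse(url).netloc.lower()
        if host == domain or host.endswith("." + domain):
            return position
    return None


# Hypothetical result list, typed in by hand in the order the SERP showed it.
results = [
    "http://dictionary.reference.com/browse/keyworda+keywordb",
    "http://dictionary.reference.com/browse/keywordc",
    "http://dictionary.reference.com/browse/keywordd",
    "http://www.example.com/",
    "http://scraperx.com/copy.html",
]
my_rank = rank_of_domain(results, "example.com")
```

A rank greater than 1 for your own domain means some other page is listed ahead of you for your own sentence, which is the situation described above.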

 

lexipixel




msg:4325316
 12:59 am on Jun 13, 2011 (gmt 0)

What does cornigashen mean? Anyone know the definition, I couldn't find it in the dictionary


cornigashen, (kôrn-e-gA-SHin), n., of or referring to flatulent squatters who eat opulent poultry nightly, from Middle English corongater, "He was one of the cornigashen, and I hated him for his odor." First used by relative of Queen Elizabeth II in 1954.

lexipixel




msg:4325317
 1:00 am on Jun 13, 2011 (gmt 0)

...let's see who scrapes that.

(and yes, I think quoted string is a good test)

tedster




msg:4325321
 1:10 am on Jun 13, 2011 (gmt 0)

As a general rule is googling your own content in unique quotes and checking your listing order a good measure of whether you're getting credit for your own content?

It used to be - but in recent months that's all gone to hades. Your page might not even be returned for a long quote and still rank top for important searches using keywords in the quote.

I personally think and hope that this is a temporary issue and will return to sanity soon.

g1smd




msg:4325322
 1:15 am on Jun 13, 2011 (gmt 0)

You'd have thunk that Google would (at least internally) know that dictionary.com has scraped content, and therefore know that authorship attribution should be applied to some other site in the SERPs. Maybe they do?

lexipixel




msg:4325325
 1:28 am on Jun 13, 2011 (gmt 0)

Goog has indexed this thread and now has the word "cornigashen" (and its definition) in its index. Now let's see who takes the bait.

smithaa02




msg:4325330
 1:50 am on Jun 13, 2011 (gmt 0)

LOL at what you're doing lexipixel...

Although in this case I suspect it might be trickier to reproduce in this fashion (it will be curious to see what happens though!)

In order for 'my case' to be duplicated the following steps have to happen.

1) A scraper copies this page (strangely enough, unless I'm being totally obtuse, none of the sample forum results from webmasterworld.com have scraper results... or perhaps Google isn't giving them credit for webmasterworld.com duped content)

2) Ask.com indexes this scraper page (duplicate content of this page)

3) Ask.com gives enough credit to scraperx.com so that they rank in the top ten for 'cornigashen'.

4) Then here is the big mystery part... somehow Google indexes a broken dictionary.com page with embedded ask.com SERPs (which in their top ten include scraperx.com and the cornigashen text). Why Google lets a website embed SERPs, gives credit to it, and of all sites gives credit to its competitor ask.com... is a great question.

I've entered 'cornigashen' as a search phrase into dictionary.com and got a 'not found' result, but I'm not sure this is what is needed to get this working. What's strange is why Google would be crawling undefined words on dictionary.com...

smithaa02




msg:4325331
 2:06 am on Jun 13, 2011 (gmt 0)

To g1smd... good question. My fear is that Google isn't using modification dates and internal past db comparisons to determine duplicate content, but rather a third 'trust' factor that is messing everything up. If Google sees content on a darling university site (high chance of PageRank 8) and that same content duplicated on a Joe Schmo blog, Google might think that because universitysitex.com is so much bigger and older and has so much higher PageRank, it automatically gets the credit for the duplicate content (that's my paranoid theory anyway). It did seem that Panda really rewarded old, big corporate and big government established websites in general (like walmart.com), and because a good portion of Panda was supposed to relate to duplicate content, I do think this could be a Panda problem in particular. It would kind of make sense too, since checking for duplicate content (which in essence is a series of phrases) based on past database comparisons has to be extremely processor intensive, while simply rewarding the page with the best 'Panda trust factor' would be much more processor efficient.
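To make the 'series of phrases' comparison concrete, here is a rough Python sketch of one textbook way to compare two pages: split each into overlapping word sequences (shingles) and measure how many they share (Jaccard similarity). This is only an illustration of the general idea, not a claim about how Google or Panda actually works; the function names and the shingle length of 5 are arbitrary choices.

```python
def shingles(text, k=5):
    # Break the text into overlapping runs of k consecutive words.
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    # Share of distinct shingles the two texts have in common:
    # 1.0 for identical wording, 0.0 for no shared k-word phrases.
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the river bank"
unrelated = "a completely different sentence about search engines and ranking signals today"

same_score = jaccard(original, copied)
different_score = jaccard(original, unrelated)
```

Doing this naively for every pair of pages in an index would be far more work than looking up a single per-site score, which is the cost difference the theory above leans on.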

Kenneth2




msg:4325335
 2:55 am on Jun 13, 2011 (gmt 0)

My apologies if I sound a bit harsh: dictionary.com is a branded, powerful authority site, which means they can scrape your content and write lower quality content (than most less powerful sites) with impunity.

I remember a thread about a person putting his company name on a Twitter page, and that page appearing before his main company site...

lexipixel




msg:4325345
 4:39 am on Jun 13, 2011 (gmt 0)

...so far it's only spread to the meta SE's, (Metacrawler, Ask, Mamma, Dogpile, etc), all fed from Google.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved