
Google SEO News and Discussion Forum

    
What are Google Fingerprints?
Fingerprints in "duplicate and near-duplicate detection" techniques.
tama




msg:806257
 12:51 am on Jul 29, 2005 (gmt 0)

From one of Google's patents regarding duplicate content:

----------------------------------------
Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.

----------------------------------------
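To make the three steps in that abstract concrete, here is a rough sketch in Python. It is only an illustration of the quoted wording, not the patent's actual method: treating paragraphs as the "parts", using SHA-1, using four lists, and placing each part in exactly one list (the abstract allows "one or more") are all assumptions made for the example, and the function names are hypothetical.

```python
import hashlib
import re

NUM_LISTS = 4  # assumed value; the abstract only says "a predetermined number"


def _h(text):
    """Stable integer hash of a string (SHA-1 is an arbitrary choice here)."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


def extract_parts(document):
    """Step (i): extract parts. Assumption: parts are blank-line separated
    paragraphs, lowercased and with whitespace collapsed."""
    parts = [" ".join(p.split()).lower() for p in re.split(r"\n\s*\n", document)]
    return [p for p in parts if p]


def fingerprints(document, num_lists=NUM_LISTS):
    """Steps (ii) and (iii): assign parts to lists, then hash each populated list."""
    lists = [[] for _ in range(num_lists)]
    for part in extract_parts(document):
        # Simplification: each part goes to exactly one list, chosen by its hash.
        lists[_h(part) % num_lists].append(part)
    result = set()
    for index, members in enumerate(lists):
        if members:  # only populated lists produce a fingerprint
            result.add((index, _h("\n".join(members))))
    return result


def near_duplicates(doc_a, doc_b):
    """Two documents count as near-duplicates if any one fingerprint matches."""
    return bool(fingerprints(doc_a) & fingerprints(doc_b))
```

Under these assumptions, editing a single paragraph disturbs at most two lists (the one it left and the one its replacement lands in), so the remaining populated lists still yield matching fingerprints.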

Does anyone know what qualifies as a "fingerprint"?

 

HitProf




msg:806258
 11:40 am on Jul 29, 2005 (gmt 0)

Isn't that explained in the same patent filing? Do you have a link?

ncgimaker




msg:806259
 12:35 pm on Jul 29, 2005 (gmt 0)

Patent 6658423.

It's just hashing of paragraphs (an order-sensitive hash and a non-order-sensitive hash). They sort the hashes into lists for comparison, then match them up.

Similar to sja65's suggestion, which has the same problem I've pointed out before:

[webmasterworld.com...]
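For anyone wondering what the difference between an order-sensitive and a non-order-sensitive hash of paragraphs looks like, here is a minimal sketch. It only illustrates the distinction described in the post above; the patent's real hash functions are not shown in this thread, so MD5, the chaining scheme, and the function names are assumptions.

```python
import hashlib

MODULUS = 1 << 128  # MD5 digests are 128 bits wide


def _h(text):
    """Stable integer hash of a string (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def ordered_hash(paragraphs):
    """Order sensitive: each step mixes in the running value, so position matters."""
    acc = 0
    for paragraph in paragraphs:
        acc = _h(f"{acc}:{paragraph}")
    return acc


def unordered_hash(paragraphs):
    """Order insensitive: addition is commutative, so shuffling changes nothing."""
    total = 0
    for paragraph in paragraphs:
        total = (total + _h(paragraph)) % MODULUS
    return total
```

With these two functions, a page whose paragraphs have merely been shuffled keeps the same `unordered_hash` but gets a different `ordered_hash`.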

ncgimaker




msg:806260
 12:54 pm on Jul 29, 2005 (gmt 0)

I'm not sure if the moderators will let me link to a CRM114 paper, but it's worth reading and comparing against the Google patent.

[crm114.sourceforge.net...]

"In this talk, we will examine the Sparse Binary Polynomial Hash (SBPH) filtering technique, a generalization of the Bayesian method that can match mutating phrases as well as single words."

I hope they aren't granted that patent, because a lot of more intelligent algorithms presuppose hashing of text blocks.
It would be like giving a patent on the rubber tyre when other companies already make high performance multi layer radials with water pressure ejection...
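As a rough idea of how SBPH can "match mutating phrases", here is a simplified sketch of the feature generation as I understand it: slide a short window over the words and, for each position, emit every combination of kept and skipped earlier words, always keeping the newest word. The window size of 5, the `<skip>` placeholder, the use of CRC32, and the function names are assumptions for illustration; the real CRM114 code may differ in its details.

```python
import zlib
from itertools import product

WINDOW = 5        # assumed window length
SKIP = "<skip>"   # placeholder for a word that was left out


def sbph_phrases(text, window=WINDOW):
    """Return the sparse phrases generated for every window position."""
    tokens = text.split()
    phrases = []
    for end, newest in enumerate(tokens):
        # Up to window - 1 words that precede the newest word.
        older = tokens[max(0, end - window + 1):end]
        # Every keep/skip combination of the older words; the newest is always kept.
        for mask in product((False, True), repeat=len(older)):
            phrase = [tok if keep else SKIP for tok, keep in zip(older, mask)]
            phrase.append(newest)
            phrases.append(" ".join(phrase))
    return phrases


def sbph_features(text, window=WINDOW):
    """Hash each sparse phrase to an integer feature."""
    return {zlib.crc32(p.encode("utf-8")) for p in sbph_phrases(text, window)}
```

In this sketch, "buy cheap pills" and "buy fake pills" both produce the phrase `buy <skip> pills`, so the two texts share features even though the middle word changed.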


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved