duplicate sites algorithm

         

scorpion

6:20 pm on May 21, 2003 (gmt 0)

10+ Year Member



How does Google determine that two sites are duplicates of each other? Is there an article on this?

Does the bot just look at the title and URL? Also, does Google check URLs that look similar for duplicate content?

I ask because in the highly competitive domain name world, there are probably different sites with similar URLs.

jeremy goodrich

6:24 pm on May 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They use an algorithm whose details I'm 100% sure they would not give away.

However, there are some great research papers - written by Google engineers & staff - that you can read which shed some light on this subject.

Consider this research paper by Bharat [citeseer.nj.nec.com], one of the 'top' engineers at Google, imho.

Very good reading, and more than likely it will shed some scientific light on the problem for you (I read it myself a while ago; informative).

Alternatively, try checking out labs.google.com/papers.html for more publications by their staff.
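For a feel of what those papers describe: the resemblance measure used in the near-duplicate literature (Broder's shingling work, which Bharat builds on) is easy to sketch. This is just the published technique with made-up page text, not Google's actual algorithm - nobody outside knows that:

```python
def shingles(text, w=4):
    """Return the set of w-word shingles (overlapping word windows)."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    """Resemblance of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical page texts for illustration only.
page1 = "buy cheap widgets online at the best widget store on the web"
page2 = "buy cheap widgets online at the greatest widget store on the web"
page3 = "completely unrelated text about gardening tips and tomato plants"

print(jaccard(shingles(page1), shingles(page2)))  # high resemblance: near-duplicates
print(jaccard(shingles(page1), shingles(page3)))  # zero resemblance: unrelated
```

In practice the papers describe sampling a small sketch of each page's shingle set so billions of pages can be compared, but the resemblance idea is the same.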

scorpion

7:17 pm on May 21, 2003 (gmt 0)

10+ Year Member



Thanks, interesting stuff.

I guess one question is: if a company has two different designs on different hosts to sell a similar product, is this considered mirroring?

jeremy goodrich

7:21 pm on May 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That paper - read it. *If* Google has implemented, or is considering implementing, an algo for scenarios like the one you just described, the answer would be in a paper just like the one linked above.

Or... the answer exists nowhere but in the hearts & minds of Google engineers. I prefer to think that, by reading those papers, you get a much better idea of what Google does and doesn't consider duplicate.