I know duplicate pages are spam, but what I am specifically trying to find out about is the actual "Similar Pages" link that appears in the search results for each web site listed in the index after a keyword search. When you click that link, a bunch of other web pages sometimes comes up for popular web sites.
I was wondering whether Google takes inbound links to my site into account to determine if a page is a "Similar Page" to mine, or does Google strictly go by the content of the pages to independently determine whether a web page is similar to my page that comes up in the index?
Additional information about technology behind the "similar sites" results:
[google.com ]
Is there a Google URL somewhere which states that "Similar Pages" are in part or whole determined by outbound links to a site that displays those similar pages under the "Similar Pages" link? I need it for one of my clients to motivate him to do some homework.
Or does "Similar Pages" = "Google Scout"?
I could not find any Google documentation which specifically supports this theory.
What does the 'Car' have to do with your 'hosting provider'? They are not related if we are talking about content!
Signing a guestbook will make the same thing happen.
I don't think that it has anything to do with the html structure.
Though the idea someone mentioned here, that it has to do with sites linking to a common destination, did not strike me as being the definition of the 'similar' searches. Interesting. However, what value does this add to user Joe who is looking for info?
I believe that Google is capable of understanding HTML structures within pages. Think like a programmer for a moment. All you need to do is separate the tags from the content. Ready-made functions in Perl/PHP make this a breeze. Now arrange the HTML constructs in a tree and voila, you get to find similarities between data structures.
If I am not wrong, this is a standard exercise in data structure courses where two or more trees are to be analyzed for similarity.
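To make the tree idea concrete, here is a rough sketch in Python rather than Perl/PHP. It is only my guess at how such a comparison could work, not anything Google has documented: it throws away the text, records the root-to-node tag path of every element, and scores two pages by how many of those paths they share. The names `TagPathCollector`, `tag_paths` and `structure_similarity` are made up for the illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Records the root-to-node tag path of every element, ignoring text."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate sloppy markup.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return a multiset (Counter) of tag paths found in the HTML string."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structure_similarity(html_a, html_b):
    """Multiset Jaccard similarity of tag paths; 1.0 means same skeleton."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    shared = sum((a & b).values())   # per-path minimum counts
    total = sum((a | b).values())    # per-path maximum counts
    return shared / total if total else 1.0
```

Two pages built from the same template but with different text come out at 1.0, while a page laid out with tables scores much lower against one using headings and paragraphs. A real tree comparison (tree edit distance, for example) would be stricter about ordering and nesting; path counting is just the cheapest version of the idea.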
I figure Google is quite capable of catching these tactics (though none have been penalised so far). In time Google will reduce the importance it gives to pages which aggressively interlink within one site. HTML structuring helps capture these artificial ways of boosting PR. You see this a lot in the online hotel reservation industry.
Just to clarify that gobbledygook,
A = "Approved" site
B = Unrelated site
C = My site
And in this particular situation:
A links to B? No
A links to C? No
B links to A? No
B links to C? Yes
C links to A? Yes
C links to B? Yes (once)
My site is the only link between the two AFAIK (it's just so unlikely that I deem it proof), but I still cannot see that as being sufficient evidence for Google to decide that they are similar. Could it be that Google treats a user visiting one site and then another straight after as a sign that the two are possibly related?
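For what it's worth, the link table above can be run through a simple co-citation count, which is one way a "sites linked from the same page" theory could be put into code. This is purely a sketch of that theory, and I am not claiming it is what Google does; `cocitation_counts` is a name I made up, and the `links` list is just the Yes rows from the table.

```python
from collections import defaultdict
from itertools import combinations

# Directed links from the table above, as (source, target) pairs.
links = [("B", "C"), ("C", "A"), ("C", "B")]


def cocitation_counts(links):
    """For each pair of sites, count how many sites link to both of them."""
    outlinks = defaultdict(set)
    for source, target in links:
        outlinks[source].add(target)

    counts = defaultdict(int)
    for targets in outlinks.values():
        # Every pair of targets sharing a source is co-cited once.
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return dict(counts)


print(cocitation_counts(links))  # {('A', 'B'): 1}
```

Under that assumption, A and B end up paired only because C links to both of them, which matches the observation that my site is the only connection between the two. A single co-citation is a weak signal, so a real system would presumably need many more before calling two sites similar.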