What does "Similar Pages" mean for a particular site?

nadsab

5:43 pm on Apr 4, 2003 (gmt 0)

In other words, how are those "Similar Pages" lists built in Google, and what determines whether a site or page is a similar page?

Thanks.

hotelquest

5:45 pm on Apr 4, 2003 (gmt 0)

IMO it means sites that deal with a similar theme and which have similar 'html structuring' of the website. That's what I have noticed for some of my websites, as they do appear as similar sites on Google.

nadsab

5:53 pm on Apr 4, 2003 (gmt 0)

Thanks hotelquest,

Does it also have anything to do with outbound links on the similar sites to the site in question?

Also if there are lots of similar sites, does that affect Google ranking?

garylo

6:23 pm on Apr 4, 2003 (gmt 0)

Also if there are lots of similar sites, does that affect Google ranking?

Google will sometimes penalize sites that have duplicate pages, or different sites that are similar.

nadsab

6:31 pm on Apr 4, 2003 (gmt 0)

Thanks garylo,

I know duplicate pages are spam, but what I am trying to specifically find out is about the actual "Similar Pages" link which appears on search results for each web site listed in the index after a keyword search. When you click that link, a bunch of other web pages sometimes come up for popular web sites.

I was wondering if Google takes into account inbound links to my site to determine if a page is a "Similar Page" to my page, or does Google strictly go by the web content of all pages to independently determine if a web page is similar to my web page that comes up in the index?

AthlonInside

6:31 pm on Apr 4, 2003 (gmt 0)

RELATED SITES are actually sites which either

linked to you or

linked from you or

linked to a site which linked to you or

linked to a site which linked from you
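
The four cases above can be sketched as a lookup on a link graph. This is only an illustration of the definition as posted, not Google's actual method; the function name `related_sites` and the sample site names are made up for the example.

```python
from collections import defaultdict

def related_sites(links, site):
    """Return sites matching the four cases listed above for `site`.

    `links` is an iterable of (source, target) pairs: source links to target.
    """
    outbound = defaultdict(set)  # page -> pages it links to
    inbound = defaultdict(set)   # page -> pages that link to it
    for source, target in links:
        outbound[source].add(target)
        inbound[target].add(source)

    linked_to_you = set(inbound[site])     # case 1
    linked_from_you = set(outbound[site])  # case 2

    related = linked_to_you | linked_from_you
    # Case 3: sites that link to a site which links to you.
    for middle in linked_to_you:
        related |= inbound[middle]
    # Case 4: sites that link to a site which you link to.
    for middle in linked_from_you:
        related |= inbound[middle]

    related.discard(site)
    return related

# Hypothetical sample data.
links = [
    ("forum", "cars"),     # a forum post links to the cars site
    ("hosting", "forum"),  # a hosting company links to that forum
    ("cars", "dmoz"),      # the cars site links to a directory page
    ("boats", "dmoz"),     # an unrelated site links to the same directory page
]
print(sorted(related_sites(links, "cars")))
```

With this sample data, every other site in the graph ends up "related" to the cars site, which shows how broad the definition gets once the two-step cases are included.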

robjones

6:35 pm on Apr 4, 2003 (gmt 0)

similar 'html structuring' of the website
That's interesting. I've wondered before why totally unrelated websites are turning up in the related search for my website.

martinibuster

6:45 pm on Apr 4, 2003 (gmt 0)

From Google:
[google.com ]

Additional information about technology behind the "similar sites" results:

[google.com ]

dwilson

6:47 pm on Apr 4, 2003 (gmt 0)

linked to a site which linked to you or
linked to a site which linked from you

Thanks for that definition. That part makes it get pretty broad. Especially when you have a Geocities site & are in the Geocities directory ... "related" isn't as related as we would like to think.

nadsab

7:04 pm on Apr 4, 2003 (gmt 0)

Thanks guys,

Is there a Google URL somewhere which states that "Similar Pages" are in part or whole determined by outbound links to a site that displays those similar pages under the "Similar Pages" link? I need it for one of my clients to motivate him to do some homework.

Or does "Similar Pages" = "Google Scout"?

I could not find any Google documentation which specifically supports this theory.

martinibuster

7:33 pm on Apr 4, 2003 (gmt 0)

I have a suggestion: You can go to the google pages I gave you, dig around, and come back and report your findings here. The research will be good for everyone.

AthlonInside

7:33 pm on Apr 4, 2003 (gmt 0)

Actually, the related sites are not all related. Assume you post your link in a forum which is crawled by Google: the message you posted might appear as a related site. If your site is about cars but you post a message with your signature to the support forum of your hosting provider, that forum would appear as a related site.

What do 'cars' and your 'hosting provider' have to do with each other? They are not related if we are talking about content!

Signing a guestbook will have the same effect.

Receptional Andy

7:38 pm on Apr 4, 2003 (gmt 0)

I've seen lots of similar sites that are linked to on the same page as a link to you - quite often sites in the same dmoz category show up there. I don't think it's advanced enough to pick up on 'html structure'. IMO in this way Google hopes to pick up on sites that are on 'themed' pages of links, and therefore related.

AthlonInside

7:45 pm on Apr 4, 2003 (gmt 0)

I don't agree with 'html construct' at all. It is all about links and only links.

martinibuster

8:12 pm on Apr 4, 2003 (gmt 0)

GoogleScout also pulls from your open directory cat.

doc_z

8:12 pm on Apr 4, 2003 (gmt 0)

I agree with Receptional_Andy, related just seems to say "sites that are linking to that page also link to the following pages". (Similar to Amazon: Customers who bought titles by xxx also bought titles by this author.) Of course, there is a complex algorithm to filter out the most important related sites.
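
A minimal sketch of that "also link to" idea: count how often another page shares a linking page with the page in question. The plain count stands in for whatever filtering Google really applies, which nobody here knows; `also_linked` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def also_linked(links, page, top=5):
    """Rank pages by how many linking pages they share with `page`."""
    outbound = defaultdict(set)  # linking page -> pages it links to
    for source, target in links:
        outbound[source].add(target)

    counts = Counter()
    for source, targets in outbound.items():
        if page in targets:
            # Every other page linked from the same source gets one vote.
            for other in targets:
                if other != page and other != source:
                    counts[other] += 1
    return counts.most_common(top)

# Hypothetical sample data: two directory pages and one blog.
links = [
    ("dir1", "a"), ("dir1", "b"), ("dir1", "c"),
    ("dir2", "a"), ("dir2", "b"),
    ("blog", "a"), ("blog", "d"),
]
print(also_linked(links, "a"))
```

Here "b" comes out on top because it sits next to "a" on two linking pages, while "c" and "d" each share only one.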

I don't think that it has anything to do with the html structure.

hotelquest

9:01 pm on Apr 4, 2003 (gmt 0)

Well... remember the first Google programming contest? One of the entries that got a mention was the one that 'was able to separate templates from content'. Now, I have often noticed sites that have a common template and a similar style of writing content/theme turn up on 'similar' searches.

Though the idea someone mentioned here, that it has to do with sites linking to a common destination, had not struck me as being the definition of the 'similar' searches. Interesting. However, what value does this add to user Joe who is looking for info?

I believe that Google is capable of understanding html structures within pages. Think like a programmer for a moment. All you need to do is separate the tags from the content. Ready-made functions in perl/php make this a breeze. Now arrange the html constructs in a tree and voila, you get to find similarities between data structures.

If I am not wrong, this is a standard exercise in data structure courses where two or more trees are to be analyzed for similarity.
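
To make the idea concrete, here is a rough sketch in Python rather than perl/php. It strips the content, keeps the opening tags in document order, and compares the two tag sequences. Note that this is a simplification: it compares flat sequences with the standard library's `difflib`, not real trees, and nothing here is known to be what Google does. The names `TagCollector`, `tag_sequence` and `structure_similarity` are made up for the example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Records opening tag names in document order and ignores text content."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_sequence(html):
    """Return the list of opening tag names found in `html`."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags

def structure_similarity(html_a, html_b):
    """Return a score from 0 to 1 for how alike the two tag sequences are."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()

# Hypothetical pages: two share a template, one does not.
page_a = "<html><body><table><tr><td>Hotels in Rome</td></tr></table></body></html>"
page_b = "<html><body><table><tr><td>Hotels in Paris</td></tr></table></body></html>"
page_c = "<html><body><h1>Cars</h1><p>Used cars for sale</p></body></html>"

print(structure_similarity(page_a, page_b))  # same template, different text
print(structure_similarity(page_a, page_c))  # different template
```

The two pages built on the same template score 1.0 even though their text differs, while the page with a different layout scores lower.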

hotelquest

9:05 pm on Apr 4, 2003 (gmt 0)

I must also add that a lot of sites aggressively interlink internal pages and sub-domains. Now observe that these links are always placed in a format which separates them from the content portions of the page.

I figure Google is quite capable of catching these tactics (though none have been penalised so far). In time Google will reduce the importance it gives to pages which aggressively interlink within one site. Html structuring helps capture these artificial ways of boosting PR. You see this a lot in the online hotel reservation industry.

hotelquest

9:07 pm on Apr 4, 2003 (gmt 0)

PS: Ever noticed that links embedded within the content portions of a page carry more weight?

jrobbio

2:08 am on Apr 22, 2003 (gmt 0)

Here's a twist in the story for you. It's quite an obvious one, but see what you think. I noticed it because I was getting clickthroughs from people typing www.addictingwidgets.bar, which was either the webmaster or it wasn't working at the time. I happen to frequent this place fairly often and it can be the first place I go, though it is not my homepage. It appears in my similar links, as do other pages I frequent often. Now, I use the Google Toolbar and I have given each one of these sites a smiley face from time to time. This is all very common sense stuff, but here's the interesting thing.
If I do a related search for the www.addictingwidgets.bar site, a site I frequent that has nothing to do with it, and has never linked to it either, appears prominently in the list. I am almost certainly the only person who frequents both of these sites, given how chalk and cheese they are. The other thing is that 99% of the time I have gone directly to my site by typing it in, and to the others too unless I'm sniffing around, so the chances that they've placed a particular tracking cookie are slim although possible, not counting the original one that gets placed when you enter Google. Is my toolbar ID'd? Probably, so if I deleted the cookie for whatever reason, it would just put it back the next time I visited, or associate the two somehow.
I don't buy the linking title etc. theory, unless it is combined with the use of the toolbar; then maybe we're onto something. Maybe the toolbar analyses the text of the link you click somehow and lets big G know.

mil2k

8:04 am on Apr 22, 2003 (gmt 0)

Suppose you are Site A and are listed on Site C. Now suppose Site B is also listed on the same page of Site C. So Site A and B become related. I also agree with the toolbar theory of jrobbio because I have heard of such incidents before.

jrobbio

9:49 am on Apr 22, 2003 (gmt 0)

In this scenario, I am not site A, since a link to the site does not appear on C. B, however, links to C's main page everywhere. In this situation I am C, the unrelated site is B, and A is linked to by C.

Just to clarify that gobbledygook,
A = "Approved" site
B = Unrelated site
C = My site

And in this particular situation:
A links to B? No
A links to C? No
B links to A? No
B links to C? Yes
C links to A? Yes
C links to B? Yes (once)
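
For anyone who wants to check the scenario against the "listed on the same page" theory, the six answers above can be written as a set of links. This is just a sketch; the helper `common_linkers` is made up for the example and says nothing about how Google weighs such a connection.

```python
# The "Yes" answers above, as (source, target) pairs.
links = {("B", "C"), ("C", "A"), ("C", "B")}

def common_linkers(links, first, second):
    """Return the pages that link to both `first` and `second`."""
    to_first = {source for source, target in links if target == first}
    to_second = {source for source, target in links if target == second}
    return to_first & to_second

print(common_linkers(links, "A", "B"))  # prints {'C'}
```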

My site is the only link between the two AFAIK (it's just so unlikely that I deem it proof), but I still cannot see that as being sufficient evidence for Google to decide that it is similar. Could it be that Google values the habits of a user, considering the use of one site and then straight after another as being possibly related?