Does G "forget" affiliations?

We're talking Hilltop here...

caveman

5:01 pm on Oct 21, 2004 (gmt 0)

Past:
Site A1 links to site B1 links to site C1.

A1 and C1 are affiliated. C1 died in the SERP's.

Present:
A2 links to C1, and C2 links to A1.

A1 and C1 are no longer affiliated...and presumably G cannot see that they are affiliated. But they used to be.

Question:
Does G at some point "forget" that A1 and C1 were affiliated?

If so, then a site we thought was being filtered by the current algo for its affiliation to another site is instead being filtered for another reason. Perhaps a dup filter or some such thing. All the pages of the poorly performing site show good TBPR, FWIW.

MHes

12:25 pm on Oct 22, 2004 (gmt 0)

>Does G at some point "forget" that A1 and C1 were affiliated?

My guess is yes.

But there may be other factors here. Do all the sites appear in the search results for a keyphrase search? Are the IPs different?

AjiNIMC

1:06 pm on Oct 22, 2004 (gmt 0)

Hi caveman,

According to me

Site A1 links to site B1 links to site C1.
A1 and C1 are affiliated. C1 died in the SERP's.

This can't be true: how can search engines find out that sites A1 and C1 are related just from a linking pattern? (Would they keep track of the linking patterns of all sites? Not very likely, though not impossible.)

What hilltop says is
------------------------
Two pages are affiliated conceptually if they are authored by authors from affiliated organizations. According to Hilltop theory, if site A1 and site B1 are affiliated (say, by sharing the same C-class IPs) and B1 is affiliated with C1 (maybe because the rightmost non-generic token is the same, e.g. widgets.com and widgets.uk), then A1 and C1 are treated as affiliated as well.

Hilltop doc says

In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative.

In short, it is not possible to tell from linking patterns alone whether two sites are related (unless you are not being smart about it). I wrote an article on this issue some time back: how can search engines track relations between two sites?

They do
1) Using Same C class IP
2) Related:
3) Through whois

Never leave a pattern with links; use the drunken man's path to success.

Have fun
AjiNIMC

SlyOldDog

2:37 pm on Oct 22, 2004 (gmt 0)

Related is scary. Google has related some of our sites that are not even linked together. I don't know how they did it.

mfishy

3:04 pm on Oct 22, 2004 (gmt 0)

related some of our sites that are not even linked together

Do they share common backlinks?

Does G "forget" affiliations?

I would guess that yes, they do forget.

SlyOldDog

3:32 pm on Oct 22, 2004 (gmt 0)

Perhaps they do (share common backlinks). But I was not able to determine which they were.

Here is a thread I started a looong time ago on the subject: [webmasterworld.com...]

caveman

4:42 pm on Oct 22, 2004 (gmt 0)

My bad...not enough information provided. Here's more:

A1 and B1 are on different IP's, but A1 and C1 were on the same C block.

For reference from Hilltop paper (edited out non-relevant points):

We define two hosts as affiliated if one or both of the following is true:
*They share the same first 3 octets of the IP address.
The affiliation relation is transitive: if A and B are affiliated and B and C are affiliated then we take A and C to be affiliated even if there is no direct evidence of the fact. In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative.
In a preprocessing step we construct a host-affiliation lookup. Using a union-find algorithm we group hosts, that either share the same rightmost non-generic suffix or have an IP address in common, into sets. Every set is given a unique identifier (e.g., the host with the lexicographically lowest hostname). The host-affiliation lookup maps every host to its set identifier or to itself (when there is no set). This is used to compare hosts. If the lookup maps two hosts to the same value then they are affiliated; otherwise they are non-affiliated.
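The preprocessing step quoted above can be sketched roughly as follows. This is only an illustration of the union-find idea as described in the paper excerpt, not Google's actual implementation. The function names, the `GENERIC_SUFFIXES` set, and the sample hostnames and IPs are all assumptions made up for the example, and the IP test uses the "same first 3 octets" rule from the quoted list.

```python
# Hypothetical sketch of a host-affiliation lookup built with union-find.
# GENERIC_SUFFIXES is an illustrative assumption, not a list from the paper.
GENERIC_SUFFIXES = {"com", "org", "net", "co", "uk"}


def ip_prefix(ip):
    """Return the first three octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def rightmost_non_generic_token(host):
    """Return the rightmost hostname token that is not a generic suffix."""
    for token in reversed(host.lower().split(".")):
        if token not in GENERIC_SUFFIXES:
            return token
    return host.lower()


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps later lookups short.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically lowest hostname as the set identifier.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def build_affiliation_lookup(hosts):
    """hosts maps hostname -> IP; returns hostname -> set identifier."""
    uf = UnionFind()
    first_seen = {}  # (kind, value) -> first host seen with that key
    for host, ip in hosts.items():
        uf.find(host)  # register the host even if it matches nothing
        keys = (("ip", ip_prefix(ip)), ("name", rightmost_non_generic_token(host)))
        for key in keys:
            if key in first_seen:
                uf.union(host, first_seen[key])
            else:
                first_seen[key] = host
    return {host: uf.find(host) for host in hosts}


def affiliated(lookup, host_a, host_b):
    """Two hosts are affiliated if the lookup maps them to the same value."""
    return lookup[host_a] == lookup[host_b]


# Made-up hosts mirroring the A1/B1/C1 case: A1 and C1 share a C block.
sites = {
    "a1-example.com": "192.0.2.10",
    "b1-example.net": "198.51.100.7",
    "c1-example.org": "192.0.2.77",
}
lookup = build_affiliation_lookup(sites)
print(affiliated(lookup, "a1-example.com", "c1-example.org"))
print(affiliated(lookup, "a1-example.com", "b1-example.net"))
```

Note that in this sketch the grouping depends only on IPs and hostnames, never on who links to whom, and the lookup is rebuilt from whatever host data is current each time it runs.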

Perhaps my first mistake was to read this too literally. Because A1 and B1 are not on the same C block, they should not be seen as affiliated. Thus in my reading of the above, they can't connect A1 to C1, since the connection travels thru B1. But, if they compare sites that are two generations away instead of one, then the connection between A1 and C1 becomes more apparent, since they are on same C block and connected by two degrees of separation.

Anyway, we moved C1 to another host many months ago, when it occurred to us that C1 may have been affiliated to A1. Still, no rebound for C1. We don't really see any issues in dup (content, templates, WHOIS, etc.). But there is about 60% overlap of the products/services being offered. So we're left wondering: IF they made the connection and suppressed C1, are they still remembering?

We would have thought that since this is related to algorithmic activity, once the issue is cleared and new calcs are performed, the C1 site rebounds. Not so. Leaving us wondering if C1 continues to be remembered as being affiliated with A1.

caveman

4:54 pm on Oct 22, 2004 (gmt 0)

mfishy, Sly,

We also have other, more subtle (internal pages), evidence that G has become more aggressive than ever at nixing out sites/pages that they deem too similar to one another.

I have no issue with this when the pages are largely dups in terms of what they offer, but when there is only 60% overlap, and the targets are completely different, common ownership alone seems to be a very aggressive definition of what constitutes spam, at least IMHO.

Total Paranoia

4:57 pm on Oct 22, 2004 (gmt 0)

Google has related some of our sites that are not even linked together. I don't know how they did it.

Slydog - Do you have the Toolbar installed with PR & Category switched on?

My guess is that you and other people from your organisation visit these websites every day from the same IP. My opinion is that this could be enough for Google to relate them, as you are sending information to them about your browsing.

martinibuster

5:33 pm on Oct 22, 2004 (gmt 0)

>>>there is only 60% overlap

That strikes me as kind of high. Am I mistaken?

Interesting thread.

Total Paranoia

5:54 pm on Oct 22, 2004 (gmt 0)

>Does G "forget" affiliations?

It has to, doesn't it?

If Google didn't forget affiliations, then websites that are genuinely no longer related (sold on for example) may be filtered undeservedly.

caveman

6:15 pm on Oct 22, 2004 (gmt 0)

MB, your question hits on another topic I've been thinking may be worthy of a new thread, but to answer the question from my perspective...

IMHO, 60% is high if it's with the same kw's, to the same audience, since that really is just candy coated dup content (even though many advocate multiple sites that basically offer the same thing in order to reduce risk...and that's fine with me).
;-)

However, I don't think 60% is high if you are reaching different groups of people on different searches. Vague analogy: The major cereal makers offer *many* different brands. Is it OK that one offers corn flakes modified and repackaged into about seven different brands, each with its own line extensions, when all those brands represent relatively minor tweaks to one core product, and in fact they are about 85% the same? I think so; they each appeal to different tastes and audiences, and hey, it's a free market. (Of course, many Europeans find our cereal aisles absurd, but that's another argument.)

But we go OT here a bit. I'll start another one on this perhaps.

Total Paranoia, yes I do believe the TB can play a role, FWIW, but we actually are SO paranoid that we don't use it. On your other comment, whether or not a site "deserves" to be shown in G's SERP's is of course for G to decide. I really don't know what they would think if they did a manual inspection of these two sites.

bakedjake

6:30 pm on Oct 22, 2004 (gmt 0)

Does G at some point "forget" that A1 and C1 were affiliated?

Unless Google is maintaining an index of "affiliations", yes, they would be forgotten.

I can't see them maintaining that database, because I'm not sure it provides any meaningful information to them. Not that they don't have the ability, but just that I don't think it would be useful.

My vote is yes, it would be forgotten with the way things are now.

60% is high

In my experience, it's high either way caveman. Regardless of the similarity or difference of sectors/audience.

caveman

6:43 pm on Oct 22, 2004 (gmt 0)

So, if they forget, then it's a logical conclusion that affiliations that lead to sites being filtered out of SERP's are in fact handled by algo *filters* (as opposed to penalties, since penalties are more likely to be remembered)?

Jake, when you say 60% is high, I presume you mean with respect to what G is likely to dislike?

glengara

6:55 pm on Oct 22, 2004 (gmt 0)

*affiliations that lead to sites being filtered out of SERP's*

I may well be wrong, but I never got that from either Hilltop or LocalRank, it was simply the links between them that were excluded from the calculations.

[edited by: glengara at 6:56 pm (utc) on Oct. 22, 2004]

martinibuster

6:55 pm on Oct 22, 2004 (gmt 0)

CM,
While you might be talking about Hilltop affiliation regarding your network of sites, I think Google is seeing old fashioned duplicate content and that's the likely culprit.

bakedjake

7:03 pm on Oct 22, 2004 (gmt 0)

algo *filters*

That's exactly how I understand it to be working, caveman.

caveman

10:23 pm on Oct 22, 2004 (gmt 0)

Thanks Jake.

MB, you may well be right, especially since it looks like I can't write this off to the algo anymore.

However, a word of caution to any/all paying attention: When I refer to "60% overlap of products/services being offered," I'm talking about the goods/services themselves (like two electronics retailers offering overlapping inventory to the extent of about 60%). Our codes are unique, we don't use feeds or straight manufacturer blurbs, etc, so in order for a SE to pick this up and tag it as duplicate content, they would literally have to compare inventory SKU's, which seems unlikely to me.
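For anyone wondering what "comparing inventory" would even mean in practice, here is a minimal sketch. It assumes overlap is measured as shared SKUs divided by the size of the smaller catalog, which is only one possible definition; the function name and the sample SKUs are made up for illustration.

```python
def sku_overlap(catalog_a, catalog_b):
    """Share of the smaller catalog's SKUs that also appear in the other one."""
    a, b = set(catalog_a), set(catalog_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical inventories for two sites.
site_a = ["tv-100", "tv-200", "cam-10", "cam-20", "amp-5"]
site_c = ["tv-100", "tv-200", "cam-10", "dvd-7", "mp3-9"]
print(f"{sku_overlap(site_a, site_c):.0%}")
```

A check like this needs clean product identifiers from both sites, which is exactly the data a crawler would struggle to extract from unique page code.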

There's about a 20% dup in some parts of the code sitewide, but in the center to bottom of the HTML, so that also seemed unlikely to me.

OTOH, I'm sure it's more than just bad luck. ;-)

Tiebreaker

4:20 pm on Oct 23, 2004 (gmt 0)

Hmmmm ...

If this was true, surely google would consider that everyone here was affiliated with Webmaster World, since most of us visit every day.

caveman

7:03 pm on Oct 23, 2004 (gmt 0)

Nah, we don't use the TB, and they never see our real IP. ;-)