One of the key Hilltop concepts was identifying "affiliated" websites - sites whose IP addresses share the same C-block, or sites with the same right-most token in the domain name once the TLD is stripped (orgname.org and orgname.com would be considered affiliated, and links between them devalued).
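For anyone who hasn't read the paper, the two tests described above are simple to sketch. This is just an illustration of the rules as stated (C-block match, or matching right-most non-TLD token); the host names, IPs, and the toy TLD list are made up here, not from the paper:

```python
# Sketch of Hilltop's affiliation test as described above: two hosts are
# "affiliated" if they share an IP C-block (first three octets) or the
# same right-most token of the domain name once the TLD is stripped.

def c_block(ip: str) -> str:
    """First three octets of a dotted-quad IPv4 address."""
    return ".".join(ip.split(".")[:3])

def domain_token(host: str) -> str:
    """Right-most token of the domain once the TLD suffix is stripped."""
    tlds = {"com", "org", "net", "co", "uk"}  # toy list, not exhaustive
    parts = [p for p in host.lower().split(".") if p not in tlds]
    return parts[-1] if parts else host.lower()

def affiliated(host_a: str, ip_a: str, host_b: str, ip_b: str) -> bool:
    return (c_block(ip_a) == c_block(ip_b)
            or domain_token(host_a) == domain_token(host_b))

# orgname.org / orgname.com share the token "orgname" -> affiliated
print(affiliated("orgname.org", "10.1.2.3", "orgname.com", "10.9.8.7"))   # True
# different names but same C-block -> affiliated
print(affiliated("alpha.com", "192.168.5.1", "beta.com", "192.168.5.9"))  # True
# different names, different C-blocks -> not affiliated
print(affiliated("alpha.com", "10.1.2.3", "beta.com", "10.9.8.7"))        # False
```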
Has anyone seen actual evidence that one or both of these "affiliated" determinations are actually in effect?
Which brings us back to your original question.... does anyone have any proof?
Unlikely. We can only use our analytical skills to deduce a probability. This discussion is very useful for that purpose.
Keep in mind that these papers were written a few years back, and what was then a good notion for detecting affiliation may no longer be the case. The important and unchanging fact is that detection of affiliation is an absolute requirement for these algorithms to work. That means a lot of attention would be given to that area.
If you think it through, there are many on- and off-page clues to affiliation beyond just the server's C-block. My guess is that they would use a set of these clues and assign a threshold score at which affiliation is assumed. In the absence of other clues, a shared C-block might not hit the threshold.
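To make the guess above concrete, here's a minimal sketch of a clue-scoring scheme. The clue names, weights, and threshold are all invented for illustration; nothing here is from the Hilltop paper or known about Google's actual implementation:

```python
# Hypothetical multi-clue threshold: score each affiliation clue and
# assume affiliation only when the total clears a cutoff. A shared
# C-block alone deliberately falls short of the threshold.

CLUE_WEIGHTS = {
    "same_c_block": 0.4,
    "same_whois_registrant": 0.5,
    "duplicate_templates": 0.3,
    "interlinked_footers": 0.3,
}

THRESHOLD = 0.6  # arbitrary cutoff for this sketch

def affiliation_score(clues: set[str]) -> float:
    """Sum the weights of the clues observed for a pair of sites."""
    return sum(CLUE_WEIGHTS.get(c, 0.0) for c in clues)

def assumed_affiliated(clues: set[str]) -> bool:
    return affiliation_score(clues) >= THRESHOLD

# A shared C-block alone (0.4) stays below the 0.6 threshold...
print(assumed_affiliated({"same_c_block"}))                         # False
# ...but combined with duplicated page templates (0.4 + 0.3) it clears it.
print(assumed_affiliated({"same_c_block", "duplicate_templates"}))  # True
```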
IMHO, these two topics are closely related. Having a number of sites that were never sandboxed this year, and others that were, we're able to assess the differences between the two sets. We see strong connections between these two topics.
But maybe it's all the hooch we drink here in the cave. :-)
We consolidated 4 ethnic-specific domains into one portal, which got its DMOZ listing fairly quickly. The 301s were from sites which have existed since '97.
Since Hilltop kicks in for broad topics with a lot of pages, I don't see any negative aspect from Google's point of view in disregarding "suspicious" backlinks and losing a few legit pages. There are still enough pages left.
When we look at our '04 launch sites that had no issues, and those that did, and then consider that most who post about the problem are launching their sites with help from ... umm ... closely related sites, it makes sense to me.
I get hives when I think of all the ways that G can make connections between sites (e.g., IP, C-block, WHOIS, toolbar, SiteA->SiteB->SiteC->, dup code/content, etc.). Then I think about how their logic is probably that in most cases those connections are *not* coincidental, and that they probably don't mind if a few innocents get taken down in the process, for the 'greater good' ...
Well you get the idea.
If Hilltop-style link quality evaluation were crossed against age and number of links, with exceptions for sites that meet certain other criteria, IMHO, you'd have the current landscape.