| This 37 message thread spans 2 pages |
|Hilltop algo: PROVED? ...or just our best guess for now?|
There's been a lot of conjecture and best guessing about the Hilltop algorithm from George A. Mihaila and Krishna Bharat, and whether Google incorporated some version of this in their major hiccup last November. At that time Google was talking about semantics, not Hilltop.
One of the key Hilltop concepts was identifying "affiliated" websites - sites whose IP addresses shared the same c-block, or sites with the same right-most token in the domain name when the TLD is stripped (orgname.org and orgname.com would be considered affiliated and links between them devalued).
Has anyone seen actual evidence that one or both of these "affiliated" determinations are actually in effect?
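For reference, the two "affiliated" tests described above can be written out as a minimal Python sketch. This is only an illustration of the description in this post, not code from the paper or from Google; the function names and the tiny `GENERIC_SUFFIXES` list are assumptions made for the example.

```python
# Hypothetical, deliberately tiny list of suffixes to strip before comparing.
# A real system would need a far larger list; this one is an assumption.
GENERIC_SUFFIXES = ("co.uk", "com", "org", "net")


def c_block(ip: str) -> str:
    """Return the first three octets of a dotted IPv4 address."""
    return ".".join(ip.split(".")[:3])


def rightmost_token(host: str) -> str:
    """Strip a generic suffix, then return the right-most remaining token."""
    host = host.lower().rstrip(".")
    # Try the longest suffixes first so "co.uk" is stripped as one unit.
    for suffix in sorted(GENERIC_SUFFIXES, key=len, reverse=True):
        if host.endswith("." + suffix):
            host = host[: -(len(suffix) + 1)]
            break
    return host.split(".")[-1]


def affiliated(host_a: str, ip_a: str, host_b: str, ip_b: str) -> bool:
    """True if the hosts share a c-block or the same right-most token."""
    return (
        c_block(ip_a) == c_block(ip_b)
        or rightmost_token(host_a) == rightmost_token(host_b)
    )
```

With this sketch, `affiliated("www.orgname.org", "192.0.2.10", "orgname.com", "198.51.100.7")` is true because of the shared token, and two unrelated names on the same c-block are also flagged, which is the shared-hosting concern.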
If Site-A links to Site-B and Site-B links to Site-C (and especially if they are both linked to from Site-D and link back to Site-D) then Site-A and Site-C are affiliated. I have seen that.
Though there are too many other factors involved for it to be conclusive, there had better be some non-affiliated links. And if not from expert sites, then from sites that have links from expert sites. The latter is seldom if ever mentioned, but it's worth including in the arsenal of tools.
|The affiliation relation is transitive: if A and B are affiliated and B and C are affiliated then we take A and C to be affiliated even if there is no direct evidence of the fact. In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative. |
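The transitive rule in that quote is easy to picture as a grouping problem. Here is a minimal sketch using a union-find structure; the paper does not say how the grouping is implemented, so the class and method names are assumptions for illustration only.

```python
class AffiliationGroups:
    """Union-find over hosts: directly affiliated hosts are merged into
    one group, so affiliation becomes transitive automatically."""

    def __init__(self):
        self.parent = {}

    def find(self, host):
        """Return the representative host of the group containing `host`."""
        self.parent.setdefault(host, host)
        while self.parent[host] != host:
            # Path halving: point this host at its grandparent as we walk up.
            self.parent[host] = self.parent[self.parent[host]]
            host = self.parent[host]
        return host

    def union(self, a, b):
        """Record direct evidence that hosts `a` and `b` are affiliated."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def affiliated(self, a, b):
        """True if `a` and `b` ended up in the same group."""
        return self.find(a) == self.find(b)
```

After `union("A", "B")` and `union("B", "C")`, `affiliated("A", "C")` is true even though no direct evidence links A and C, which is exactly the conservative behaviour the quote describes.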
It's a best guess, but it's also worth a little guesswork about last year's Local Rank patent: taking it into consideration might not be a bad idea, since it isn't totally unrelated to the concept of links from expert sites being important.
How do you determine whether certain links are being discounted? Would they still show up as backlinks?
I have noticed that many of my sites with DMOZ/Google directory links show all the low PR DMOZ clones in the backlinks, but not the two directories themselves.
|One of the key Hilltop concepts was identifying "affiliated" websites - sites whose IP addresses shared the same c-block |
Wouldn't this then have to include a large population using shared hosting? That strikes me wrong.
"Additionally, in computing the level of relevance, we require a match between the query and the text on the expert page which qualifies the hyperlink being considered. This ensures that hyperlinks being considered are on the query topic."
So if an expert links to you, in order for the link to be relevant, both sites have to be on topic and relevant to the search term. Would that be an appropriate interpretation? So a link is no longer a link. Sites linking to you need to be on topic in order to boost your rankings?
>Sites linking to you need to be on topic in order to boost your rankings?
I think this is true to some extent. It may boost your PR, but not rankings. Helps fight PR/link selling as well.
|Wouldn't this then have to include a large population using shared hosting? That strikes me wrong. |
How often would someone sharing hosting with you also link to you, unless there really was some kind of "affiliation"? Pretty rare, I think.
|it may boost your PR, but not rankings |
But even now, anyone can get an off topic PR8 link which boosts your PR, but not your rankings. As far as I can tell, relevancy doesn't have much to do with PR. Relevancy may have more to do with your rankings though. At least that's what I take from reading the Hilltop paper over and over.
|How often would someone sharing hosting with you also link to you |
It is common for a home town web designer to link to all his clients to show his 'resume'. And he probably also put a 'site designed by' link on each of his clients pages.
And all the people in the home town use the same ISP, and Bobby Joe links to Betty Sue's page because they go to the same school.
I'm sure there are more. It was a very common practice long before there even was a google.
kwasher, I think all those situations you describe really ARE an affiliation between the sites, in Hilltop terms. That's explicitly what Hilltop wants to catch, and thus isolate the really good links that are not likely to have been set up by one person/organization.
However, please let me focus this thread again.
We have many threads that discuss the possibility of Google using Hilltop, or at least parts of it. But has anyone seen solid evidence that Google really is using these criteria?
I can't even conceive of how you could test the right-hand token factor unless you moved an entire "same token" website to a new "different token" domain without changing any of the content.
To get evidence about the IP factor, I would think you need to move a well-linked domain from a shared C-block to a different C-block with no other changes -- and then see a jump in rank.
Something like that, at any rate. And that all assumes that no historical record is being tapped.
|How often would someone sharing hosting with you also link to you, unless there really was some kind of "affiliation"? Pretty rare, I think. |
There are times when affiliated recommendations would be useful. Imagine a hosting provider for SomeTown. Businesses and clubs in SomeTown might link to local hotels, for example.
Bias is certainly possible (they might link to their friends rather than the best hotels), but if they happen to use the same local provider then affiliation-based link degradation could devalue links from the local topic experts.
Bio4ce, I agree that PageRank is not a relevancy tool. Link theming would more likely apply to an additional bonus for relevant links (of which Hilltop is one method). I don't think that people are suggesting that sites linking to you need to be on topic in order to boost your rankings, but it could be a factor in Google at some point.
There are significant downsides to this though. Firstly, a link from an unrelated site may be a more thoughtful, unbiased recommendation than a link from a related site.
This may be especially true for commercial sites. Do widget shops give the best recommendations for widget shops, or would sites in a different theme be more likely to recommend the best widget seller?
Secondly, a link-theming engine might be very easy to game. People constructing link networks to artificially inflate rankings have been looking to split across different IPs (and class C ranges) and whois information for a long time. Someone who's put up a bunch of widget sites for the purpose of getting to the top in Google would get a bonus, potentially crowding out more naturally 'organic' sites on the topic.
So the questions in my mind are would Hilltop-style theming and affiliation-based link degradation penalise "very common practice long before there even was a google" as kwasher puts it, and whether it would give a bonus to some types of artificial link networks.
|anyone seen solid evidence that Google really is using these criteria |
A lot of people believe that anchor text from similar-IPs counts for less (note anchor text benefit, not PR).
I don't know anyone who's found link-theming in Google using a controlled-conditions test.
|So the questions in my mind are would Hilltop-style theming and affiliation-based link degradation penalise "very common practice long before there even was a google" as kwasher puts it, and whether it would give a bonus to some types of artificial link networks. |
Absolutely, positively, beyond a shadow of a doubt. There's an army of gifted, capable web designers out there who don't have the slightest clue about SEO. I can tell you for a fact that some don't even have the slightest idea what the difference is between SEO and link pop and getting a link to a site for doing PPC. I have the emails to prove it; it took about 3 hours back and forth this week trying to explain the difference between SEO and PPC to a group.
They just make gorgeous sites, find the best host they can to put all their client sites on, and use that host because they're familiar with the interface and trust the service.
Since they're technically all "affiliated" because of IP number and C class, they could conceivably be, though honestly and legitimately linking and as relevant as can be, at a distinct disadvantage when it comes to competing against a non-designer SEO who knows enough to spread their sites around, knows just how to manipulate the linking and where to buy the PR - and could potentially beat the tar out of the ones who are legitimately relevant for that type of search query.
We have seen no tangible proof that links are disregarded for affiliation; we can only guess. But more importantly, is there something operating where there can be a boost for being linked to from expert site(s), and is the IP/C-class no more than one of the factors to consider when determining what actually qualifies a link as being from an "expert" site?
What I have seen, which I think is very telling, is this: when a site (site-A) has an authoritative "expert" link and description that confirms its relevance for one of the keyphrases in a search, and site-A links to another site (site-B) which targets that particular keyphrase, the link from site-A can apparently act as a conduit of verification for site-B's relevance for the phrase and give a very valuable boost to its rankings.
>>I don't know anyone who's found link-theming in Google using a controlled-conditions test.
Definitely not controlled-conditions testing, but I'll take it where I can get it when it helps. We had a long thread here in this forum back during the Florida debacle, specifically about how web design sites were hit, that was very revealing and very much related to the "affiliation" issue - and if anyone took the time to read that through (and compare notes with a few of the people involved), they'd be closer to being a believer in the value of themed links - particularly when it comes to ranking for locale-specific searches.
We also had one on real estate/local sites, and I can personally testify that I saw one site that didn't get touched even for one second because of, in addition to a couple other factors, a theme-based link from an optimized page on the right site done the right way.
Still not proof; it's a guess based on anecdotal evidence, but viable enough to run with.
That is, as always, a definitive post, and just to underline the point: I built myself 5 websites and got numerous #1 Google matches for my keywords, and I did not even know page rank existed!
All of my sites link to each other, as any sensible non internet business would cross promote. The links are only where appropriate and I have suffered no penalty.
In fact, to go further, I have virtually no backlinks from sites beyond my control (except DMOZ etc.) and I rank very well indeed.
I think we can still rely on the principle of
"build your site for visitors not search engines"
If memory serves, the main factor in both Hilltop and LocalRank in determining "affiliation" was that the pages would be returned for a specific search query.
So links from similar class/IP pages that were not returned for that search query would not be included in the "affiliation" test.
Mind you, now that G has introduced the concept of removal from the calculations through "affiliation", that narrow interpretation may well widen....
|...and description that confirms its relevance for one of the keyphrases in a search... |
Are you referring to a description near the link (to site A) on the expert site or?
The affiliated sites test that was described in the Hilltop paper seems to be one used to determine the 'expert' sites not the target sites that are being ranked.
'The targets we identify are those that are linked to by at least two non-affiliated expert pages on the topic'.
'Considering all pages with out-degree greater than a threshold, k (e.g., k=5) we test to see if these URLs point to k distinct non-affiliated hosts. Every such page is considered an expert page.'
So as long as the 'expert' has a reasonable number of non-affiliated (NA) links on it, it can be an expert for all sites including affiliated sites.
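To make the two quoted rules concrete, here is a rough Python sketch. It follows only the wording quoted above; the function names and the `group_of` callback (which maps a host to whatever affiliation group it belongs to) are assumptions, and the real algorithm may apply further conditions.

```python
from urllib.parse import urlparse


def is_expert_page(out_links, group_of, k=5):
    """Quoted rule: out-degree greater than k, and the links point to at
    least k distinct non-affiliated hosts. `group_of` is a hypothetical
    callback returning an affiliation-group id for a host name."""
    if len(out_links) <= k:
        return False
    groups = {group_of(urlparse(url).hostname) for url in out_links}
    return len(groups) >= k


def qualifies_as_target(expert_hosts, group_of):
    """Quoted rule: linked to by at least two non-affiliated experts."""
    return len({group_of(host) for host in expert_hosts}) >= 2
```

Note that nothing in this check looks at the site being ranked, only at how varied the expert's own outbound links are, which is the point made above.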
Of course Google may have made extensive modifications of the original algorithm.
One severe limitation of the original Hilltop is that it can really only apply to very broad matches. In order to rank on a given query for a link, all the query terms must be on the expert page: 'For an expert to be useful in response to a query, the minimum requirement is that there is at least one URL which contains all the query keywords in the key phrases that qualify it.' The key phrases are defined as the Title, Heading and the URL's anchor text on the expert page, although later in the paper it says 'A fast approximation is to require all query keywords to occur in the document.'
By either definition a page in DMOZ is not going to match very many 3 or 4 word queries, so Hilltop is probably best for the 1 and 2 word queries as used in their examples.
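A small sketch shows why longer queries rarely qualify under that rule. It is a simplification based on the quotes above: it treats the page as having a single title and a single heading that qualify every link, which is an assumption, and the function names are made up for the example.

```python
import re


def tokens(text):
    """Lower-cased set of alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expert_matches(query, title, heading, anchor_texts):
    """True if at least one link has every query keyword somewhere in
    its key phrases (title + heading + that link's anchor text)."""
    wanted = tokens(query)
    shared = tokens(title) | tokens(heading)
    return any(wanted <= (shared | tokens(anchor)) for anchor in anchor_texts)
```

For a directory page titled "Widgets directory" with the anchor "Blue widget shop", the query "blue widgets" matches, but "cheap blue widgets london" does not, since no link has all four words in its key phrases.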
On the "affiliation" topic - (and maybe this is a stupid question) - do you think google actually looks at things in such detail?
If site A links to 200 site B's, and 100 of these link to site C, is A seen as an affiliate of site C?
If so, are the 100 links to site C counted as only one link in determining ranking?
*do you think google actually looks at things in such detail?*
That depends whether you believe that over a hundred factors are looked at to determine a page's ranking, or you think that's a load of codswallop, and there are basically only two.
I believe the former, but suspect the latter ;-)
I would be inclined to agree with the idea that there are lots of factors, but not so many important ones.
In that case you shouldn't discount a close look taken at the page/site linking pattern....
So the popular scenario of a hosting company that hosts all its 100+ sites on one server, then has all its clients' sites carry footer links "sites designed by ...", would be in a precarious situation, especially if the website for the design company had a portfolio page of its clients, with links pointing to them.
I used one hosting service because a friend (and competitor) told me how good they were. We chase the same or similar search terms.
Is it likely that one or both of us are now effectively penalised because, like him, I chose a competent host?
According to HT and LR, "affiliation" doesn't bring any sort of penalty, it's just an exclusion from the calculations.
Having said that, IMO, the whole "affiliation" thing is still very much speculation on possible future trends.
On the other hand, ;-) and totally circumstantially, I have noted that one thing in common many of the people seeking advice for "unexpected" drops in rankings have, is cross/inter linking.
Many have kept their PR and IBLs, it just appears as if they're not working...
Thanks to technology like load balancers and firewalls that support NAT, there are many hosts who have THOUSANDS of sites hosted on a single IP address.
I don't see how any search engine could actually justify imposing IP-based penalties. The odds of unrelated companies being affected are just too great.
|Watcher of the Skies|
I'd like to repeat Netnerd's question in the hope that Marcia sees it and weighs in.
"If site A links to 200 site B's, and 100 of these link to site C, is A seen as an affiliate of site C?"
Plus a couple of qualifications:
1.) A's, B's and C's all on different hosts.
2.) Most/all incoming links to C come from (please follow me here) Site B's that were linked to from Site A's. But, A's and B's would have MANY other varied incoming and outgoing links.
IYHO, would all this be an issue? WOULD B's LINKS TO C BE DISCOUNTED? This could be a reasonable way to promote a new site that has no PR yet which would not be useful yet in providing a reciprocal link. Thanks!
... and then I have to wonder about 'private registrations'. Sites that use private domain name registration all use the whois info of the private registration service.
>>So links from similar class/IP pages that were not returned for that search query would not be included in the "affiliation" test.
Hilltop and Local Rank rely on re-ranking just a subset of relevant pages, based on the interconnectivity of just these pages that are all relevant to the query. If you allowed more than one page from an organisation to get into this subset and influence the re-ranking, that organisation would have too big an advantage (way beyond pre-Florida).
Also according to the Hilltop paper, 2 sites do not need to link together to be in affiliation. If you're on the same IP class C as another site that was also relevant to the query, only one will be listed (or at least the way I read it?). Guess the way G sees it, there's more chance of 2 pages on the same IP and topic being affiliated than not.
I'm not convinced that anyone has managed to identify enough unique features that are completely exclusive to Hilltop or LocalRank, to be certain that one or the other is in place.
Personally, even if they were using Hilltop, it's PANTS!
I understand that finding a set of non-affiliated, human-edited pages on a topic to determine relevancy can improve accuracy, but to then use a computer to identify these expert pages? hmmm!
|there are many hosts who have THOUSANDS of sites hosted on a single IP address. I don't see how any search engine could actually justify imposing IP-based penalties. |
Not penalties, I'd say - just less of a positive effect.
I mentioned this earlier, but my wording may have been a bit foggy. Even with 1,000's of sites sharing an IP Address, a false positive would only be generated if one of them linked to another and they had no real-world affiliation.
How common would that really be? Much more likely that there IS some affiliation, such as a common designer, etc. And if the only outcome is that the link counts "less" in some way, then it would be a very minor thing, right?
|Not penalties, I'd say - just less of a positive effect. |
You don't consider that a penalty? If hosting with Web Host A puts you in Slot 1, and hosting with Web Host B puts you in Slot 2, how is that not a penalty when you host with Web Host B? The "How common would that be" response doesn't fly with me. Slim chance or not, it is still bound to happen and the people it happens to may not even know (or may not be able to figure out why). And even if they could figure out why, why should they have to switch hosts or stop linking to another resource just to stay in the search engine?
In the real world, shared IP addresses are a reality. Hosting acquisitions happen every day. Advances in hardware and software allow hosts to place thousands of sites on a single machine whereas a few years ago they could only place 300. Load balancing solutions allow literally tens of thousands of sites to be hosted from a single or small group of IP addresses across a cluster (and if you think that some of the large hosts out there aren't doing this, think again).
Sorry, but making any position decisions based on the IP address, in a world where the IP address is becoming less and less unique, is unacceptable. If people are going to abuse the system, they're going to find a way to do it - what would stop me from getting a bunch of $1.99 a month hosting accounts with different ISPs and accomplishing the same thing? NOTHING. The guilty will simply change the way they're doing things. The INNOCENT shouldn't have to.
Personally I would like to see the entire ranking method based on links go away completely, but since that seems to be here to stay, at least do it with some common sense. IP address restrictions aren't it.
From the Hilltop paper:
|In practice some non-affiliated hosts may be classified as affiliated, but that is acceptable since this relation is intended to be conservative. |
What Google will continue to pursue are programmatic ways to determine pages that freely and objectively link to other pages - with no sort of tit-for-tat. No collaboration, payment, reciprocal agreements, shared ownerships, etc.
If Google can do this with a high degree of accuracy, then they will. The false positives for affiliation should be rare and ultimately negligible in their effect on the final SERP. But of course, if we are doing SEO, then we don't like this very much because we WANT to influence the SERP in our own favor.
[edited by: tedster at 8:39 pm (utc) on July 26, 2004]