
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Trust and Authority - they are not the same thing
tedster

WebmasterWorld Senior Member tedster us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3753332 posted 11:39 am on Sep 27, 2008 (gmt 0)

While studying Google's recently granted Historical Data patent [patft.uspto.gov], I noticed that the language helps to separate two concepts that we tend to use casually at times: trust and authority.

...links may be weighted based on how much the documents containing the links are trusted (e.g., government documents can be given high trust). Links may also, or alternatively, be weighted based on how authoritative the documents containing the links are (e.g., authoritative documents may be determined in a manner similar to that described in U.S. Pat. No. 6,285,999) [patft.uspto.gov].

Clearly, Google has two different metrics going on. As you can see from the reference to Larry Page's original patent, authority in Google's terminology comes from backlinks. When lots of other websites link to your website, you become more and more of an authority.

But that isn't to say you've got trust. So what exactly is trust? Here's an interesting section from the same patent:

...search engine 125 may monitor one or a combination of the following factors: (1) the extent to and rate at which advertisements are presented or updated by a given document over time; (2) the quality of the advertisers (e.g., a document whose advertisements refer/link to documents known to search engine 125 over time to have relatively high traffic and trust, such as amazon.com, may be given relatively more weight than those documents whose advertisements refer to low traffic/untrustworthy documents, such as a pornographic site);

So we've got two references here: government documents and high traffic! From other reading, I'm pretty sure that trust calculations work like this - at least in part. Google starts with a hand-picked "seed list" of trusted domains. Then trust calculations can be made that flow from those domains through their links.

If a website has a direct link from a trust-seed document, that's the next best situation to being chosen as a seed document. Lots of trust flows from that link.

If a document is two clicks away from a seed document, that's pretty good and a decent amount of trust flows through - and so on. This is the essence of "trustrank" - a concept described in this paper by Stanford University and three Yahoo researchers [ilpubs.stanford.edu].
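The seed-and-propagation idea can be sketched as a toy TrustRank-style iteration. This is my own simplification: the damping factor, the tiny link graph, and the uniform split of trust across a page's outlinks are illustrative assumptions, not anything Google has published.

```python
# Simplified TrustRank sketch: trust starts at hand-picked seed pages
# and decays as it flows through outlinks (split among them).

def trustrank(graph, seeds, damping=0.85, iterations=20):
    """graph: {page: [pages it links to]}; seeds: set of trusted pages."""
    pages = list(graph)
    # Initial trust: uniform over the seed set, zero elsewhere.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Trust flowing in from pages that link to p, diluted by
            # each linker's number of outlinks.
            inflow = sum(trust[q] / len(graph[q])
                         for q in pages if p in graph[q])
            # Seeds keep re-injecting trust; other pages only inherit it.
            new[p] = (1 - damping) * seed_score[p] + damping * inflow
        trust = new
    return trust

graph = {
    "seed.gov": ["a.com", "b.com"],
    "a.com": ["c.com"],
    "b.com": [],
    "c.com": [],
}
scores = trustrank(graph, seeds={"seed.gov"})
# A page one click from the seed ends up with more trust than a page
# two clicks away, which is the "and so on" decay described above.
```

The split-among-outlinks step is also why a link from a seed page carrying only a handful of links would pass more trust than one buried among hundreds.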

This approach to calculating trust has been refined by the original authors to include "negative seeds" - that is, sites that are known to exist for spamming purposes. The measurements are intended to identify artificially inflated PageRank scores. See this pdf document from Stanford: Link Spam Detection [dbpubs.stanford.edu]

To what degree Google follows this exact approach for calculating trust is unknown, but it's a good bet that they share the same basic ideas.

So let's all work to keep these two concepts distinct - trust and authority.

[edited by: tedster at 6:14 am (utc) on Dec. 3, 2008]

 

tedster




 
Msg#: 3753332 posted 11:46 am on Sep 27, 2008 (gmt 0)

Here's an example of confusing trust with authority. It's been common to read statements such as "this site has Google Sitelinks, so it's a trusted authority." I hope my efforts here show that no such thing is true.

Sitelinks are not a sign of trust at all; a relationship to "authority" is possible, but even that is weak.

Sitelinks are a navigational aid that Google assigns to a domain for some queries if the site's architecture is clear enough for them to do so. If sitelinks are displayed for a generic query, and not just a domain name query, then something about authority has come into play - but still, that's not trust.

robzilla

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3753332 posted 11:22 am on Sep 28, 2008 (gmt 0)

Great post, tedster. Certainly something I want to keep in mind, so I flagged it.

Do you think trust is linked to topic, or would you say those are likely to be treated separately in an algorithm? Also, would a link from a high-trust source to a page also contribute to the total trustworthiness of a website? If so, I wonder if that would work based on internal linking (page1 gets a link from a trusted page on another website and links to page2, so page2 also gains trust), or perhaps on URL structure (domain.ext/dir1/page1.html is linked to from a trusted page on another website, so all pages within dir1 get a bite of trust, and domain.ext will also get a (smaller) bite).

tedster




 
Msg#: 3753332 posted 6:23 pm on Sep 28, 2008 (gmt 0)

Good questions. I should start by saying I'm not 100% certain of these answers, and I'm offering them to further the discussion.

I've always thought of trust as one of the "query-independent" factors, like PR - that is, it's not related to any topic. So that makes another key differentiating point. Authority is related to a topic, and trust is not.

I'd say that trust is usually a domain-wide factor, rather than being confined to a url. The papers often leave this open ended. The Link Spam Detection paper, for instance, discusses analyzing "nodes", where "nodes may be pages, hosts, or sites..."

Given the wide variations available across the web (think of the public blogging domains, for example) there must be some adjustments made for different kinds of hosting situations.

jimbeetle

WebmasterWorld Senior Member jimbeetle us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3753332 posted 6:52 pm on Sep 28, 2008 (gmt 0)

I'd also look at trust as being query independent and authority as query dependent. In my limited understanding it just seems to make sense.

"nodes may be pages, hosts, or sites..."

That's one of the problems I have when trying to read and understand a Google patent, since over the years we've always talked about G being "page focused". It's sometimes hard for me then to think in terms of the above quote and the following from the determining document freshness [appft1.uspto.gov] companion to the Historical Data patent:

A document may include an e-mail, a web site, a file, a combination of files, one or more files...

It kind of opens up a whole new can of worms where we can envision Google going past PageRank and assigning "DomainRank" and other site-wide attributes.

tedster




 
Msg#: 3753332 posted 8:08 pm on Sep 28, 2008 (gmt 0)

over the years we've always talked about G being "page focused"

Yes, we have - especially when it comes to the definition of PageRank. However, several Google folk have talked about there being more domain-wide factors being integrated into their algo recently. The details are pretty much in the area of "secret sauce", but if they say they have done it, I'm sure that they have - and trust would be one obvious area.

Authority is possibly another. I've been trying to outrank one particular page on a subdomain for several years, and the only reason I can see for its dominance are authority factors related to the domain itself, not even the subdomain.

wheel

WebmasterWorld Senior Member wheel us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3753332 posted 10:18 pm on Sep 28, 2008 (gmt 0)

This is similar to the way I approach my conceptualization of the Google algo. I tend to look at it from two aspects: trust and relevance. Trust, as you've described it - similar to PageRank. Relevance comes from having backlinks from sites that have similar content and that also have backlinks from relevant sites.

jimbeetle




 
Msg#: 3753332 posted 10:20 pm on Sep 28, 2008 (gmt 0)

Oh, I have no doubt that some of those domain-wide factors are at play. It's reading the patents and apps and realizing what the definitions of nodes and documents are that really opens my eyes to what might actually be at play in the "secret sauce" side of the algo.

Authority is possibly another. I've been trying to outrank one particular page on a subdomain for several years, and the only reason I can see for its dominance are authority factors related to the domain itself, not even the subdomain.

Yeah, these are the ones that make you think. We know that it's not just a matter of links, but of the quality of the links. I've been convinced for several years that it's not just the quality of *the link*, but the quality of the *links going back several generations*. And the quality of the link can affect both trust and authority.

Relevance comes from having backlinks from sites that have similiar content who also have backlinks from relevant sites.

Missed this as I was posting, but Yowza! Yes, take relevancy back a couple of generations and it sometimes answers the "How the heck is this page ranking?" question. I think we first started seeing this -- or first started to articulate it -- back in 2002 or so. Some folks have mentioned it here and there, but not many people seemed to want to take notice. It's been a well-ignored secret for some time.

Ikinek



 
Msg#: 3753332 posted 11:16 pm on Sep 28, 2008 (gmt 0)

What are they a sign of?

tedster




 
Msg#: 3753332 posted 11:19 pm on Sep 28, 2008 (gmt 0)

sites that have similiar content who also have backlinks from relevant sites

This borders on another territory that Google ventured into in 2002 - LocalRank. With LocalRank, the preliminary result set is analyzed by restricting link juice to just those domains within those results. Then a re-ranking is performed over that preliminary set, based on the LocalRank score.

After six years, I'm sure that the straight-up localrank method has been assimilated into other routines, but the essential principle would remain - and be seen in authority calculations.
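The two-pass idea can be sketched roughly as follows. This is a toy illustration of LocalRank-style re-ranking; the vote counting and the way the two scores are combined are my own assumptions, not the patent's actual math.

```python
# LocalRank sketch: after an initial ranking, count only the links that
# come from OTHER documents inside that same result set, then re-rank
# by combining the original score with this "local" link score.

def local_rank(initial_results, links):
    """initial_results: [(url, base_score)] best-first.
    links: set of (source_url, target_url) pairs known to the index."""
    result_set = {url for url, _ in initial_results}
    reranked = []
    for url, base_score in initial_results:
        # Votes only from the preliminary result set, not the whole web.
        local_votes = sum(1 for src in result_set
                          if src != url and (src, url) in links)
        # Illustrative combination: boost the base score per local vote.
        reranked.append((url, base_score * (1 + local_votes)))
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked

initial = [("a.com/page", 0.9), ("b.com/page", 0.8), ("c.com/page", 0.7)]
links = {("a.com/page", "c.com/page"), ("b.com/page", "c.com/page"),
         ("x.com/page", "a.com/page")}  # x.com is outside the result set
reranked = local_rank(initial, links)
# c.com gets two local votes and can outrank pages with no local support,
# while the x.com link counts for nothing because x.com isn't in the set.
```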

tedster




 
Msg#: 3753332 posted 11:27 pm on Sep 28, 2008 (gmt 0)

What are they a sign of?

I mentioned something about that in the second post - they're a navigational aid for the end user. Google assigns Sitelinks algorithmically for some queries, based on traffic, website architecture, and some other "special sauce" ingredients. Sitelinks began at first for only the strongest domains, and that may have caused some confusion. As they became widespread, it became clear that having Sitelinks for the basic query [domain.com] was not all that special any more.

Now, if you see Sitelinks for a generic keyword query, rather than a full domain "navigational query", then that might show something about authority - but it still isn't about trust.

anallawalla

WebmasterWorld Administrator anallawalla us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3753332 posted 4:11 am on Sep 29, 2008 (gmt 0)

Tedster:
I'm pretty sure that trust calculations work like this - at least in part. Google starts with a hand picked "seed list" of trusted domains. Then trust calculations can be made that flow from those domains through their links.

I've always thought of trust as one of the "query-independent" factors, like PR - that is, it's not related to any topic. So that makes another key differentiating point. Authority is related to a topic, and trust is not.

(spliced from two posts)

So the challenge is to make one's site seed-worthy. Can we assume that the Google Directory has high trust, being on the google.com domain? If so, every entry in DMOZ has inherited a good deal of trust.

However, when you look at a large corporate website and its links to a subsidiary, I think you'll find that the latter has more seed-worthiness than the DMOZ entry that was blessed by Google's trust. This suggests that trustrank distribution is similar to the ways link juice is diluted as you place more links on a page. This is asserted by the following learned paper:

Propagating Trust and Distrust to Demote Web Spam [ftp.informatik.rwth-aachen.de]

How might seed shortlists be created?

Domain ownership would play a strong part, hence whois data washed against stock exchange market cap records would easily separate the mega corporations from the IPOs. Similar comparisons from academic sources would separate the top shelf .edus from the dubious minor .edu institutions. Ditto for .gov sites.

I'd say that trust is usually a domain-wide factor, rather than being confined to a url. The papers often leave this open ended. The Link Spam Detection paper, for instance, discusses analyzing "nodes", where "nodes may be pages, hosts, or sites..."

The authors of the above paper have another view. They say: "The trust score of a page is an indication of how trustworthy the page is on the Web."
Then they introduce the concepts of DisTrust and BadRank. :) I don't know if any of them have since moved to Google, so their paper may simply be a paper.

Marcia

WebmasterWorld Senior Member marcia us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3753332 posted 4:26 am on Sep 29, 2008 (gmt 0)

There's a section in this document on Link Analysis (PDF) [nlp.stanford.edu] that gives a good explanation of hubs and authorities.

And here's Jon Kleinberg's classic publication, also PDF:

Authoritative Sources in a Hyperlinked Environment [cs.cornell.edu]

Hub: Links out to a lot of authoritative documents
Authority site: Linked to by a lot of authoritative documents

An "authority site" is what's often used to describe a site with "authoritative information" but that's not the same sense as how it's used in search.
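Those two mutually reinforcing definitions are the heart of Kleinberg's HITS algorithm, and can be sketched like this (a toy graph and a plain power iteration; the normalization is simplified, and the graph is purely illustrative):

```python
# HITS sketch: hub and authority scores reinforce each other.
# A good hub links to good authorities; a good authority is linked
# to by good hubs.

def hits(graph, iterations=30):
    """graph: {page: [pages it links to]}."""
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
        # Hub score: sum of authority scores of the pages linked out to.
        hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
        # Normalize so the scores don't blow up across iterations.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

toy_graph = {
    "hubpage": ["authority1", "authority2"],
    "otherhub": ["authority1"],
    "authority1": [],
    "authority2": [],
}
hub, auth = hits(toy_graph)
# "hubpage" (linking to both authorities) gets the top hub score;
# "authority1" (linked by both hubs) gets the top authority score.
```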

Now, if you see Sitelinks for a generic keyword query, rather than a full domain "navigational query", then that might show something about authority - but it still isn't about trust.

I believe site links can be based on either hub or authority score for a site, given that the site content and navigation support the topic of the search term.

[edited by: Marcia at 4:30 am (utc) on Sep. 29, 2008]

CainIV

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3753332 posted 7:06 am on Sep 29, 2008 (gmt 0)


If a website has a direct link from a trust-seed document

Has anyone ever defined such a list of documents?

glengara

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3753332 posted 9:40 am on Sep 29, 2008 (gmt 0)

I've always assumed "trust" derives from a site's internal/outgoing links, which is why I'm a bit of an on-page-topic Fascist in relation to outgoing links...

brotherhood of LAN

WebmasterWorld Administrator brotherhood_of_lan us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3753332 posted 10:14 am on Sep 29, 2008 (gmt 0)

Great topic. I'd wonder how much of trust assignment is hand-applied versus automated.

Marcia




 
Msg#: 3753332 posted 10:34 am on Sep 29, 2008 (gmt 0)

>>outgoing links...

How about pages found to be selling links and losing the ability to pass PR? Wouldn't that be related to a negative trust factor being applied to the sites selling links, at least for the links in question?

But the point is, does that nullify the benefit of just those links in particular, or of all the links on the pages selling them, including internal links and editorial links that aren't being paid for?

In other words, would that affect all the links on the page, or just those in certain segments of the page (visual page segmentation)? Wouldn't this type of PR metric be related to trust - or losing trust? And to what extent for the page or the site overall?

[edited by: Marcia at 11:06 am (utc) on Sep. 29, 2008]

netmeg

WebmasterWorld Senior Member netmeg us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 3753332 posted 2:18 pm on Sep 29, 2008 (gmt 0)

This is a way cool and interesting topic, tedster, thanks for starting it.

I've seen mentions of trust factor as regards pages (urls) and domains - but what about servers, or hosts?

I am thinking of, for example, those companies that offer "insta pages" that are just one level above a parked domain, with generic RSS article feeds and maybe a picture or two - wouldn't you think that after a few thousand of those are generated with the same content over and over, the entire network would somehow gain a negative trust factor, regardless of domains?

tedster




 
Msg#: 3753332 posted 6:34 pm on Sep 29, 2008 (gmt 0)

How about pages found to be selling links and losing the ability to pass PR? Wouldn't that be related to a negative trust factor being applied to the sites selling links, at least for the links in question.

I'd say yes. Notice that the quote I put in the first post talks about monitoring ads on the page to determine trust?

But the point is, does that affect nullifying the benefit of just those links in particular, or all the links on the pages selling them, including internal links and editorial links not being paid for?

One of the things Google did to whack paid links was to hit the PageRank. We also know that back in January, Google changed "something" about the PageRank formula. Do you think PR now includes a trust component?

what about servers, or hosts?

I've got a strong feeling that servers and hosts can be wrapped into the formula for trust, but on a case by case basis - and this may also have a manual review component if a host is flagged by the algo as looking dicey. The Yahoo paper on Link Spam Detection that I mentioned above talks about nodes, where "nodes may be pages, hosts, or sites..." Of course, that might mean "hostnames" as in subdomains - and there is no technical definition for "page", which is a very fuzzy concept.

Has anyone ever defined such a list of [seed] documents?

I can only wish. Many people assume that DMOZ is one, and .gov domains are another. The quote from the patent mentions Amazon, so that would be a third, and I'd guess that most of the .int domains are a fourth area for trust seeds. I'm sure the list is a lot bigger than the few I just ticked off.

I can't imagine how we might reverse engineer that list, since Google's trust metrics are not published anywhere, and are not likely to be. So we can only guess, and since even major newspapers have been caught up in the link-selling witch hunt, some of my previous assumptions about trusted sites are pretty much out the window.

whitenight

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3753332 posted 7:08 pm on Sep 29, 2008 (gmt 0)

Many people assume that DMOZ is one, and .gov domains are another. The quote from the patent mentions Amazon, so that would be a third, and I'd guess that most of the .int domains are a fourth area for trust seeds

In the past, i've tested this theory.
I've gotten links from dmoz, amazon, "supertrustedsite.com", etc. and haven't seen any appreciable improvements to the only thing that matters (to me) in this discussion (rankings).

There are some fringe benefits, but those don't pertain to this discussion.
Basically, I think it's overblown.
You can be "untrusted" which is bad, but being "trusted" doesn't seem to play a major role.

"Authority" is a whole 'nother discussion with lots of ranking benefits.

tedster




 
Msg#: 3753332 posted 7:16 pm on Sep 29, 2008 (gmt 0)

I basically agree - authority matters much more for rankings than trust. One case where trust might be a benefit is if someone starts a Google-bowling type of campaign against your domain. Then having a high trust level may help you hold on to your rankings.

robzilla




 
Msg#: 3753332 posted 8:37 pm on Sep 29, 2008 (gmt 0)

I've got a strong feeling that servers and hosts can be wrapped into the formula for trust, but on a case by case basis - and this may also have a manual review component if a host is flagged by the algo as looking dicey.

That could open up a whole new can of worms, as servers are often resold or re-rented. Since there's no way to check whether a domain, host or server is trusted, who knows if the next server you get has previously been abused by spammers and (manually or automatically) marked as untrustworthy.

I basically agree - authority matters much more for rankings than trust. One case where trust might be a benefit is if someone starts a Google-bowling type of campaign against your domain. Then having a high trust level may help you hold on to your rankings.

Right, it's good to keep in mind that its influence may be limited, or even variable. Nonetheless, it's probably yet another part of the puzzle that makes up an algorithm, so its existence and influence, limited as it may be, should not be ignored.

CainIV




 
Msg#: 3753332 posted 9:08 pm on Sep 29, 2008 (gmt 0)

Reputable hosting. Item # 7 in the successful SEO for modern business handbook :)

anallawalla




 
Msg#: 3753332 posted 10:32 pm on Sep 29, 2008 (gmt 0)

Has anyone ever defined such a list of [seed] documents?

I can only wish. Many people assume that DMOZ is one, and .gov domains are another. The quote from the patent mentions Amazon, so that would be a third, and I'd guess that most of the .int domains are a fourth area for trust seeds. I'm sure the list is a lot bigger than the few I just ticked off.


That's why I mentioned the Google Directory rather than DMOZ.org, to suggest that Google's own domain has got to have some trust. Yet each of us could name a few dodgy sites that should never have been in DMOZ, hence the inheritance of trust is an equally interesting onion to peel. I contrast it to the corporate parent-to-subsidiary case, where trust appears to flow more readily within the corporate web farm.

While there would be some hand-selected seed documents, it would take a month of Sundays to compile seed documents covering all possible topics for all trusted domains.

I believe that there has to be a second level of algo-selected shortlisted sites that need a quick human confirmation, but are considered to be seed quality until the human review. For example, .edus used to have blanket trust, but now all folders (for example) beginning with a tilde might be assigned less trust. This is why I suspect that trust needs to be at a folder or page level and not for a domain.

potentialgeek

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3753332 posted 6:48 am on Oct 1, 2008 (gmt 0)

> . . . since Google's trust metrics are not published anywhere, and are not likely to be. So we can only guess...

Tedster, if you were lucky enough to write Google Trust code, what would you write?

To the issue of trust and hosting, I can't help thinking Google uses the old anti-spam email method used years ago where blocks of foreign IP addresses were flagged as untrustworthy.

IF site A appears to be spam AND it is hosted on servers previously identified as being used to spam THEN lower trust rank.

Other hosting trust flags could be frequent server changes; frequently slow-loading home page/other pages; hosting for excessively interlinked sites.
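That kind of rule set could be written out like this. It's purely speculative: every signal name and threshold below is invented for illustration, not a known Google signal.

```python
# Speculative hosting-trust sketch combining the flags mentioned above.
# All signal names and thresholds are made up for illustration only.

def hosting_trust_penalty(site):
    """site: dict of observed signals; returns a penalty in [0, 1]."""
    penalty = 0.0
    if site["looks_like_spam"] and site["host_previously_used_for_spam"]:
        penalty += 0.5    # the IF spam AND spammy-host THEN lower-trust rule
    if site["server_changes_per_year"] > 4:
        penalty += 0.25   # frequent server hopping
    if site["avg_page_load_seconds"] > 10:
        penalty += 0.1    # chronically slow host
    if site["interlinked_sites_on_host"] > 100:
        penalty += 0.25   # excessively interlinked network on one host
    return min(penalty, 1.0)  # cap the total penalty

clean = {"looks_like_spam": False, "host_previously_used_for_spam": False,
         "server_changes_per_year": 1, "avg_page_load_seconds": 2,
         "interlinked_sites_on_host": 3}
dicey = {"looks_like_spam": True, "host_previously_used_for_spam": True,
         "server_changes_per_year": 6, "avg_page_load_seconds": 12,
         "interlinked_sites_on_host": 500}
# The clean site accrues no penalty; the dicey one hits the cap.
```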

How many other hosting trust issues could there be?

p/g

P.S. Lately I've been wondering if there's a positive flip side to hosting where hosts with good records can actually help your trust and/or ranking (a little). Which seems reasonable.

tedster




 
Msg#: 3753332 posted 3:49 pm on Oct 1, 2008 (gmt 0)

I want to start by clarifying one thing. Technically, I'd say that trust means "are you trying to manipulate" not "is your information accurate."

If I were writing the trust algorithm (fat chance!) I would also start with seed domains, and review that list at least monthly. Then I'd go for a continually updated trust value, something like PageRank.

I'd make sure that small amounts of trust get deducted for various infractions, especially for dicey outbound links. Not for 404's (hey, link rot happens) but for links to neighborhoods with low/no trust values. After the issue is fixed I'd also restore that lost trust in increments and not all at once. Repeated problems would result in a longer term loss of some trust.

Those deductions would happen no matter what value the initial calculation from the seed domains showed. I'm not sure whether those deductions should cascade outward to the legitimate external links - probably not, but I'd need to study the data first before making that decision.

I'd also consider advertising links on the site as part of the trust calculation - even if they were rel="nofollow". Google's "historical data" patent already mentions this.

A more challenging issue would then be how to integrate trust into the overall ranking algorithm. Some urls just need to be in some results because the end users will expect them. So trust deductions would still not affect "Company Name" searches much if at all, whereas they certainly should affect "big keyword" searches, no matter how well known the enterprise.
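The deduct-and-restore scheme described above could be sketched like this. It's a toy illustration: the penalty sizes and the restore increment are arbitrary numbers I picked, not anything Google has published.

```python
# Toy sketch of the deduct-and-restore trust scheme described above.
# Deduction sizes and the restore increment are arbitrary illustrations.

class TrustLedger:
    def __init__(self, base_trust):
        self.base_trust = base_trust  # from the seed-propagation pass
        self.deduction = 0.0

    def record_infraction(self, kind):
        # Links to untrusted neighborhoods cost trust; plain 404s do not
        # (link rot happens). Repeats just keep adding to the deduction.
        penalties = {"link_to_untrusted": 0.1, "paid_link": 0.15,
                     "broken_link_404": 0.0}
        self.deduction = min(self.deduction + penalties[kind],
                             self.base_trust)

    def record_fix(self):
        # Restore lost trust in small increments, not all at once.
        self.deduction = max(self.deduction - 0.02, 0.0)

    @property
    def trust(self):
        # Deductions apply no matter how high the seed-derived value is.
        return self.base_trust - self.deduction

site = TrustLedger(base_trust=0.6)
site.record_infraction("link_to_untrusted")  # trust drops by 0.1
site.record_infraction("broken_link_404")    # a 404 costs nothing
site.record_fix()                            # trust climbs back slowly
```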

glengara




 
Msg#: 3753332 posted 4:14 pm on Oct 1, 2008 (gmt 0)

"...authority matters much more for rankings than trust"

I take your point, but I've seen some impressive results for pages with just internal links from within a "trusted" site, and I doubt it would have got them had the linkage not remained squeaky clean ab initio.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved