Forum Moderators: Robert Charlton & goodroi

Panda's Patent - Implicit Links

         

JD_Toims

1:23 am on Apr 4, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I found one of the most interesting portions of Panda's Patent [patft.uspto.gov...] to be one I cited in another thread [webmasterworld.com...] and will recite here:

Emphasis Added
The system determines a count of independent links for the group (step 302). A link for a group of resources is an incoming link to a resource in the group, i.e., a link having a resource in the group as its target. Links for the group can include express links, implied links, or both. An express link, e.g., a hyperlink, is a link that is included in a source resource that a user can follow to navigate to a target resource. An implied link is a reference to a target resource, e.g., a citation to the target resource, which is included in a source resource but is not an express link to the target resource. Thus, a resource in the group can be the target of an implied link without a user being able to navigate to the resource by following the implied link.
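To make the express/implied distinction in that passage concrete, here is a minimal sketch in Python. It is only an illustration of the two definitions, not anything taken from the patent: the class name `LinkClassifier`, the helper `count_links`, and the rule "a plain-text mention of the domain outside any anchor counts as an implied link" are all assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Split references to one target domain into express and implied links.

    Hypothetical illustration only: "implied" here just means the domain
    name appears in page text outside of any <a> element.
    """

    def __init__(self, target_domain):
        super().__init__()
        self.target = target_domain.lower()
        self.in_anchor = False
        self.express = 0  # <a href=...> pointing at the target
        self.implied = 0  # plain-text mentions outside any anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            href = (dict(attrs).get("href") or "").lower()
            if self.target in href:
                self.express += 1

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        # Text inside an anchor is anchor text, not a separate citation.
        if not self.in_anchor:
            self.implied += data.lower().count(self.target)


def count_links(html, target_domain):
    """Return (express_count, implied_count) for one page of HTML."""
    parser = LinkClassifier(target_domain)
    parser.feed(html)
    parser.close()
    return parser.express, parser.implied
```

Calling `count_links(page_html, "example-a.com")` returns one count of each kind for a single source page. How a nofollowed anchor should be classified is exactly the open question in this thread, so the sketch deliberately does not look at `rel` at all.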

We know [have been told repeatedly] nofollow links are "dropped from the link graph and are not even used for discovery", but the idea of implied links [I think better referred to as "connections" for the sake of clarity] leaves me with quite a few questions and not many answers right now, since we haven't heard much about the concept. So this is mostly a bit of "brainstorming" + "food for thought", and I'll leave this post at some questions I've been asking myself.



A Couple Things We Know:

Google doesn't like the idea of comment-spamming blogs with keywords or site names and they "reserve the right to take action" on those who do it -- Why? I would think if they really wanted to effectively stop it and it didn't count for anything in any way within their algo, rather than "reserving the right to take action", they would simply say, "If you would like to waste your time and money engaging in this activity, feel free, because it will have 0 effect on your results in Google.", and, imo, most people would likely stop.

Google doesn't use nofollow links for PageRank or discovery purposes; They're "dropped from the link graph." -- Okay, but what's classified as a "connection" [implied link] between two pages?



Example:

On example.com there's a page of text about "Does this Link Count for Anything in Any Way in Google's Algo?"

Referenced on the same page are:
<a href="http://www.example-a.com/nofollow-links-dont-count-in-any-way-for-pagerank-or-discovery-but" rel="nofollow">Nofollow Links: Do They Count in Google Even If Not used for PR or Discovery Purposes?</a>

<a href="http://www.example-b.com/connections-that-count-in-some-way-in-google" rel="nofollow">Example-B.Com: Connections Google Uses in Some Way for Rankings</a>

<a href="http://www.example-c.com/link-baiting-for-nofollow-links-aka-connections" rel="nofollow">Ideas for Link Baiting for Nofollow Links [Connections Between Pages] on Example-C.Com</a>



Without "breaking the rules of nofollow" how could Google gain information/insight to improve their algo from the nofollowed links above and their text wrt the "connection" [implied links] between the page being parsed by the algo and the pages being referenced?

What do they need to use from the "connection" to do it?

Is there some type of a "degrees of separation" [for lack-of-a-better-way-of-expressing my thought] portion to the algo, meaning given everything they know about pages, how many "degrees of separation" are there between Page-A and the rest of the web relative to the "degrees of separation" between Page-B and the rest of the web?

Based on the preceding question, are the rankings somehow impacted by the page which is "the least separated" from the rest of the pages on the web?
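One way to picture the "degrees of separation" question is as hop counts in a graph where every connection, express or implied, is an edge. The sketch below is purely hypothetical: the page names, the tiny graph, and the use of a plain breadth-first search with an average hop count are assumptions for illustration, not a description of anything Google is known to do.

```python
from collections import deque


def degrees_of_separation(graph, start):
    """Breadth-first search: hop count from `start` to every reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[page] + 1
                queue.append(neighbor)
    return dist


def average_separation(graph, start):
    """Mean hop count from `start` to the other pages it can reach."""
    dist = degrees_of_separation(graph, start)
    others = [d for page, d in dist.items() if page != start]
    return sum(others) / len(others) if others else float("inf")


# Invented five-page web; each connection is listed in both directions.
connections = {
    "page-a": ["hub"],
    "page-b": ["blog"],
    "hub": ["page-a", "blog", "news"],
    "blog": ["hub", "page-b"],
    "news": ["hub"],
}

print(average_separation(connections, "page-a"))  # 2.0
print(average_separation(connections, "page-b"))  # 2.25
```

In this made-up graph Page-A is "less separated" from the rest of the pages than Page-B, which is the kind of comparison the questions above are asking about.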

goodroi

5:49 pm on Apr 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Before people respond to this, it might be helpful just to remind everyone that just because Google files a patent does not mean they will or are currently using the technology explained in the patent. If they are actively using the technology described in the patent they might have already refined and significantly adjusted how it works.

JD_Toims

8:10 pm on Apr 4, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's a good point, and it's really just a "food for thought" post, because there are a large number of "unknowns" and "untesteds" surrounding the idea, but I definitely think it's worth thinking about a bit and might even explain some of the things we've seen and "not quite understood" at times.

So, I'll stick with "plausible thought" for now.

Robert Charlton

8:52 pm on Apr 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



...just because Google files a patent does not mean they will or are currently using the technology explained in the patent.

I had much the same thought when I read this. The alternative types of links and citations in particular struck me as the kind of feature that a patent lawyer would include simply to cover the possibilities. That's what patent lawyers are paid to do.

In the spirit of "food for thought"... I believe that phrase-based indexing allows Google to look at what I'd loosely term as co-occurrence data, and to further classify that data by whether it's link anchor text, simply a citation, a brand or product mention, or article reference, or whatever... and that Google could weigh these references by various valuations of the pages they're on. So references in blog comments, eg, would be weighted differently than those in in-depth articles. It's not just a question of counting references.

Put another way, I think there's probably a PageRank-like distribution aspect regarding the weighting of references, even when the references aren't actually links.
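As a rough sketch of what "weighting rather than counting" could look like, the Python below multiplies a per-page value by a per-type weight and sums the result. Everything in it is an assumption for illustration: the type names, the weight values, and the idea that a single number stands in for the valuation of the page a reference appears on.

```python
# Hypothetical weights per reference type; the numbers are invented.
TYPE_WEIGHTS = {
    "anchor_text": 1.0,
    "citation": 0.6,
    "brand_mention": 0.4,
    "article_reference": 0.5,
}


def weighted_reference_score(references):
    """Sum page_value * type_weight over (page_value, ref_type) pairs.

    `page_value` stands in for whatever valuation the source page has,
    e.g. low for a blog comment and high for an in-depth article.
    Unknown reference types contribute nothing.
    """
    return sum(
        page_value * TYPE_WEIGHTS.get(ref_type, 0.0)
        for page_value, ref_type in references
    )
```

With these made-up numbers, a single citation on a page valued at 0.9 contributes 0.54, while a citation on a page valued at 0.05 contributes 0.03, so a pile of low-value mentions does not automatically outweigh one strong reference.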

Whether this unlinked reference data is used, though, is another matter entirely. Google is constantly triangulating and re-triangulating, on a multi-dimensional level... and it's testing all the time. I'm sure, among other things, it's evaluating how much such data can be trusted.

With regard to Panda... I haven't absorbed how this "Panda" patent may be involved with the Panda that we've all come to know and love/hate (as the case may be). There's that whole area of seed sets and decision trees that I'm still thinking may have something to do with it... and, as I remember, that was another engineer named Panda. Maybe we've got a team of Pandas, not just one guy. Speculation only.

JD_Toims

12:16 am on Apr 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Whether this unlinked reference data is used, though, is another matter entirely.

Absolutely Agree -- AFAIK it's untested and a totally new concept to think "What's an implied link and how does it count?", rather than thinking <a href="">DoFollow</a> means a "vote" while <a href="" rel=nofollow>NoFollow</a> means "no vote", and they are the only things that count or don't count as far as links or mentions in Google's overall algo are concerned.



Probably the most interesting "food for thought" to me is how Google could *possibly* use something like "degrees of separation" between pages and the rest of the web for ranking purposes -- Meaning: explicit links + implied links [including nofollowed links in some way?] = Better results.



Personally, I can see how it *might* help them determine the "relative importance" of a page to show in the results for their visitors.

Example: A page on CNN.Com contains the same info as a page on MomAndPop.Com, but all other things being equal, there are more "implied links" [mentions/connections between external pages] for the CNN.Com page across the web, so there are fewer "degrees of separation" between the CNN.Com page and all other pages on the web with the same info, and it's therefore ranked higher in the results than the page on MomAndPop.Com.



To me the whole concept of "implied links and how they *could* be used by Google for ranking purposes" is a really interesting question.

[Edited for Spacing Purposes]

7_Driver

10:49 am on Apr 5, 2014 (gmt 0)

10+ Year Member



Yes - Implied Links was a new idea to me - and I'm surprised there hasn't been more discussion of it.

It certainly seems odd that Google would want to count mentions of a brand which aren't even linked - yet not count NoFollowed links.

I wonder if it's simply to close off one obvious way of gaming the Panda algorithm - which would be to spam thousands of mentions of your brand across the web. That could increase the number of navigational searches for your brand, without any additional links - thus improving your Panda score.

By counting un-linked mentions as if they were links - that would no longer work - and in fact would become counter-productive.

I'm not hugely confident of that hypothesis - but otherwise I'm struggling to make sense of the whole "implied links" idea.

Planet13

2:51 pm on Apr 5, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I wonder if it's simply to close off one obvious way of gaming the Panda algorithm - which would be to spam thousands of mentions of your brand across the web."

One thing to keep in mind is that Google has access to SO MUCH data that they would be able to cross-reference citations with traffic patterns.

Google has the data (collected from Chrome and Android, purchased from ISPs and third-party browser companies such as Mozilla and Opera, or from various organizations that oversee Linux distros, etc.) to cross-reference citations with actual traffic patterns to those sites.

I think that is really one of the major reasons for the brand love that Google has. Yes, they could get their metrics from people typing "amazon blue widgets" into the Google search bar, but it is probably just as easy for them to mine traffic patterns in the data they purchase from ISPs.