Forum Moderators: goodroi
Inventors: Krishna Bharat [searchwell.com]
Assignee: Google, Inc.
A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set.
Assumption: "Initial set of documents" = top search results
If a search returns 100 results, sites 11-100 will be re-ranked. Sites that contain inbound links from sites 1-10 will receive a higher ranking than those sites which do not, all else equal.
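To make this first reading concrete, here is a minimal sketch. It assumes the top 10 results act as the only "voters", that each of them casts at most one vote per site, and that ties keep their original order (the "all else equal" part). The function name and the data layout are made up for illustration and are not taken from the patent.

```python
def rerank_by_top_citations(ranked, links, top_n=10):
    """Re-rank everything below the top_n results by votes from the top_n.

    ranked: list of site ids in their initial order.
    links:  dict mapping a site id to the set of site ids it links to.
    """
    top = ranked[:top_n]
    rest = ranked[top_n:]
    # One vote per top site that links to the candidate.
    votes = {
        site: sum(1 for voter in top if site in links.get(voter, ()))
        for site in rest
    }
    # sorted() is stable, so sites with equal votes keep their initial order.
    return top + sorted(rest, key=lambda site: -votes[site])


# Tiny example with top_n=2 instead of 10.
initial = ["a", "b", "c", "d", "e"]
outlinks = {"a": {"e"}, "b": {"d", "e"}}
reranked = rerank_by_top_citations(initial, outlinks, top_n=2)
```

In this toy example "e" is cited by both top sites and "d" by one, so they move ahead of "c".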
Assumption: "Initial set of documents" = top search results
I believe this patent describes "Local PageRank", where 'local' means 'local to a particular search query'. Let's say that a search for "widgets" returns 1,000 results, based on today's algo. Now let's pretend that those 1,000 sites represent a "mini-index", so backlinks only count if they come from sites within that 1,000-site mini-index. A search-query-specific (local) PageRank is then calculated, and the entire list is re-ranked.
The implication of this is that it would be really important "who" your backlinks are. If the sites linking to you are returned for the same search terms as your site, it will improve your ranking. This provides an incentive for seeking quality, relevant links. Links from completely irrelevant sites would be weighted lower, as they should be.
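A rough sketch of this "mini-index" reading: run an ordinary PageRank-style power iteration, but only over links whose source and target are both in the result set. The damping factor of 0.85, the fixed iteration count, and the handling of pages with no in-set outlinks are assumptions borrowed from the usual textbook PageRank, not anything stated in the patent.

```python
def local_pagerank(result_set, links, damping=0.85, iterations=50):
    """PageRank restricted to the documents returned for one query.

    result_set: iterable of document ids returned for the query.
    links:      dict mapping a document id to the ids it links to.
    """
    nodes = list(dict.fromkeys(result_set))  # de-duplicate, keep order
    if not nodes:
        return {}
    members = set(nodes)
    # Keep only links that stay inside the result set (the "mini-index").
    out = {
        n: [m for m in set(links.get(n, ())) if m in members and m != n]
        for n in nodes
    }
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A page with no in-set outlinks spreads its rank evenly.
            targets = out[n] or nodes
            share = damping * rank[n] / len(targets)
            for m in targets:
                new[m] += share
        rank = new
    return rank


results = ["a", "b", "c"]
outlinks = {"a": {"c"}, "b": {"c", "x"}, "c": set()}  # "x" is outside the set
scores = local_pagerank(results, outlinks)
```

Here the link from "b" to "x" is ignored because "x" did not come back for the query, and "c" ends up with the highest local score.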
Don't you mean different IP ranges (classes or whatever)? Can a bot tell two sites are on different servers if the IPs are completely different? Like instead of 211.111.111.001 and 211.111.111.002, to have 211.111.111.001 and 3126.96.36.1992 or something, if you know what I mean?
This is a quick comparison of my understanding of these methods. I post it in the hope of having any errors corrected:
PageRank calculates the importance (but not context) of pages by looking at who links to whom, recursively. Google uses PageRank with on page factors and link text to order documents.
Hilltop gets the initial search results and selects those with at least some threshold of external links as 'experts'. It cleverly selects links from the expert pages that are qualified by on-page factors and uses those links (and maybe other information) to order the results.
Kleinberg gets the initial search results and adds pages linked to and from those results. These are used to find hubs and authorities, respectively, by starting with some initial set of values and iterating the flow of forward and back links to and from authorities and hubs, respectively (a principal eigenvector like PageRank).
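For comparison, a bare-bones sketch of the hub/authority iteration described above. It assumes the expanded base set (initial results plus the pages linked to and from them) has already been built and is passed in as `nodes`; the iteration count and the L2 normalisation are the usual textbook choices, not details from this thread.

```python
import math


def hits(nodes, links, iterations=50):
    """Return (hub, authority) scores for a base set of pages.

    nodes: iterable of page ids in the base set.
    links: dict mapping a page id to the ids it links to.
    """
    nodes = list(dict.fromkeys(nodes))
    members = set(nodes)
    out = {
        n: {m for m in links.get(n, ()) if m in members and m != n}
        for n in nodes
    }
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A good authority is linked to by good hubs.
        auth = {n: 0.0 for n in nodes}
        for n in nodes:
            for m in out[n]:
                auth[m] += hub[n]
        # A good hub links to good authorities.
        hub = {n: sum(auth[m] for m in out[n]) for n in nodes}
        # Normalise so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


pages = ["h1", "h2", "a1", "a2"]
outlinks = {"h1": {"a1", "a2"}, "h2": {"a1"}}
hub_scores, auth_scores = hits(pages, outlinks)
```

In this toy graph "a1" comes out as the strongest authority and "h1" as the strongest hub.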
If I read this patent correctly (I agree with swerve's description), then it gets the initial search results and calculates the simple link popularity of each member of the set, from other members of the set. This 'local' link popularity can be used to re-order the initial results.
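As a sketch of that reading, the 'local' link popularity is just a count of inbound links from other members of the result set, with no recursion. The names below are illustrative, and treating repeated links from one document as a single link is my assumption.

```python
def local_link_popularity(result_set, links):
    """Count, for each result, how many other results link to it.

    result_set: list of document ids returned for the query.
    links:      dict mapping a document id to the ids it links to.
    """
    members = set(result_set)
    counts = {doc: 0 for doc in result_set}
    for src in members:
        for dst in set(links.get(src, ())):
            # Ignore self-links and links to documents outside the set.
            if dst in members and dst != src:
                counts[dst] += 1
    return counts


results = ["a", "b", "c"]
outlinks = {"a": {"b"}, "b": {"b"}, "c": {"a", "b", "z"}}
popularity = local_link_popularity(results, outlinks)
# Re-order the initial results by local popularity (stable for ties).
reordered = sorted(results, key=lambda doc: -popularity[doc])
```

Unlike PageRank or HITS there is nothing to iterate here, which is part of why this could be cheap enough to run per query.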
mfishy, people who create multiple sites for the purposes of linking are more likely to have ensured separate IP addresses (or class C ranges) than experts who happen to use the same provider (which is likely for some academic and geographical topics). The thing that worries me about these methods is the requirement for non-affiliated sources; smaller initial document sets would be easier to dent IMO.
So a search-query-specific (local) PageRank is calculated, and the entire list is re-ranked.
Normally Google calculates PR once a month during the index update. This patent would require calculating a mini, query-specific PR dynamically. I don't know how computationally feasible that is.
Even if implemented, this would greatly reduce the speed at which Google spits out results (please note that speed is one of Google's strong points, apart from search relevance).
Google is definitely not going to implement any kind of query-specific, on-the-fly PR calculations. IMO, the patent is nothing we should bother about.
I have not read the whole patent yet, but it does not have to be on the fly.
Google can take the top 20,000 (Zeitgeist) single- and double-word search queries and do it beforehand.
This stuff gets uninteresting once search queries become three word plus and more specific IMO.
Why not? Researchers at both Stanford and Google have been looking for easy ways to apply this method for the past couple years. It is definitely something they show interest in.
Unlike HITS,  suggested that importance scores be precomputed offline for every possible text query, but the enormous number of possibilities makes this approach difficult to scale.
Slightly off topic, but on the page about Krishna Bharat, why does it say SM as opposed to TM next to the Google logo? Am I missing something, or just plain thick?
IANAL, but I think SM just means service mark- it's like a trademark but used to distinguish a service (as opposed to a product) from its competitors.
At least that's how I see it based upon what's there.
They can take a list of, say, the top 100,000 kws from their logs and another 10,000 hyper-competitive (which normally means spammy) kws (e.g. diet pills, credit cards, casinos) and precalculate query-specific PR for those kws offline.
Finding ultra-competitive kws is not a problem at all. They can just use their AdWords data and grab the 10,000 highest-priced kws :) This also scales well.
And for the remaining queries they can just use the traditional algo. A beneficial side effect of this strategy for Google is that it will also confuse the SEOs :) You don't know which algo Google uses for which kws :)
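The precompute-and-fall-back idea could be sketched like this. `base_search` and `local_rerank` are hypothetical stand-ins for the traditional algo and the query-specific re-ranking; nothing here is taken from Google's actual systems, and the lower-casing of queries is my assumption.

```python
def build_precomputed(top_queries, base_search, local_rerank):
    """Offline step: re-rank the results for each popular query once."""
    table = {}
    for query in top_queries:
        key = query.strip().lower()
        table[key] = local_rerank(base_search(key))
    return table


def serve(query, precomputed, base_search):
    """Online step: use the precomputed list if there is one."""
    key = query.strip().lower()
    if key in precomputed:
        return precomputed[key]
    # Everything else falls back to the traditional ranking.
    return base_search(key)


# Stub functions so the sketch runs on its own.
def fake_search(query):
    return [query + "-1", query + "-2"]


def fake_rerank(results):
    return list(reversed(results))


table = build_precomputed(["Diet Pills"], fake_search, fake_rerank)
hot = serve(" diet pills ", table, fake_search)
cold = serve("blue widgets", table, fake_search)
```

The expensive work happens only in `build_precomputed`, so serving a query stays a dictionary lookup plus, at worst, the normal search.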
12. A system comprising:
a server connected to a network, the server receiving search queries from users via the network, the server including:
at least one processor;
a database of a corpus; and
a memory operatively coupled to the processor, the memory storing program instructions that when executed by the processor, cause the processor to: generate an initial list of relevant documents from the corpus based on a matching of terms in the search query to the corpus, rank the generated list of documents to obtain a relevance score value for each document in the generated list of documents, calculate a local score value for the documents in the generated list, the local score value quantifying an amount that the documents are referenced by other documents in the generated list of documents, and refine the relevance score values for the documents in the generated list based on the local score values.
Can someone clear this up?
- Are different IPs (in different class C ranges) enough?
- Can Google, would Google know/care if sites are on the same server?
- Would Whois data play a role here? e.g. if sites have (or had) same ownership/administration.
kapow, the answer to those questions would be Google's choice at the time of implementing such a system.
It would not count links from the same servers, so people will have to get different hosts.
Clearing it up: This whole thing devalues doorways in general. In order for your doorway to pass LR (Local Rank) it has to rank on its own. If it ranks on its own, it must have PR and have other pages linking to it. The only way to get PR to the page quickly (without going on a linking campaign) is to link to it from your own sites. But if you link to it from your sites in several different areas, you trigger the "affiliate host" flag (see sections 2 and 3 of the patent). IP is irrelevant. I'd imagine that flag is pretty broad: any hint of cross-referencing within that term is going to trigger it. After all, if the page IS really good, then there will be plenty of other pages to give it its LR bonus.
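Purely as an illustration of how such a flag might bite, the sketch below groups voters by hostname, ignores votes from the target's own group, and lets each remaining group vote only once. How affiliation is really detected is not spelled out in this thread, so `host_of` is a stand-in for whatever grouping is actually used, and the one-vote-per-group rule is my assumption.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Hypothetical affiliation key: just the hostname of the URL."""
    return urlsplit(url).hostname or ""


def unaffiliated_votes(target, result_urls, links, group_of=host_of):
    """Count distinct non-affiliated groups in the result set linking to target.

    result_urls: URLs returned for the query.
    links:       dict mapping a URL to the set of URLs it links to.
    """
    target_group = group_of(target)
    voting_groups = set()
    for src in result_urls:
        if target in links.get(src, ()):
            group = group_of(src)
            # Links from the target's own group never count.
            if group != target_group:
                voting_groups.add(group)
    return len(voting_groups)


target = "http://shop.example/widgets"
results = [
    "http://shop.example/doorway",  # same host as the target: ignored
    "http://blog.example/post1",    # blog.example counts once
    "http://blog.example/post2",
    "http://news.example/article",
    target,
]
outlinks = {url: {target} for url in results if url != target}
votes = unaffiliated_votes(target, results, outlinks)
```

Four pages link to the target here, but only two independent groups do, so a doorway on your own host adds nothing.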
Step 1: Google ranks like it always has.
Step 2: Google calculates LR from within those results.
Step 3: Google re-sorts and gives you your results.
This is based upon what I can see (there IS a formula missing in that patent page right now, so you can't tell for sure). The LR calculation appears to carry as much weight in the ranking (at least in the top X results) as all the other factors combined, but you still need all those other factors to get into the results in the first place.
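Since the formula is missing from the patent page, the combination step can only be guessed at. Taking the "equal importance" reading at face value, one possible sketch normalises both scores to the 0..1 range and weights them 50/50. The normalisation, the default weight, and the function name are all assumptions, and scores are assumed to be non-negative.

```python
def refine_scores(relevance, local, weight=0.5):
    """Combine initial relevance scores with local scores and re-sort.

    relevance: dict of document id -> initial relevance score (>= 0).
    local:     dict of document id -> local score (>= 0).
    weight:    share given to the local score; 0.5 means equal importance.
    """
    max_rel = max(relevance.values(), default=0) or 1.0
    max_loc = max(local.values(), default=0) or 1.0
    combined = {
        doc: (1.0 - weight) * relevance[doc] / max_rel
        + weight * local.get(doc, 0) / max_loc
        for doc in relevance
    }
    # Only documents that made the initial list can appear in the output.
    return sorted(combined, key=combined.get, reverse=True)


initial_scores = {"a": 1.0, "b": 0.9, "c": 0.5}
local_scores = {"a": 0, "b": 3, "c": 1}
final_order = refine_scores(initial_scores, local_scores)
```

With these made-up numbers "b" overtakes "a" because it is the best-cited document within the set, while "c" still trails on its weak initial score.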
IMO, the patent is in no way related to Google's ranking techniques. Bharat is the inventor. He has been working on a lot of different things. Google is simply assignee because Bharat works for Google and in his contract with Google there probably is a paragraph which states that all his inventions belong to Google. (I don't know how all this works in the US.) Google being assignee doesn't mean anything. Let's not create any new myths...
Filed: January 30, 2001
Isn't this pretty old stuff?