Published 9/19/02
"Methods and apparatus for providing search results in response to an ambiguous search query" [appft1.uspto.gov]
"Methods and apparatus consistent with the invention allow a user to submit an ambiguous search query and to receive relevant search results. In one embodiment, a sequence of numbers received from a user of a standard telephone keypad is translated into a set of potentially corresponding alphanumeric sequences. These potentially corresponding alphanumeric sequences are provided as an input to a conventional search engine, using a boolean "OR" expression, and the search results are presented to the user. The search engine effectively limits search results to those in which the user was likely interested."
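A minimal Python sketch of the keypad idea in the abstract above, assuming the standard letter layout on keys 2 to 9 and that each key may also stand for its own digit (hence "alphanumeric"). The `vocabulary` filter is a hypothetical pruning step, not something the abstract describes; it is there because unpruned expansion grows by a factor of 3 or 4 per digit. All function names are illustrative.

```python
from itertools import product

# Standard phone keypad letters (2=ABC ... 9=WXYZ); 0 and 1 carry no letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def expand_digits(digits, vocabulary=None):
    """List the alphanumeric strings a keypad digit sequence could stand for."""
    # Each key can mean any of its letters or the digit itself.
    choices = [KEYPAD.get(d, "") + d for d in digits]
    candidates = ("".join(combo) for combo in product(*choices))
    if vocabulary is None:
        return list(candidates)
    # Hypothetical pruning: keep only strings found in a known word list.
    return [c for c in candidates if c in vocabulary]


def build_or_query(candidates):
    """Join candidate strings into one boolean OR query for a search engine."""
    return " OR ".join(candidates)
```

For example, `expand_digits("4663", vocabulary={"good", "home", "gone", "hood"})` returns all four words, and `build_or_query` joins them as `gone OR good OR home OR hood`, which an ordinary search engine can then run as-is.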
Published 9/5/02
"Methods and apparatus for employing usage statistics in document retrieval" [appft1.uspto.gov]
"Methods and apparatus consistent with the invention provide improved organization of documents responsive to a search query. In one embodiment, a search query is received and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics."
Published 4/11/02
"Methods and apparatus for using a modified index to provide search results in response to an ambiguous search query" [appft1.uspto.gov]
"A system allows a user to submit an ambiguous search query and to receive potentially disambiguated search results. In one implementation, a search engine's conventional alphanumeric index is translated into a second index that is ambiguated in the same manner as which the user's input is ambiguated. The user's ambiguous search query is compared to this ambiguated index, and the corresponding documents are provided to the user as search results."
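This application runs in the opposite direction: instead of expanding the query, it collapses the index. A sketch in Python, assuming a simple in-memory `term -> set of document ids` index and the same standard keypad layout; `ambiguate`, `build_ambiguated_index` and `search` are hypothetical names, not the patent's.

```python
from collections import defaultdict

KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Reverse map: letter -> the key it sits on.
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def ambiguate(term):
    """Rewrite a term the way a keypad user would type it (letters -> digits)."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in term.lower())


def build_ambiguated_index(index):
    """Translate a term -> doc-ids index into a digit-string -> doc-ids index."""
    ambiguated = defaultdict(set)
    for term, doc_ids in index.items():
        # Different words with the same key sequence share one entry.
        ambiguated[ambiguate(term)].update(doc_ids)
    return dict(ambiguated)


def search(ambiguated_index, digits):
    """Match the user's digit query directly against the ambiguated index."""
    return sorted(ambiguated_index.get(digits, set()))
```

With an index of `{"good": {1}, "home": {2, 3}, "cat": {4}}`, the digit query `4663` returns documents 1, 2 and 3, because "good" and "home" collapse to the same key sequence. The translation cost is paid once at indexing time rather than on every query.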
[webmasterworld.com...]
This sounds like the old DirectHit usage popularity mechanism. Does anyone know of any evidence that Google uses this?
I know that just because they get a patent doesn't mean they'll use it any time soon (if ever). But this is quite interesting.
Also, consider the toolbar: that is a form of 'usage', and pages that are visited more often, or by more unique toolbar users, could be pushed higher...
Just some thoughts :) The usage data doesn't necessarily have to come from AdWords, but it could.
Speed is very important to Google, and to surfers.
They do use statistical sampling as a way of measuring 'quality', so they already *are* using CTR on the SERPs, after a fashion, to rerank results.
GoogleGuy said it, so it must be true, right? :)
But you are correct. This could be *just* a patent related to AdWords, and I'm reading too much into it...
[0035] In one implementation, documents are organized based on a total score that represents the product of a usage score and a standard query-term-based score ("IR score"). In particular, the total score equals the square root of the IR score multiplied by the usage score. The usage score, in turn, equals a frequency of visit score multiplied by a unique user score multiplied by a path length score.
[0036] The frequency of visit score equals log2(1+log(VF)/log(MAXVF)). VF is the number of times that the document was visited (or accessed) in one month, and MAXVF is set to 2000. A small value is used when VF is unknown. If UU is less than 10, the unique user score equals 0.5*UU/10; otherwise, it equals 0.5*(1+UU/MAXUU). UU is the number of unique hosts/IPs that access the document in one month, and MAXUU is set to 400. A small value is used when UU is unknown. The path length score equals log(K-PL)/log(K). PL is the number of `/` characters in the document's path, and K is set to 20.
From: "Methods and apparatus for employing usage statistics in document retrieval"
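A Python sketch of the [0035] and [0036] formulas using the quoted constants (MAXVF = 2000, MAXUU = 400, K = 20). Several details are assumptions, also flagged in the comments: the "small value" for unknown VF or UU is an arbitrary 1e-3 (also applied to VF = 0, where log(VF) is undefined), PL counts every '/' in the parsed URL path including the leading one, PL is clamped below K so log(K-PL) stays defined, and the total is read as sqrt(IR * usage), although sqrt(IR) * usage is another possible reading of that sentence.

```python
import math
from urllib.parse import urlparse

MAXVF = 2000     # visits per month at which the log ratio reaches 1
MAXUU = 400      # unique hosts/IPs per month
K = 20           # path-depth constant
UNKNOWN = 1e-3   # assumption: the unspecified "small value"


def frequency_of_visit_score(vf):
    # Assumption: VF = 0 is treated like an unknown VF (log(0) is undefined).
    if vf is None or vf < 1:
        return UNKNOWN
    return math.log2(1 + math.log(vf) / math.log(MAXVF))


def unique_user_score(uu):
    if uu is None:
        return UNKNOWN
    if uu < 10:
        return 0.5 * uu / 10
    return 0.5 * (1 + uu / MAXUU)


def path_length(url):
    # Assumption: PL = number of '/' characters in the URL's path component.
    return urlparse(url).path.count("/")


def path_length_score(pl):
    # Assumption: clamp PL so log(K - PL) stays defined for very deep paths.
    pl = min(pl, K - 1)
    return math.log(K - pl) / math.log(K)


def usage_score(vf, uu, url):
    return (frequency_of_visit_score(vf)
            * unique_user_score(uu)
            * path_length_score(path_length(url)))


def total_score(ir_score, vf, uu, url):
    # Read as sqrt(IR * usage); sqrt(IR) * usage is the other possible reading.
    return math.sqrt(ir_score * usage_score(vf, uu, url))
```

This also shows where the '/' count comes in: `http://www.example.com/index.html` has PL = 1 and a path length score of about 0.983, while `http://www.example.com/some/directory/here/index.html` has PL = 4 and about 0.926, so deeper pages are discounted a little before the score is combined with the IR score.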
I could be confused, but since when is the number of '/' characters relevant to any "score" of a document?
(Nice post, btw)
As you go deeper into a directory tree, each page inherits PageRank from the levels 'above' it, so PR decreases, e.g.:
www.example.com/index.html = PR5
www.example.com/some/directory/here/index.html would probably be ~PR3 (but almost certainly lower than 5)
This means that Google can keep a single 'site usage score' for each page in its index, whereas DirectHit requires a (more complicated and harder to maintain) term-page matrix. Google can build the score from various sources (such as Google Bar usage, cached page stats, blogger logs, and web logs bought from large ISPs, as those tracking companies do), while DirectHit really requires tracking clicks on search result links.
Jamie
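The data-structure difference described in the post above can be sketched in Python. This is a simplified reading of that comparison, not either company's actual design: the inputs (a flat list of visited URLs, and a list of `(query, clicked_url)` pairs) are hypothetical, and both functions are illustrative names.

```python
from collections import Counter


def page_usage_scores(page_visits):
    """One query-independent count per URL (e.g. from toolbar or log data)."""
    return Counter(page_visits)


def term_page_matrix(result_clicks):
    """DirectHit-style counts keyed by (query term, clicked URL)."""
    matrix = Counter()
    for query, url in result_clicks:
        # Every term in the query gets credit for the clicked page.
        for term in query.lower().split():
            matrix[(term, url)] += 1
    return matrix
```

The first table has at most one entry per page and can be fed by any source of visits; the second has up to (terms x pages) entries and can only be filled from clicks on search results, which is the maintenance burden the post points to.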