thegypsy - 5:28 am on Feb 9, 2007 (gmt 0)
This is interesting. That thread seems to be moving in a different direction, so I started this one for ONE simple area: Phrase Based Indexing and Retrieval (I call it PaIR to make life easier). There is MORE to it than the thoughts Ted started towards, as far as -30 type penalties.

I have drudged through 5 of the PaIR related patents from the last year or so and written 3 articles and ONE conspiracy theory on the topic. One of the more recent inferences was a conspiracy theory about the recent GoogleBomb defused affair. Specifically, from the patent Phrase identification in an information retrieval system [appft1.uspto.gov]:

"[0153] Each phrase in the index 150 is also given a phrase number, based on its frequency of occurrence in the corpus. The more common the phrase, the lower phrase number it receives in the index. The indexing system 110 then sorts 506 all of the posting lists in the index 150 in declining order according to the number of documents listed in each posting list, so that the most frequently occurring phrases are listed first. The phrase number can then be used to look up a particular phrase."

Call me a whacked out conspiracy theorist, but I think we could have something here. Is it outright evidence that Google has migrated to a PaIR based model? Of course not. I would surmise that it is simply another layer that has been laid over the existing system, and that the last major infrastructure update (the dreaded BigDaddy) facilitated it. But that's just me.

I am curious about complementary/contrary theories, as mentioned by Ted in the other "Phrase Based Optimization" thread. I simply wanted to keep a clean PaIR discussion.
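For anyone who finds the [0153] language hard to picture, here is a rough Python sketch of what that paragraph seems to describe: count how many documents each phrase appears in, sort the posting lists in declining order of that count, and hand out phrase numbers so the most common phrase gets the lowest number. The function name `build_phrase_index`, the sample documents, and the alphabetical tie-break are all my own assumptions for illustration; nothing here is taken from the patent beyond the ordering rule.

```python
from collections import defaultdict


def build_phrase_index(docs):
    """Build a toy phrase index.

    docs: dict mapping a document id to the list of phrases found in it.
    Returns (phrase_numbers, index), where index is a list of
    (phrase, posting_list) pairs sorted by declining document count.
    """
    posting_lists = defaultdict(set)
    for doc_id, phrases in docs.items():
        for phrase in phrases:
            posting_lists[phrase].add(doc_id)

    # Declining order by number of documents listed in each posting list.
    # The alphabetical tie-break is an assumption, added only so the
    # result is deterministic.
    ordered = sorted(posting_lists.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    # The more common the phrase, the lower its phrase number.
    phrase_numbers = {phrase: n for n, (phrase, _) in enumerate(ordered)}
    index = [(phrase, sorted(doc_ids)) for phrase, doc_ids in ordered]
    return phrase_numbers, index


# Hypothetical three-document corpus.
docs = {
    "doc1": ["google bomb", "anchor text", "search engine"],
    "doc2": ["anchor text", "search engine"],
    "doc3": ["search engine"],
}
phrase_numbers, index = build_phrase_index(docs)

# The phrase number doubles as the position of the phrase in the index.
phrase, posting_list = index[phrase_numbers["anchor text"]]
print(phrase, posting_list)
```

In this toy version the phrase number is simply the position in the sorted index, which is one plausible reading of "the phrase number can then be used to look up a particular phrase".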
For those looking to get a background in PaIR methods, here are links to all 5 patents:

Phrase-based searching in an information retrieval system [appft1.uspto.gov]
Multiple index based information retrieval system [appft1.uspto.gov]
Phrase-based generation of document descriptions [appft1.uspto.gov]
Phrase identification in an information retrieval system [appft1.uspto.gov]
Detecting spam documents in a phrase based information retrieval system [appft1.uspto.gov]

I would post snippets, but it is a TON of research (many groggy hours). I also felt that posting WHAT "Phrase Based Indexing and Retrieval" is would dilute the intended direction of the thread, which is to potentially stitch together this and the suspicions of PaIR being at work in the -whatever penalties... more evidence that it is being implemented.

Note: There is a sixth Phrase-based patent:
[edited by: tedster at 6:59 am (utc) on May 14, 2007]
I just noticed the thread on relationships of 'Phrase Based' layering in the -Whatever penalties.

"[0152] This approach has the benefit of entirely preventing certain types of manipulations of web pages (a class of documents) in order to skew the results of a search. Search engines that use a ranking algorithm that relies on the number of links that point to a given document in order to rank that document can be "bombed" by artificially creating a large number of pages with a given anchor text which then point to a desired page. As a result, when a search query using the anchor text is entered, the desired page is typically returned, even if in fact this page has little or nothing to do with the anchor text. Importing the related bit vector from a target document URL1 into the phrase A related phrase bit vector for document URL0 eliminates the reliance of the search system on just the relationship of phrase A in URL0 pointing to URL1 as an indicator of significance of URL1 to the anchor text phrase."
Phrase identification in an information retrieval system [appft1.uspto.gov]
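To make the [0152] idea a bit more concrete, here is a small Python sketch of how I read it: the bit vector attached to an anchor phrase in the linking page (URL0) is filled in from the related phrases that actually occur on the target page (URL1), so a target that has nothing to do with the anchor text brings back an empty vector. The phrase lists, the helper names `related_phrase_bits` and `support`, and the idea of counting set bits as a crude measure of support are my own assumptions, not anything spelled out in the patent.

```python
def related_phrase_bits(doc_phrases, related_phrases):
    """Return an int used as a bit vector.

    Bit i is set when related_phrases[i] occurs in doc_phrases.
    """
    bits = 0
    for i, phrase in enumerate(related_phrases):
        if phrase in doc_phrases:
            bits |= 1 << i
    return bits


def support(bits):
    """Count the set bits (an assumed, very crude measure of support)."""
    return bin(bits).count("1")


# Hypothetical related phrases for the anchor phrase "blue widgets".
related = ["widget sizes", "widget prices", "widget reviews"]

# Phrases found on two hypothetical target pages (URL1 candidates).
on_topic_target = {"widget sizes", "widget prices", "shipping rates"}
bombed_target = {"cheap loans", "apply now"}

# The vector imported into URL0's entry for the anchor phrase comes
# from the target's content, not from the link alone.
on_topic = related_phrase_bits(on_topic_target, related)
bombed = related_phrase_bits(bombed_target, related)

print(format(on_topic, "03b"), support(on_topic))
print(format(bombed, "03b"), support(bombed))
```

If something like this is in play, piling up links with the same anchor text would not help a page much unless the page itself contains phrases related to that anchor text.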