---- "Phrase Based Indexing and Retrieval" - part of the Google picture?
tedster - 8:08 pm on Feb 17, 2007 (gmt 0)
So if a page contains more related phrases for a given phrase than is acceptable, it is filtered out or penalized.
However, what counts as "acceptable" is not a single fixed number. The patent only requires a count that "significantly exceeds the expected number", and it gives several ways to measure that:
 A spam document may be indicated if the actual number N of related phrases significantly exceeds the expected number E, for some minimum number of good phrases. In one implementation, N significantly exceeds E where it is at least some multiple number of standard deviations greater than E, for example, more than five standard deviations. In another implementation, N significantly exceeds E where it is greater by some constant multiple, for example N>2E. Other comparison measures can also be used as a basis for determining that the actual number N of related phrases significantly exceeds the expected number E. In another embodiment, N is simply compared with a predetermined threshold value, such as 100 (which is deemed to be the maximum expected number of related phrases).
 Using any of the foregoing tests, it is determined whether this condition is met for some minimum number of good phrases. The minimum may be a single phrase, or perhaps three good phrases. If there are a minimum number of good phrases which have an excessive number of related phrases present in the document, then the document is deemed to be a spam document.
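
To make the three tests in the quoted passage concrete, here is a rough Python sketch of how they might fit together. This is only my reading of the patent text, not anything Google has published: the names (PhraseStats, is_spam and so on) are made up for illustration, the sample numbers are invented, and the defaults (5 standard deviations, N>2E, threshold of 100, minimum of three good phrases) are simply the examples the patent itself mentions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PhraseStats:
    """Per-good-phrase figures for one document (hypothetical structure)."""
    related_count: int   # N: related phrases actually present in the document
    expected: float      # E: expected number of related phrases
    std_dev: float       # standard deviation around E


def exceeds_by_std_dev(s: PhraseStats, k: float = 5.0) -> bool:
    # "more than five standard deviations" greater than E
    return s.related_count > s.expected + k * s.std_dev


def exceeds_by_multiple(s: PhraseStats, multiple: float = 2.0) -> bool:
    # "greater by some constant multiple, for example N>2E"
    return s.related_count > multiple * s.expected


def exceeds_threshold(s: PhraseStats, threshold: int = 100) -> bool:
    # "simply compared with a predetermined threshold value, such as 100"
    return s.related_count > threshold


def is_spam(
    phrases: Iterable[PhraseStats],
    test: Callable[[PhraseStats], bool] = exceeds_by_multiple,
    min_good_phrases: int = 3,
) -> bool:
    """Flag the document if enough good phrases each fail the chosen test."""
    flagged = sum(1 for s in phrases if test(s))
    return flagged >= min_good_phrases


# Invented sample: four good phrases found in one document.
doc = [
    PhraseStats(related_count=50, expected=10, std_dev=4),
    PhraseStats(related_count=12, expected=10, std_dev=4),
    PhraseStats(related_count=45, expected=10, std_dev=4),
    PhraseStats(related_count=31, expected=10, std_dev=4),
]

spam_by_multiple = is_spam(doc, exceeds_by_multiple)    # three phrases have N > 2E
spam_by_std_dev = is_spam(doc, exceeds_by_std_dev)      # three phrases have N > E + 5*sigma
spam_by_threshold = is_spam(doc, exceeds_threshold)     # no phrase has N > 100
```

The point of the sketch is that the same page can be flagged under one test and pass under another, so the "acceptable" level depends entirely on which comparison and which minimum number of good phrases is actually in use.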