For example, the phrase "President of the United States" is a phrase that predicts other phrases such as "George Bush" and "Bill Clinton." However, other phrases are not predictive, such as "fell down the stairs," "top of the morning," or "out of the blue," since idioms and colloquialisms like these tend to appear with many other different and unrelated phrases. Thus, the phrase identification phase determines which phrases are good phrases and which are bad (i.e., lacking in predictive power).
The term "good phrase" appears very early on in the process in step , long before the spam-detection parts. I read it as saying that a "good phrase" is one that can be used as a relevance indicator for the search phrase. Of course, a spam page will target such good phrases, but so will any genuinely good result for that search.
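To make the idea of "predictive power" concrete, here is a minimal sketch of one way it could be measured on a toy corpus. It is an assumption for illustration, not the method from the quoted text: it uses a simple co-occurrence lift, P(b | a) / P(b), and treats a phrase as "good" if it raises the likelihood of at least one other phrase past a cutoff. The function name `predictive_phrases`, the `threshold` value, and the sample documents are all hypothetical.

```python
from itertools import permutations


def predictive_phrases(docs, threshold=1.5):
    """Map each predictive phrase to the set of phrases it predicts.

    docs: list of sets of phrases, one set per document.
    Phrase `a` is taken to predict `b` when `b` shows up in documents
    containing `a` at least `threshold` times as often as it shows up
    in the corpus overall (an assumed, simplified criterion).
    """
    n = len(docs)
    vocab = set().union(*docs)
    count = {p: sum(p in d for d in docs) for p in vocab}
    good = {}
    for a, b in permutations(vocab, 2):
        both = sum(a in d and b in d for d in docs)
        if both == 0:
            continue
        # lift = P(b | a) / P(b)
        lift = (both / count[a]) / (count[b] / n)
        if lift >= threshold:
            good.setdefault(a, set()).add(b)
    return good


# Toy corpus: the idiom is spread thinly across unrelated topics.
docs = [
    {"president of the united states", "george bush", "out of the blue"},
    {"president of the united states", "bill clinton"},
    {"president of the united states", "george bush", "bill clinton"},
    {"australian shepherd", "blue merle", "out of the blue"},
    {"australian shepherd", "blue merle"},
    {"stock market", "out of the blue"},
    {"stock market", "interest rates"},
    {"interest rates", "out of the blue"},
]

good = predictive_phrases(docs)
for phrase in sorted(good):
    print(phrase, "->", sorted(good[phrase]))
```

In this toy data, "president of the united states" ends up predicting "george bush" and "bill clinton", while "out of the blue" predicts nothing because it co-occurs with everything at roughly the base rate. Note that nothing here separates a spam page from a legitimate one: both would contain the same good phrases, which is the point made above.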