Actually, I can't take credit; it was brought up by other folks in the recent discussions.
I'm looking at the claims section of patent application 20060294155 (detecting spam in a phrase-based information retrieval system).
Here it is in my words (which may be wrong, but I'm trying to get it right):
From what I can tell, the system first works out which phrases tend to be seen together in a naturally written document ("road building" plus "laying asphalt", say). At least, I'm guessing this is how they decide on the number of related phrases that are expected to be present. That expected balance is then compared with the balance of the same phrases in a typical spam document.
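Just to make that concrete, here's a rough sketch in Python of the kind of statistic I'm imagining. To be clear, the phrase list, corpus, and function names are all invented for illustration; the patent doesn't publish any of this.

```python
# Hypothetical illustration of the first step as I read it: estimate, for a
# phrase, how many of its related phrases a naturally written page tends to
# contain. The phrase list and corpus are invented; the patent gives no data.

RELATED = {
    "road building": ["laying asphalt", "grading", "drainage", "paving crew"],
}

def related_phrase_count(doc_text: str, phrase: str) -> int:
    """Count how many of a phrase's related phrases appear in a document."""
    text = doc_text.lower()
    return sum(1 for rp in RELATED.get(phrase, []) if rp in text)

def expected_count(corpus: list[str], phrase: str) -> float:
    """Average related-phrase count over documents that contain the phrase."""
    counts = [related_phrase_count(d, phrase)
              for d in corpus if phrase in d.lower()]
    return sum(counts) / len(counts) if counts else 0.0
```

The point being that naturally written pages about road building might average, say, two of those related phrases, while pages stuffed from a keyword list would blow past that.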
So by the time a page is spidered, the accepted phrase ratio is already set. If the page uses a phrase more often than is acceptable, it gets filtered out or penalized. This is why I keep saying there's a thin line between ranking well and plunging in the SERPs.
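Continuing the same made-up sketch, the filter step might look something like this, with the ceiling computed offline before the page is ever spidered (again, the numbers are invented):

```python
# Hypothetical filter step: the acceptable density is computed offline, so by
# the time the spider fetches the page the ceiling already exists. The number
# below is invented purely for illustration.

ACCEPTABLE_DENSITY = {"road building": 2.5}  # max occurrences per 100 words

def phrase_density(doc_text: str, phrase: str) -> float:
    """Occurrences of the phrase per 100 words of the document."""
    text = doc_text.lower()
    return 100.0 * text.count(phrase) / max(len(text.split()), 1)

def penalized(doc_text: str, phrase: str) -> bool:
    """True if the page uses the phrase more often than its preset ceiling."""
    ceiling = ACCEPTABLE_DENSITY.get(phrase, float("inf"))
    return phrase_density(doc_text, phrase) > ceiling
```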
Good point. I've been concentrating on the other aspect because I'm working with articles. With them there's usually a good variety of related phrases, so I've been assuming it would help to get rid of any excessive phrases. I've had a bit of tunnel vision there and need to think about the other aspects.
That's why I've been watching the ads Google shows on its result pages when I search various phrases, and also noting which words the scraper sites use. This gives me an idea of what words or phrases are causing the problem on a page that has dropped severely in the SERPs. I've done this for my general topic, but I suspect it would be useful with other topics as well.
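For what it's worth, if you'd rather count this on your own page than eyeball the scraper sites, a quick n-gram tally does the same job. This is just my own diagnostic idea, nothing from the patent:

```python
# My own quick diagnostic, unrelated to the patent text: tally the most
# frequent two-word phrases on a page to spot anything overused.

import re
from collections import Counter

def top_phrases(doc_text: str, n: int = 2, k: int = 10):
    """Return the k most frequent n-word phrases in the text."""
    words = re.findall(r"[a-z']+", doc_text.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(" ".join(g), c) for g, c in grams.most_common(k)]
```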