[edited by: tedster at 2:08 pm (utc) on Feb 1, 2013]
I have had some success fixing pages where the title and H1 tag used the same keywords, which were also repeated in the first paragraph.
It's just that I don't see Google inflicting a true penalty on a site just because there's an exact match between H1 and title.
 In the first section (first m entries), the following relevance attributes are stored for each document entry in the posting list of a given phrase:
 1. The document relevance score (e.g., page rank);
 2. Total number of occurrences of the phrase in the document;
 3. A rank ordered list of up to 10,000 anchor documents that also contain the phrase and which point to this document, and for each anchor document its relevance score (e.g., page rank), and the anchor text itself; and
 4. The position of each phrase occurrence, and for each occurrence, a set of flags indicating whether the occurrence is a title, bold, a heading, in a URL, in the body, in a sidebar, in a footer, in an advertisement, capitalized, or in some other type of HTML markup.
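To make the quoted passage a bit more concrete, here is a rough sketch of what one posting-list entry could look like as a data structure. All class and field names (`PostingEntry`, `AnchorDocument`, `OccurrenceFlags`, and so on) are my own assumptions for illustration; the patent only lists the attributes, not how they are stored.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import List

# Cap taken from the quoted text ("up to 10,000 anchor documents").
MAX_ANCHOR_DOCS = 10_000


class OccurrenceFlags(Flag):
    """Hypothetical flag set mirroring item 4 of the quoted list."""
    TITLE = auto()
    BOLD = auto()
    HEADING = auto()
    URL = auto()
    BODY = auto()
    SIDEBAR = auto()
    FOOTER = auto()
    ADVERTISEMENT = auto()
    CAPITALIZED = auto()
    OTHER_MARKUP = auto()


@dataclass
class AnchorDocument:
    """A document that contains the phrase and links to this one (item 3)."""
    doc_id: str
    relevance_score: float
    anchor_text: str


@dataclass
class Occurrence:
    """One occurrence of the phrase and where it appeared (item 4)."""
    position: int
    flags: OccurrenceFlags


@dataclass
class PostingEntry:
    """One document entry in the posting list of a given phrase."""
    doc_id: str
    relevance_score: float  # item 1
    occurrences: List[Occurrence] = field(default_factory=list)
    anchor_docs: List[AnchorDocument] = field(default_factory=list)

    @property
    def total_occurrences(self) -> int:
        # Item 2: total number of occurrences of the phrase in the document.
        return len(self.occurrences)

    def add_anchor(self, anchor: AnchorDocument) -> None:
        # Keep the list rank ordered by relevance score and capped in size.
        self.anchor_docs.append(anchor)
        self.anchor_docs.sort(key=lambda a: a.relevance_score, reverse=True)
        del self.anchor_docs[MAX_ANCHOR_DOCS:]
```

The point for this thread: under this reading, a phrase that appears in both the title and a heading is simply recorded as two occurrences with different flags. Nothing in the quoted list says that combination is penalized.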
 M(p): Number of interesting instances of the possible phrase. An instance of a possible phrase is "interesting" where the possible phrase is distinguished from neighboring content in the document by grammatical or format markers, for example by being in boldface, or underline, or as anchor text in a hyperlink, or in quotation marks. These (and other) distinguishing appearances are indicated by various HTML markup language tags and grammatical markers. These statistics are maintained for a phrase when it is placed on the good phrase list 208.
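As a rough illustration of what counting "interesting" instances might involve, the sketch below counts occurrences of a phrase that sit inside bold, underline, or anchor tags, or inside double quotation marks. The tag set `MARKER_TAGS`, the class `InterestingCounter`, and the function `interesting_instances` are assumptions of mine; the patent text mentions "other" markers that are not covered here.

```python
import re
from html.parser import HTMLParser

# Assumed subset of the "format markers" named in the quote.
MARKER_TAGS = {"b", "strong", "u", "a"}


class InterestingCounter(HTMLParser):
    """Counts phrase occurrences that are set apart by markup or quotes."""

    def __init__(self, phrase: str):
        super().__init__()
        self.phrase = phrase.lower()
        self.depth = 0   # how many marker tags we are currently inside
        self.count = 0   # running M(p) for this document

    def handle_starttag(self, tag, attrs):
        if tag in MARKER_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in MARKER_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.lower()
        if self.depth > 0:
            # Inside bold/underline/anchor text: every occurrence counts.
            self.count += text.count(self.phrase)
        else:
            # Plain text: only occurrences inside double quotes count.
            for quoted in re.findall(r'"([^"]*)"', text):
                self.count += quoted.count(self.phrase)


def interesting_instances(html: str, phrase: str) -> int:
    parser = InterestingCounter(phrase)
    parser.feed(html)
    parser.close()
    return parser.count
```

For example, `interesting_instances('<p>blue widgets and <b>blue widgets</b></p>', 'blue widgets')` counts only the bolded occurrence, not the plain one.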
[edited by: MikeNoLastName at 11:30 am (utc) on Feb 27, 2013]
This specificity is all so absurd! G get a life! They are acting like a spoiled little baby. They only want from us what they are too lazy to create themselves, and punish by crying because they can't have it served exactly how they want it on a silver platter. If all the relatives would just stop paying attention to the wah-wah little G-baby in the corner of the room he might stop crying.
I don't know how many read the patent document.
I think it is a must.
4. The method of claim 1, wherein identifying the document as a spam document, further comprises: responsive to the actual number of related phrases present in the document for at least one phrase exceeding the expected number of related phrases by at least a multiple of a standard deviation of the expected number of related phrases, identifying the document as a spam document.
at least a multiple of a standard deviation
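Spelled out, the test in claim 4 is a simple threshold check. The sketch below is only my reading of the claim wording; the function name `is_spam_document` and the default `multiple=2.0` are assumptions, since the quoted claim does not say which multiple is used.

```python
def is_spam_document(actual_related: int,
                     expected_related: float,
                     std_dev: float,
                     multiple: float = 2.0) -> bool:
    """Flag a document when the related phrases actually present exceed the
    expected number by at least `multiple` standard deviations.

    `multiple` is a hypothetical default; the claim only says
    "at least a multiple of a standard deviation".
    """
    excess = actual_related - expected_related
    return excess >= multiple * std_dev


# Expected 10 related phrases with a standard deviation of 5:
print(is_spam_document(30, 10, 5, multiple=3))  # excess of 20 vs. threshold 15
print(is_spam_document(20, 10, 5, multiple=3))  # excess of 10 vs. threshold 15
```

Note what is being measured: an unusually high count of related phrases for a given phrase, not a title that happens to match the H1.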
Read the whole thing.