mrMister - 9:18 am on Mar 26, 2005 (gmt 0)
Html tags are stripped. You can test this by searching in Google for the word html. If tags were indexed, it would have to occur at least once in every one of the 8 billion pages, and it does not. Likewise, search on font. Google therefore must strip the HTML.

The similarity logic and the ranking logic don't have to use the same source data (your test was on the ranking logic, so it is invalid as a test of similarity). I'm confident that Google does take the HTML into account when determining similarity. On pages I've developed with little text and large templates, the duplicate content filter kicks in, even though the main text is very different.
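To make the template point concrete, here is a rough Python sketch of why it matters whether a similarity check sees the markup or not. Nothing here is Google's actual method: the word-shingle Jaccard score, the `TEMPLATE` string and the two sample pages are all made up for illustration. It only shows that two pages sharing a big template and differing in a short main text score as far more alike on raw HTML than on stripped text.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes, dropping tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the visible text of an HTML string, tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def shingles(text, k=3):
    """Set of k-word shingles; tag and attribute names count as words."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the two shingle sets (illustrative measure only)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical large template wrapped around a short main text.
TEMPLATE = (
    '<html><head><title>Site</title></head><body>'
    '<table class="nav"><tr><td><a href="/">Home</a></td>'
    '<td><a href="/about">About</a></td>'
    '<td><a href="/contact">Contact</a></td></tr></table>'
    '<div class="main"><p>{body}</p></div>'
    '<div class="footer"><font size="1">Copyright notice</font></div>'
    '</body></html>'
)

page_a = TEMPLATE.format(body="Blue widgets ship in two days")
page_b = TEMPLATE.format(body="Our office cat is called Henry")

raw_sim = jaccard(page_a, page_b)                           # markup included
text_sim = jaccard(strip_tags(page_a), strip_tags(page_b))  # markup stripped

print(f"similarity on raw HTML:      {raw_sim:.2f}")
print(f"similarity on stripped text: {text_sim:.2f}")
```

The stripped-text score is still not zero, because the navigation and footer text come from the template too, which fits the experience of the filter kicking in on pages with little text of their own.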