| 3:20 pm on Mar 16, 2007 (gmt 0)|
I've printed out and read the second paper, but not the first, and it's a real eye-opener. I agree that this is HOT, because it seems to confirm some suspicions I've been toying with.
I've wondered about hyphens for quite a while, and have even redone some sites to eliminate hyphenated filenames and subdirectories (though I use them for ease of maintenance). Recently I've been going through search after search at Yahoo, looking for hyphens and underscores, and I have noticed a substantial absence of pages with hyphens in the top 20-30 results. It may not be any single one of the factors mentioned; maybe it's a combination of factors that can push a site over the edge.
What I do wonder, though, is whether in reality they detect sites with what they consider to be negatives algorithmically, or use human review.
Sometimes they really are out to get paranoid people.
| 11:53 pm on Mar 17, 2007 (gmt 0)|
Are you referring to section 5 with regard to the hyphens?
Qualitative aspects of spam hosts
Finally, we wanted to evaluate the prevalence of different spamming aspects. For this end, and as
a preliminary study, we ran a second round of evaluations by sampling at random 200 hosts that
were tagged by at least two judges as Web spam. We wanted to examine the most relevant features
found in hosts that were tagged as spam. After inspection of these hosts, we decided to tabulate
them using the following (non-exclusive) criteria:
Keywords in URL: The host contains keywords in the URLs, separated by minus, underscore
or the plus sign. This is not necessarily a spamming aspect.
I think they have just discovered a "qualitative aspect" of many hosts then, spam host or not. I do note that they discount this as "not necessarily a spamming aspect" though.
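To show how broad that criterion is, here is a rough Python sketch of a "Keywords in URL" check. This is only my own guess at what such a test could look like: the paper describes judges tabulating hosts by inspection, not running code, and the function name, the regex and the example URLs are all made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumption: a "keyword run" is two or more alphabetic tokens joined by
# a minus, underscore or plus sign. The paper gives no formal definition.
_KEYWORD_RUN = re.compile(r"[A-Za-z]+(?:[-_+][A-Za-z]+)+")


def has_separated_keywords(url: str) -> bool:
    """Return True if the host or path contains separator-joined keywords."""
    parts = urlsplit(url)
    # Only the host and path are checked; the query string is ignored.
    target = f"{parts.netloc}{parts.path}"
    return _KEYWORD_RUN.search(target) is not None


if __name__ == "__main__":
    for url in (
        "http://example.com/cheap-blue-widgets.html",
        "http://example.com/buy+widgets",
        "http://example.com/index.html",
    ):
        print(url, has_separated_keywords(url))
```

Any ordinary site with a filename like my-holiday-photos.html would trip a check like this, which is presumably why the authors add that it is not necessarily a spamming aspect.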