
Forum Moderators: martinibuster


HOT - Yahoo research papers on spam, VLSI and other topics!

How will Yahoo judge whether something is spam?

1:22 am on Mar 16, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 11, 2004
votes: 0

This is pretty interesting:


Some more cool stuff here:


And here:


3:20 pm on Mar 16, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
votes: 0

I've seen, printed out, and read the second paper, but not the first, and it's a real eye-opener. I agree that this is HOT because it seems to confirm some suspicions I've been toying with.

I've wondered about hyphens for quite a while, and have even redone some sites to eliminate hyphenated filenames and subdirectories (though I use them for ease of maintenance). I've also been running search after search at Yahoo recently, looking for hyphens and underscores, and have noticed that pages with hyphens are largely absent from the top 20-30 results. It may not be any single one of the factors mentioned; maybe a combination of factors pushes a site over the edge.
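The kind of spot check described above can be tallied with a few lines of code. This is a minimal sketch, assuming you have already collected a query's result URLs in rank order; it reports the share of the top results whose path (filename or directory) contains a hyphen or underscore, ignoring hyphens in the hostname. The function name `separator_share` and the top-30 cutoff are illustrative choices, not anything Yahoo provides, and a low share for one query is only suggestive, not proof of a ranking factor.

```python
from urllib.parse import urlparse


def separator_share(result_urls, top_n=30):
    """Fraction of the first top_n result URLs whose path contains '-' or '_'.

    Hypothetical helper: only the path (filenames and subdirectories) is
    checked, so hyphens in the hostname itself are ignored.
    """
    sample = result_urls[:top_n]
    if not sample:
        return 0.0
    hits = sum(1 for url in sample if any(ch in urlparse(url).path for ch in "-_"))
    return hits / len(sample)


urls = [
    "http://example.com/blue-widgets.html",
    "http://example.com/widgets/",
    "http://example.com/red_widgets.htm",
    "http://example.org/index.html",
]
print(separator_share(urls))  # 2 of 4 paths contain a separator -> 0.5
```

Running the same tally over many queries gives a rough picture; a single query's top results can easily be skewed by one or two sites.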

What I do wonder, though, is whether they actually detect sites with what they consider negatives algorithmically, or use human review.

Sometimes they really are out to get paranoid people.

11:53 pm on Mar 17, 2007 (gmt 0)


WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
votes: 3

Are you referring to section 5 with regard to the hyphens?

Qualitative aspects of spam hosts
Finally, we wanted to evaluate the prevalence of different spamming aspects. For this end, and as a preliminary study, we ran a second round of evaluations by sampling at random 200 hosts that were tagged by at least two judges as Web spam. We wanted to examine the most relevant features found in hosts that were tagged as spam. After inspection of these hosts, we decided to tabulate them using the following (non-exclusive) criteria:

Keywords in URL: The host contains keywords in the URLs, separated by minus, underscore or the plus sign. This is not necessarily a spamming aspect.


I think they have just discovered a "qualitative aspect" of many hosts then, spam host or not. I do note that they discount this as "not necessarily a spamming aspect" though.
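The quoted study tabulated the "Keywords in URL" criterion after inspecting the sampled hosts; it does not describe an automated detector. As a rough illustration only, here is a minimal sketch of how that criterion might be approximated, assuming that a "keyword" is any alphabetic word and that two or more words joined by minus, underscore, or plus count as a match. The function names `keyword_runs` and `has_keywords_in_url` and the `min_words` threshold are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical heuristic: a "keyword run" is two or more alphabetic words
# joined by '-', '_' or '+', e.g. "cheap-blue-widgets".
_RUN = re.compile(r"[a-z]+(?:[-_+][a-z]+)+", re.IGNORECASE)


def keyword_runs(url):
    """Return separator-joined word runs found in the URL's host, path and query."""
    parts = urlparse(url)
    # Join with spaces so words from different URL components never merge.
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [m.group(0) for m in _RUN.finditer(text)]


def has_keywords_in_url(url, min_words=2):
    """True if some run joins at least min_words words (threshold is an assumption)."""
    for run in keyword_runs(url):
        if len(re.split(r"[-_+]", run)) >= min_words:
            return True
    return False


print(keyword_runs("http://www.cheap-blue-widgets.com/buy_widgets+online.html"))
# ['cheap-blue-widgets', 'buy_widgets+online']
print(has_keywords_in_url("http://example.com/index.html"))  # False
```

As the paper itself notes, matching this pattern is not necessarily a spamming aspect: plenty of ordinary sites use hyphenated filenames, so a check like this could only ever be one weak signal among many.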

