
Forum Moderators: phranque

Spam detection techniques

What techniques does Google use to detect spam?

   
7:28 pm on Nov 23, 2004 (gmt 0)

10+ Year Member



Hi,

Does anyone have a clue which algorithms, filters, or rules Google uses to detect spam pages? By which distinctive marks of a web page can Google identify spam?

Thx

8:19 pm on Nov 28, 2004 (gmt 0)

10+ Year Member



Funny question. As there is no single "this is spam" definition, there can't be a clear answer to your question.
There are hundreds of different ways of spamming.
If you just need one simple example, it's hidden text.
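
To make the hidden-text example concrete, here is a rough sketch of how a crawler might flag text hidden with inline CSS. It is only an illustration, not how Google or any other SE does it: the `HiddenTextFinder` class and `find_hidden_text` function are made-up names, it only looks at inline `style` attributes (not external stylesheets or same-color text), and it assumes reasonably well-nested HTML.

```python
import re
from html.parser import HTMLParser

# Assumption: only these two inline CSS rules are treated as "hiding" an element.
HIDING_RULES = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside an element hidden via inline style."""

    # Elements without a closing tag; they never contain text, so skip them.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []        # one flag per open element: hidden (itself or via ancestor)?
        self.hidden_text = []  # text fragments found inside hidden elements

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        style = dict(attrs).get("style") or ""
        inherited = bool(self.stack and self.stack[-1])
        self.stack.append(inherited or bool(HIDING_RULES.search(style)))

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    """Return the list of text fragments hidden by inline styles."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


page = ('<p>Welcome</p>'
        '<div style="display: none">cheap pills <b>buy now</b></div>'
        '<p style="color:red">ok</p>')
print(find_hidden_text(page))
```

Hidden elements have plenty of legitimate uses (menus, tabs), so a hit here would be at most one signal among many.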

2:01 pm on Nov 29, 2004 (gmt 0)

10+ Year Member



I know spam when I see it, indeed... and there are a lot of techniques (one of the most obvious being invisible text).

I was wondering whether there are any automated techniques that Google (or any other SE) uses.

For instance, to locate doorway pages (or parked domains), a check on the number of pages per domain can be done. If it equals one, there is a good chance the domain is a doorway or parked domain.
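
A minimal sketch of that pages-per-domain check, assuming you already have a list of crawled URLs. The function names and the sample URLs are made up for illustration, and a count of one only marks a candidate, since plenty of legitimate sites have a single page.

```python
from collections import defaultdict
from urllib.parse import urlparse


def pages_per_domain(urls):
    """Count the distinct paths seen for each hostname."""
    paths = defaultdict(set)
    for url in urls:
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if host:
            paths[host].add(parsed.path or "/")
    return {host: len(seen) for host, seen in paths.items()}


def single_page_domains(urls):
    """Return hostnames with exactly one known page (candidates, not verdicts)."""
    return sorted(host for host, n in pages_per_domain(urls).items() if n == 1)


# Hypothetical crawl data.
crawled = [
    "http://example.com/",
    "http://example.com/about",
    "http://parked.example.net/",
    "http://doorway.example.org/cheap-widgets.html",
]
print(single_page_domains(crawled))
```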

But as far as I can see, there is no automated spam detection for web pages (as there is for email)...

or is there?

2:12 pm on Nov 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, there is, and the basic methods are quite similar (textual analysis plus checks on hosting, domain, content, etc.).

The latest antispam tools are called Bayesian semantic filters. Basically, it's an attempt to program "natural" language traits into computers, allowing them to detect machine-generated text, duplicate content, etc.
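
For a feel of the Bayesian part, here is a toy naive Bayes text classifier of the kind used in email spam filters. This is a sketch under my own assumptions, not the filter any search engine actually runs: the `NaiveBayes` class, the whitespace tokenizer, and the four training snippets are all invented for illustration, and real filters use far more data and features.

```python
import math
from collections import Counter


def tokenize(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Two-class (spam/ham) naive Bayes with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        # Log prior: share of training documents carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words don't zero the score.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Hypothetical training data; each label needs at least one example.
nb = NaiveBayes()
nb.train("cheap pills cheap pills buy now", "spam")
nb.train("buy cheap watches buy now", "spam")
nb.train("how to configure apache rewrite rules", "ham")
nb.train("notes on css layout for a blog", "ham")

print(nb.classify("buy cheap pills"))
print(nb.classify("apache rewrite rules"))
```

The same idea carries over from email to web pages: the classifier only learns which words are typical of each class, so anything "semantic" would have to come from richer features than single words.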

Believe me, these spam filters are a little more sophisticated than most people believe :)

5:40 pm on Dec 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This will help:

[newdbpubs.stanford.edu:8090...]

8:38 am on Dec 9, 2004 (gmt 0)

10+ Year Member



The link doesn't seem to work?
 
