|Spam detection techniques|
What techniques does Google use to detect spam?
| 7:28 pm on Nov 23, 2004 (gmt 0)|
Does anyone have a clue what algos/filters/rules Google uses to detect spam pages? By what distinct marks on a web page can Google identify spam?
| 8:19 pm on Nov 28, 2004 (gmt 0)|
Funny question. As there is no "this is spam" definition, there can't be a clear answer to your question.
There are hundreds of different ways of spamming.
If you just need one simple example, it's hidden text.
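To make the hidden-text example concrete, here is a rough sketch of how a crawler might flag it. It only looks at inline `style` attributes, and the names `HiddenTextFinder` and `parse_style` are made up for illustration. This is an assumption about one possible check, not a description of what Google actually does, and it would miss anything hidden through external stylesheets or scripts.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Turn an inline style string into a dict of lowercase name -> value."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            decls[name.strip().lower()] = value.strip().lower()
    return decls


class HiddenTextFinder(HTMLParser):
    """Hypothetical helper: records tags whose inline style hides their text."""

    def __init__(self):
        super().__init__()
        self.findings = []  # list of (tag, reason) tuples

    def handle_starttag(self, tag, attrs):
        # Valueless attributes come through as None, hence the "or".
        decls = parse_style(dict(attrs).get("style") or "")
        if decls.get("display") == "none":
            self.findings.append((tag, "display:none"))
        elif decls.get("visibility") == "hidden":
            self.findings.append((tag, "visibility:hidden"))
        elif "color" in decls and decls["color"] == decls.get("background-color"):
            self.findings.append((tag, "text color equals background"))


page = (
    '<p style="color:#fff; background-color:#FFF">cheap pills</p>'
    '<div style="display: none">keywords keywords</div>'
    "<p>visible text</p>"
)
finder = HiddenTextFinder()
finder.feed(page)
print(finder.findings)
```

Comparing the color strings literally is the weakest part: `#fff` and `white` are the same color but would not match here.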
| 2:01 pm on Nov 29, 2004 (gmt 0)|
I know spam when I see it, indeed... and there are a lot of techniques (one of the most obvious being invisible text).
I was wondering if there are any automated techniques that Google (or any other SE) uses.
For instance: to locate doorway pages (or parked domains), a check on the number of pages per domain can be done. If it equals one, there is a good chance it is a doorway or parked domain.
But as far as I can see now, there is no automated spam detection for web pages (like with email)...
or is there?
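The pages-per-domain check described above could be sketched roughly like this. The function name `single_page_domains` and the sample URLs are hypothetical, and it assumes you already have a list of crawled URLs to work from. A single-page host is only a hint, since plenty of legitimate sites have one page.

```python
from urllib.parse import urlsplit


def single_page_domains(urls):
    """Return the sorted hostnames for which only one distinct page was seen."""
    pages_by_host = {}
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname
        if host is None:
            continue  # not an absolute URL, skip it
        path = parts.path or "/"
        pages_by_host.setdefault(host, set()).add(path)
    return sorted(h for h, pages in pages_by_host.items() if len(pages) == 1)


# Made-up crawl data for illustration.
crawled = [
    "http://example.com/",
    "http://example.com/about",
    "http://parked.example.net/",
    "http://doorway.example.org/landing",
    "http://doorway.example.org/landing",  # same page seen twice
]
suspects = single_page_domains(crawled)
print(suspects)
```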
| 2:12 pm on Nov 29, 2004 (gmt 0)|
Actually, there is, and the basic methods are quite similar to email filtering (textual analysis, plus checks on hosting, domain, content, etc).
The latest antispam tools are called Bayesian semantic filters - basically, it's an attempt to program "natural" language traits into computers, allowing them to detect machine-generated text, duplicate content, etc.
Believe me, these spam filters are a little more sophisticated than most people believe :)
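For anyone wondering what the Bayesian part looks like, below is a minimal sketch of a plain naive Bayes word-frequency classifier, the kind used for email spam. It is not the "semantic" filter mentioned above and makes no claim about what any search engine runs. The class name `NaiveBayes`, the labels, and the tiny training set are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Toy two-class (spam/ham) classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_log_odds(self, text):
        """Positive means more spam-like, negative more ham-like."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        n_spam = sum(self.word_counts["spam"].values())
        n_ham = sum(self.word_counts["ham"].values())
        # Smoothed prior from the number of training documents per class.
        score = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + len(vocab))
            p_ham = (self.word_counts["ham"][word] + 1) / (n_ham + len(vocab))
            score += math.log(p_spam / p_ham)
        return score


clf = NaiveBayes()
clf.train("cheap pills cheap pills buy now", "spam")
clf.train("buy cheap watches now", "spam")
clf.train("our review of the new camera lens", "ham")
clf.train("notes on how search engines rank pages", "ham")

print(clf.spam_log_odds("buy cheap pills") > 0)
print(clf.spam_log_odds("review of search engines") < 0)
```

A real filter would need far more training text and better features than single words, but the scoring idea is the same.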
| 5:40 pm on Dec 8, 2004 (gmt 0)|
This will help:
| 8:38 am on Dec 9, 2004 (gmt 0)|
link doesn't seem to work?