

The "Spam Assassin" Theory - maybe Google has a similar system?

         

jetteroheller

9:25 am on Oct 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



SpamAssassin is a common tool for handling spam email.

It assigns points to each email, based on a set of rules.

The spam threshold is usually 5.

Typically, no single attribute reaches this on its own.
Only a combination of several attributes adds up to 5.

Sure, there are also many false positives: important emails blocked by the spam filter.

I think there is something similar at Google.

Everything we hear, such as "My site lost 80% of Google traffic"
or "My site escaped from the filter",

is caused by

* New attributes
* Changing attributes
* Changing the values of attributes

Let's imagine domain A has
R10 2
R11 2
R12 2 spam points (6 in total) and is in the filter

Let's imagine domain B has
R10 2
R15 1
R16 1 spam points (4 in total) and is easy to find

Now maybe there is some sort of conference at Google.
They give the rules new values:

R10=2
R11=1
R12=1
R15=2
R16=2

Next day, we have here a post from webmaster A
"Great, my domain escaped from the filter"
and a reply from webmaster B
"I lost 80% of my Google traffic"
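
The arithmetic in this scenario can be checked with a small sketch. It uses only the rule names and point values from the post; the threshold of 5 is borrowed from the SpamAssassin analogy, and nothing here is known about how Google actually scores sites.

```python
# Hypothetical threshold, borrowed from the SpamAssassin analogy in the post.
THRESHOLD = 5

# Rules each domain triggers, as given in the post.
DOMAIN_RULES = {
    "A": ["R10", "R11", "R12"],
    "B": ["R10", "R15", "R16"],
}

# Rule values before and after the imagined re-weighting.
OLD_WEIGHTS = {"R10": 2, "R11": 2, "R12": 2, "R15": 1, "R16": 1}
NEW_WEIGHTS = {"R10": 2, "R11": 1, "R12": 1, "R15": 2, "R16": 2}


def score(rules, weights):
    """Sum the points of every rule the domain triggers."""
    return sum(weights[rule] for rule in rules)


def is_filtered(rules, weights, threshold=THRESHOLD):
    """A domain is filtered once its total reaches the threshold."""
    return score(rules, weights) >= threshold


for label, weights in (("old", OLD_WEIGHTS), ("new", NEW_WEIGHTS)):
    for domain, rules in DOMAIN_RULES.items():
        total = score(rules, weights)
        state = "filtered" if is_filtered(rules, weights) else "ok"
        print(f"{label} weights, domain {domain}: {total} points, {state}")
```

Neither domain changed anything. Only the weights moved, and the two domains swapped sides of the threshold.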

tedster

9:24 pm on Oct 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Email spam and web spam are very different critters. I think Google is at least aiming for a much more sophisticated system, judging by their patents. There certainly is weighting applied to various metrics, but what those metrics are and how they are obtained is where the sophistication lies.

jetteroheller

9:13 am on Oct 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Email spam and web spam are very different critters"

But the effects are the same.

It's like an email that has 4,9 spam points and goes straight to my inbox, or has 5,1 spam points and is delivered to the spam folder.

There is nothing in between. It's either marked as spam or not marked as spam. It's like being pregnant or not; there is nothing in between.

jetteroheller

10:06 am on Oct 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why understanding this is so important, and why it blocked me from handling problems:

Let's assume there are 3 different domains.

A: 4,9
B: 4,5
C: 2,0 spam points

Now suddenly a new spam rule is introduced and applied to all 3 domains. It's only a minor point, worth 0,2. Now the rating is

A: 5,1 filtered
B: 4,7
C: 2,2

Now the webmaster tries to find all the differences between A, B and C. Why is A filtered, and B and C not?

Maybe he discovers exactly the small problem that brings the 0,2 spam points, but he does not apply the solution, because all 3 domains have the same bad feature and only one was moved into the filter.

This was always my problem with the June 27th 2006 disaster. 10 subdomains, some in the filter, some not, moving out, moving in.

And I always tried to find a common feature for all the filtered subdomains, but I could not find one.
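
A minimal sketch of why the comparison fails, using the three scores from the example above. The threshold of 5 and the idea that the new rule applies equally to all three domains come from the post; the variable names are made up for illustration.

```python
THRESHOLD = 5.0          # assumed, as in the SpamAssassin analogy
NEW_RULE_POINTS = 0.2    # the minor new rule from the example

# Scores before the new rule, as given in the post.
base_scores = {"A": 4.9, "B": 4.5, "C": 2.0}


def filtered_domains(scores, threshold=THRESHOLD):
    """Return the set of domains whose score reaches the threshold."""
    return {domain for domain, points in scores.items() if points >= threshold}


# Every domain triggers the new rule, so every score rises by the same amount.
# round() only tidies floating-point noise for display.
new_scores = {d: round(p + NEW_RULE_POINTS, 1) for d, p in base_scores.items()}

before = filtered_domains(base_scores)
after = filtered_domains(new_scores)

# Domains that share the "bad feature" but are not filtered.
has_new_rule = set(base_scores)
unfiltered_with_rule = has_new_rule - after

print("filtered before:", sorted(before))
print("filtered after: ", sorted(after))
print("same feature, not filtered:", sorted(unfiltered_with_rule))
```

The new rule is present on the filtered domain and on the unfiltered ones, so looking for a feature that only the filtered domains have will never point to it. What differs is how close each domain already was to the threshold.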

cmendla

1:56 pm on Oct 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



JH

I've worked a fair bit with SpamAssassin and I think you may be on to something. SA is a fairly complex piece of work and can be set to 'learn' from who you mail to and how you handle suspected spam.

Also, a long time ago, I took some post-master's statistics (and realized that stuff was for smart people). Anyway, one of the things we studied was multiple variables and the interaction between variables.

I think your hypothesis is totally valid, except I think Google might have an item or two in the list that will mark the site as spam regardless of other factors (that's not supported by anything on my part, just a feeling). Regardless, your theory would still hold true for most spam indicators.
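
As a rough sketch of that variation: a few rules flag a site outright, and everything else is cumulative. The rule names, weights and the `HARD_RULES` set are all hypothetical; this is a guess at the shape of the idea, not at anything Google does.

```python
THRESHOLD = 5.0

# Hypothetical rules that flag a site regardless of the total score.
HARD_RULES = {"R99"}

# Hypothetical weights for the ordinary, cumulative rules.
WEIGHTS = {"R10": 2.0, "R11": 1.0, "R99": 0.0}


def is_flagged(triggered, weights=WEIGHTS, hard_rules=HARD_RULES,
               threshold=THRESHOLD):
    """Flag on any hard rule, otherwise only when the sum reaches the threshold."""
    triggered = set(triggered)
    if triggered & hard_rules:
        return True
    return sum(weights[rule] for rule in triggered) >= threshold


print(is_flagged({"R10", "R11"}))   # cumulative rules only
print(is_flagged({"R10", "R99"}))   # includes a hard rule
```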

If you are right, then it could mean that cause and effect get blurry.

i.e.
- Rule 15 changes something that pushes a site into low rankings.
- The site owner thinks it's related to something that falls under rule 12 and changes that, which just happens to get them back under the penalty point.
- The owner thinks that it was something new Google did with 12, but in reality it was 15 that pushed them over the edge.
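
The three steps can be played through with made-up numbers. The rule names R12 and R15 come from the list above; R10, all the weights, and the threshold are assumptions chosen only to make the effect visible.

```python
THRESHOLD = 5.0  # assumed penalty point


def total(triggered, weights):
    """Sum the points of the rules a site currently triggers."""
    return sum(weights[rule] for rule in triggered)


# Hypothetical starting weights and the rules one site triggers.
weights = {"R10": 2.0, "R12": 1.5, "R15": 1.0}
site = {"R10", "R12", "R15"}

# Step 0: the site sits just under the threshold.
before = total(site, weights)

# Step 1: rule 15 is re-weighted and pushes the site over.
weights_after_update = dict(weights, R15=2.0)
after_update = total(site, weights_after_update)

# Step 2: the owner fixes whatever triggers rule 12, not rule 15.
site_after_fix = site - {"R12"}
after_fix = total(site_after_fix, weights_after_update)

for label, points in (("before", before),
                      ("after R15 update", after_update),
                      ("after fixing R12", after_fix)):
    state = "penalized" if points >= THRESHOLD else "ok"
    print(f"{label}: {points} points, {state}")
```

The fix works, so the owner credits rule 12, even though the only thing that changed on Google's side in this sketch was rule 15.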

I'm not sure if I can post a link to SpamAssassin, but if anyone wants, they can look up SA and see how the rules work and the cumulative nature of the indicators.

cg