It is a good point about monitoring the people who remove sites, re: corruption. Most online newsrooms have a tag-team setup: one person writes, another edits before anything can be posted live, and then it goes to a final check before going live. Even with all that, you still see typos on CNN etc.
To REMOVE data you'd probably want something simpler than that... maybe just two levels, i.e. Level 1 surfs for problems and sends them on to their manager. The manager (Level 2) confirms the problem, removes THAT site, and classifies the type of problem (with comments) before sending it off to the anti-spam programmers to look at the underlying code for footprints etc.
And once a site is taken out by Level 2, it'll be gone in all related SERPs as well, of course.
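Just to make the workflow concrete, here's a rough sketch of that two-level idea (the names and structure are mine, purely illustrative):

    from dataclasses import dataclass

    @dataclass
    class Report:
        url: str
        problem_type: str = ""
        comments: str = ""
        confirmed: bool = False

    class ReviewQueue:
        """Level 1 flags, Level 2 confirms/removes/classifies, programmers get the rest."""
        def __init__(self):
            self.pending = []          # flagged by Level 1 surfers
            self.for_programmers = []  # confirmed cases, with classification and comments

        def flag(self, url):
            # Level 1: surf, spot a problem, send it on to the manager
            self.pending.append(Report(url))

        def confirm(self, report, problem_type, comments):
            # Level 2 (manager): confirm, pull the site, classify, pass it along
            report.confirmed = True
            report.problem_type = problem_type
            report.comments = comments
            remove_from_index(report.url)        # and it's gone from all related SERPs
            self.for_programmers.append(report)  # footprint analysis happens downstream

    def remove_from_index(url):
        print(f"removed {url} from the index")

    q = ReviewQueue()
    q.flag("http://example.com/doorway")
    q.confirm(q.pending.pop(0), "doorway pages", "auto-generated text, hidden links")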
gmiller "Humans are horribly expensive, and they burn out quickly in repetitive jobs like that." - you'd think so, but in reality they don't. At least the offshore link submitters working for me aren't.
While by Western standards this may not sound like a great job (although many do it here), there are many people in developing countries who would be very happy to have steady work of this nature for $500/month or even $300/month. Assume 50% cost overhead, plus computers, and an SE could have 100 people doing this all year for about $1 million.
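Back-of-the-envelope, using my own rough numbers (nothing official here):

    reviewers = 100
    monthly_salary = 500           # USD, per reviewer
    overhead = 0.50                # assume 50% on top of salaries
    computers = reviewers * 1000   # say ~$1,000 of hardware per seat

    annual_salaries = reviewers * monthly_salary * 12           # 600,000
    annual_cost = annual_salaries * (1 + overhead) + computers  # 900,000 + 100,000
    print(f"~${annual_cost:,.0f} per year")                     # ~$1,000,000 per year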
That would make a massive difference to SERPs but it'd be a drop in the bucket compared to how much $$ spam sites drag in. (And how much extra SEs could make if they had less spam.)
I don't think PageRank was designed to catch negatives, but rather to create positives. In other words, the idea behind it was to determine whether a page had quality, not whether a page was spam. This is why I say I don't think spam per se was a core consideration in the initial design of the algo.
I'd shy away from terms like 'traditional spamming techniques'. Spammers have no traditions; they are pure empiricists, pushing and manipulating to find out what the current weaknesses are. That method will be a constant. Which means an algo designed with a fixed method will always fail, as opposed to an algo designed from the ground up to be completely fluid in terms of what it treats as spam.
The amount of time it takes spammers to crack the current version of this rigid algo will always be less than the amount of time it takes Google to apply a patch. This is what suggests to me that Google et al. are pursuing a foolish, possibly even stupid, course.
Keep in mind that they are currently so incapable of dealing with spam that they have actually had to cut off all new sites from the main result set. It's easy to forget these facts and to think they are unrelated, but it's all related: weaknesses appear and become visible because the weaknesses are built in. It's sort of like IIS or Windows or IE security; the flaws are in the initial design, and that's why they have to keep doing full redesigns of their systems. That's why you no longer have the Windows 9x series, and why IIS had to be fully rewritten.
So how many of these seo spammers are there? My guess is that there are not very many.
2by4, I think you may be underestimating this. There are millions of people worldwide producing spammy affiliate sites to try to make a buck or two from AdSense, and I would consider that spam. However, I believe you are correct that the problem would not be hard to deal with, because only a certain few know how to get their sites to the top of the results. Weed them out and the problem is essentially gone.
I am just glad that we have at last focussed on the necessity for hand editing, because I have been championing this for ages. (You will see it mentioned in many more threads in future because, when you think about it, it really is a bit of a no-brainer.)
No algo can anticipate a clever human's next move. Not now, not next year, not ever!
When the penny drops we can get this thing back under control and move on.
Which means an algo designed with a fixed method will always fail, as opposed to an algo designed from the ground up to be completely fluid in terms of what it treats as spam.
Even now every major search engine including Google uses human checking to determine the effectiveness of their algorithms... yet the algorithms are still gamed by "clever humans."
Wouldn't an algo which fails to produce consistent results and is therefore untestable by spammers be a better algo? Any system that strives for consistent results is "breakable," human or automated, so shouldn't the system's results be made inconsistent to hinder reverse engineering?
Just an idea. ;)
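Purely as an illustration of what a weak random element might look like (not something any SE is known to do):

    import random

    def jitter_scores(results, jitter=0.02):
        """Perturb each ranking score slightly so the ordering isn't perfectly
        reproducible; keep 'jitter' small or it just wrecks otherwise good results."""
        noisy = [(score * (1 + random.uniform(-jitter, jitter)), url)
                 for score, url in results]
        return sorted(noisy, reverse=True)

    results = [(0.91, "page-a"), (0.90, "page-b"), (0.55, "page-c")]
    print(jitter_scores(results))   # a and b may swap places; c stays well down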
Some slight random element might help foil gaming the system, but there are limits.
If the random factor is too strong, it will just screw up otherwise fairly good results.
There could be something in place right now. If so, my uneducated guess is that it would only knock a given page up or down by one or two slots.
Natural fluctuations could mask the randomness, or make it completely unnecessary.
- Larry
Keep in mind, you aren't talking here about detecting millions of spam affiliate sites; you're talking about detecting enough to create a pattern that the system can then use to automatically detect and delete that particular pattern. The pattern isn't written into the algo; the algo is written to work with the patterns it's given. Thus the quote above: a simple, stupid system works better than a highly complex system. I don't think you'd need to do that much work to do it, either, since the affiliate sites all follow the same basic idea and construction and so would be easy to add automatically to the index of spamming methods.
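Something along these lines, as a toy sketch (the 'footprints' below are invented for illustration, not real detection rules):

    import re

    # Humans describe a footprint once; the system applies it across the index.
    FOOTPRINTS = {
        "scraped-directory": re.compile(r"powered by .* directory script", re.I),
        "thin-affiliate": re.compile(r"(click here to buy|best price guaranteed)", re.I),
    }

    def matching_pattern(page_html):
        """Return the first human-defined spam pattern this page matches, if any."""
        for name, pattern in FOOTPRINTS.items():
            if pattern.search(page_html):
                return name
        return None

    def sweep(index):
        """The grunt work: flag every page that fits a pattern an editor already confirmed."""
        flagged = {}
        for url, html in index.items():
            name = matching_pattern(html)
            if name:
                flagged[url] = name
        return flagged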
And I'd go further: the presence of the affiliate site industry is made possible by the fact that search engines are not using such a method. That industry would not exist if this weakness were not present; the same goes for scraper sites and the like. These spring up because their makers know the system will statistically make it worth their while to generate them. Their existence is testimony to the failure of the current methods.
Humans would just be the brains of the detection. Once told what to look for by those brains, the system would do the grunt work. It's like any other software application: MS Word doesn't write your document for you, it just processes what you tell it to process.
By the way, I tested Thunderbird's Bayes spam filter last night on a friend's email account. It didn't take long to get 99% or greater spam detection in a single session, closer to 99.5% I think. And that's a very, very simple algo.
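For anyone curious how little code that takes, here's a bare-bones word-level version of the same idea (nothing like Thunderbird's actual implementation):

    import math
    from collections import Counter

    class NaiveBayesFilter:
        def __init__(self):
            self.words = {"spam": Counter(), "ham": Counter()}
            self.docs = {"spam": 0, "ham": 0}

        def train(self, text, label):
            self.docs[label] += 1
            self.words[label].update(text.lower().split())

        def spam_log_odds(self, text):
            """Positive means 'looks like spam'; add-one smoothing keeps it from blowing up."""
            odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
            for w in text.lower().split():
                p_spam = (self.words["spam"][w] + 1) / (sum(self.words["spam"].values()) + 1)
                p_ham = (self.words["ham"][w] + 1) / (sum(self.words["ham"].values()) + 1)
                odds += math.log(p_spam / p_ham)
            return odds

    nb = NaiveBayesFilter()
    nb.train("cheap pills buy now", "spam")
    nb.train("meeting notes attached", "ham")
    print(nb.spam_log_odds("buy cheap pills") > 0)   # True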
MS Word's grammar checker would be an example of an automated system trying to work with meaning. It does a horrible job, not even remotely close to what a competent human would do.
The editors only make the game more expensive and difficult, not impossible. Considering that the top spam site makers are already investing heavily in making sites designed to pass human scrutiny (even feigning user interaction), I think a human check at this point would only be a stop-gap that weeds out the bottom xx% of site spammers, making the remaining xx% that much stronger.
I think Google is willing to build a fully automated system that takes human interaction into account, so long as those humans don't realize they are part of the editing. (To eliminate corruptibility and adherence to published standards, the editors can't know they are editors and can only judge by their internal standards.) It's what link popularity originally worked well for, until Google made link popularity too popular.
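One way I can picture that working (purely my speculation, not anything Google has described): treat ordinary users as anonymous editors who vote with their behaviour, something like:

    from collections import defaultdict

    # Speculative sketch: visitors 'edit' without knowing it, by voting with clicks.
    stats = defaultdict(lambda: {"clicks": 0, "quick_backs": 0})

    def record_visit(url, bounced_back_to_serp):
        stats[url]["clicks"] += 1
        if bounced_back_to_serp:               # user pogo-sticked straight back to the results
            stats[url]["quick_backs"] += 1

    def implicit_quality(url, min_clicks=100):
        """Share of visitors who stayed; only meaningful in aggregate, so no single
        'editor' knows they're editing and no one is worth bribing."""
        s = stats[url]
        if s["clicks"] < min_clicks:
            return None
        return 1 - s["quick_backs"] / s["clicks"]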
To paraphrase BDW:
No clever human can reliably anticipate another clever human's next move. Maybe this time, maybe next time, but never every time!
Natural fluctuations could mask the randomness, or make it completely unnecessary.
Agreed. Although I still feel confident that making the testing difficult would be the quickest to implement and most cost-effective hindrance to the spammers today, while algos are being trained to deal with tomorrow's challenges.
Yeah, that thought crossed my mind, but it sounds an awful lot like spammers would have to make real websites, which is fine. But also imagine this: your job is to look at websites all day. You see so many that you can spot techniques almost instantly; after a while you could probably even name the spammer who made a site by his or her style.
Optimizing for humans sort of sounds like a victory to me. I remember the first time I placed number one out of something like 8 million results; I couldn't believe it. I had almost zero content on the page. It was a total accident, and I went with it, but it always amazed me that Google could do something that silly. It was only because my page, by accident, fit the current weakness of the algo. Wasn't spam, just an accident.
Maybe there is no answer, I don't know. I do know that I don't trust programmers to make certain decisions; they just aren't very good at some things, and the better they are at programming, the worse they tend to be in other areas. But maybe it really doesn't matter; it's hard to say.
That convergence is just still some way away. ;)