I know they are interested in spam, and they would probably welcome any innovative ways of detecting it.
Todd
As I said, I'd probably try to make it a bit like Bobby, in that it fetches the page code and then marks up a few points where it's either definitely breaking the rules or could do with a user having a look at it. More for fun than anything serious, just something to make me feel a bit less powerless when competing against spam. Kind of writing a program to tell me I'm right, which is a bit silly really. It's probably always going to agree with me that something is spam, as all it would ever do is what I tell it.
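For anyone curious what that kind of marker might look like, here is a rough sketch in Python (picked only for brevity; the same idea would work in any language). It takes page source that has already been fetched and sorts findings into "definite" and "review", roughly the two buckets described above. The two rules, the threshold, and the names `SpamHints` and `check_page` are all made up for illustration and are not any search engine's actual guidelines.

```python
from html.parser import HTMLParser
import re

# Hypothetical rules, chosen only for illustration.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)
MAX_KEYWORD_REPEATS = 3  # assumed threshold, not a real search-engine rule


class SpamHints(HTMLParser):
    """Collects (severity, line, message) tuples while parsing a page."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        line = self.getpos()[0]  # 1-based line number of the tag

        # Hidden elements are only suspicious, so a human should look at them.
        if HIDDEN_STYLE.search(attrs.get("style") or ""):
            self.findings.append(
                ("review", line, f"<{tag}> is hidden by inline style"))

        # The same keyword repeated many times in the meta keywords tag.
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            words = [w.strip().lower() for w in content.split(",")]
            for word in sorted(set(words)):
                count = words.count(word)
                if word and count > MAX_KEYWORD_REPEATS:
                    self.findings.append(
                        ("definite", line,
                         f"keyword '{word}' repeated {count} times"))


def check_page(html):
    """Return the list of findings for one page's source."""
    parser = SpamHints()
    parser.feed(html)
    parser.close()
    return parser.findings
```

Calling `check_page(source)` gives back a list that could be printed next to the page code. As the post says, it will only ever agree with whatever rules you feed it.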
Say, just out of curiosity, was it your life-long dream to grow up and become the world's biggest tattle-tale?
Why on earth would you want to waste your time building a spam(bot)filter and feed your results to Google? Why not build your own search engine, apply this "wonderful" filter on your OWN engine, and reap the benefits of owning your own "spam free" search index?
Just a thought.
Anything I'd write would be far too server-intensive to be applied to large-scale filtering, but I think it would still be an interesting project to play with. Although I have some training in C, I'd probably code it in PHP as it would be a damn sight easier. I wouldn't expect it to run more than a couple of times an hour, so I don't think it would qualify for a ban, especially if I used the API. I'd hope not, at least.
I think the key thing to remember is it would just be a bit of fun.
If nobody has the CPU/memory resources to do more spam detection, maybe someone should think about a distributed system (the way SETI@home etc. does). Make it an option in the toolbar or something like that. THAT would be cool (in several ways). Well, one more project on my list, but one I will certainly not start alone.
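Just to make the idea concrete, a toy sketch in Python of how the work could be split up: a coordinator cuts a URL list into small work units, volunteers pull a unit, check it, and send verdicts back. Everything here (`Coordinator`, `volunteer`, the unit size, the `check` callback) is hypothetical and runs in a single process; a real system would need networking, plus some way to cope with volunteers who return wrong or dishonest results.

```python
from collections import deque


class Coordinator:
    """Hands out batches of URLs and collects the verdicts (illustrative only)."""

    def __init__(self, urls, unit_size=2):
        # Split the URL list into fixed-size work units.
        self.pending = deque(
            urls[i:i + unit_size] for i in range(0, len(urls), unit_size))
        self.results = {}

    def next_unit(self):
        """Return the next batch of URLs, or None when nothing is left."""
        return self.pending.popleft() if self.pending else None

    def submit(self, verdicts):
        """Merge one volunteer's {url: verdict} mapping into the results."""
        self.results.update(verdicts)


def volunteer(coordinator, check):
    """Keep pulling work units and checking them until none remain."""
    unit = coordinator.next_unit()
    while unit is not None:
        coordinator.submit({url: check(url) for url in unit})
        unit = coordinator.next_unit()
```

The `check` function is where each volunteer's machine would spend its spare CPU time, for example by running whatever per-page spam test people agree on.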