Spam bot for google?

Do you think google would mind if I built an automated spam bot?

eplus

7:45 am on Aug 7, 2002 (gmt 0)

10+ Year Member



I was wondering about all that nasty, obvious spam that annoys us all, and was thinking how easily so much of it could be detected. I know that Google doesn't waste its time searching for spam, but I was thinking I could happily build a bot to crawl around in some listings, hunt out the top offenders and then submit them to Google's spam report. I'd probably have to limit it to no more than ten a day or I'd have Google banning my IP or something nasty.
Even if it didn't submit to Google, it might make a nice little tool for people to look at spammers in the top ten results for their keywords. A kind of automated little spam checker; it might even be nice to see if your own site gets close to the mark by accident. Does anyone think this might be useful, or am I just getting annoyed over nothing?
If anyone is interested, I'd probably be using PHP with cURL and Google's nice little web API to fetch listings. I'd probably only test for things like hidden text, loaded keyword meta tags (not that Google uses them?) and some of the more basic offences, but by the look of it that would catch 90+% of offenders.
Kind Regards
eplus ;-)
(strange what you think of when you get up in the morning.)
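
To make the two checks mentioned above (hidden text and loaded keyword meta tags) a bit more concrete, here is a rough sketch. It is in Python rather than the PHP mentioned in the post, and everything in it is an assumption for illustration: the names `SpamSignalParser` and `check_page`, the `MAX_META_KEYWORDS` threshold, and the choice to treat inline `display:none` / `visibility:hidden` styles and a font colour equal to the body background as "hidden text". A real checker would need far more cases than this.

```python
from html.parser import HTMLParser
import re

# Hypothetical threshold: more keywords than this counts as a "loaded" meta tag.
MAX_META_KEYWORDS = 20

# Inline styles that make an element invisible to a visitor.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


class SpamSignalParser(HTMLParser):
    """Collects a few crude on-page spam signals from one HTML document."""

    def __init__(self):
        super().__init__()
        self.bgcolor = None        # page background from <body bgcolor=...>
        self.meta_keywords = []    # entries from <meta name="keywords">
        self.hidden_elements = []  # tags that look deliberately invisible

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <input disabled>), so normalise.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "body":
            self.bgcolor = attrs.get("bgcolor", "").lower() or None
        elif tag == "meta" and attrs.get("name", "").lower() == "keywords":
            content = attrs.get("content", "")
            self.meta_keywords = [k.strip() for k in content.split(",") if k.strip()]
        if HIDDEN_STYLE.search(attrs.get("style", "")):
            self.hidden_elements.append(tag)
        # Old trick: font colour identical to the page background colour.
        if tag == "font" and self.bgcolor and attrs.get("color", "").lower() == self.bgcolor:
            self.hidden_elements.append(tag)


def check_page(html):
    """Return a dict of boolean signals for one page's HTML source."""
    parser = SpamSignalParser()
    parser.feed(html)
    return {
        "hidden_text": bool(parser.hidden_elements),
        "loaded_meta_keywords": len(parser.meta_keywords) > MAX_META_KEYWORDS,
    }
```

Calling `check_page(html_source)` on a fetched page gives a small dict of flags; fetching the listings themselves is left out here on purpose.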

Josk

8:41 am on Aug 7, 2002 (gmt 0)

10+ Year Member



And how would you be defining spam...? Yes, it's a nice idea, but I'd rather have Google deciding what is spam and what isn't. One person's spam is another's SEO...

eplus

8:50 am on Aug 7, 2002 (gmt 0)

10+ Year Member



I think there are some things that are borderline, but there are other things that are very obviously spam, and those are what I'd be looking for. I don't think anyone can say that hidden text loaded with keywords is anything other than spam, for example. For the other things I'd aim for something like cast.org's Bobby, where it suggests that you might want to check and make up your own mind. Just a thought; it was one of those annoying ideas that you play with in your head at bedtime instead of getting that much-needed sleep.

lazerzubb

8:58 am on Aug 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google doesn't waste its time searching for spam

Believe me, they do!

eplus

9:05 am on Aug 7, 2002 (gmt 0)

10+ Year Member



Unfortunately Google's database is VERY large, and anything more than a cursory look would surely take far too much server overhead. I can't believe their filters are that great, as I've seen hundreds of link farms and general breaches of the T&C all happily listed in Google. I'm not saying it's Google's fault or that they aren't trying; I'm just saying that implementing a strong spam filter across their entire database is surely a logistical nightmare.

afterburner

11:07 am on Aug 7, 2002 (gmt 0)

10+ Year Member



Google does a fine job of filtering spam by itself. I have had a few domains kicked out for cloaking in the past.

Josk

11:07 am on Aug 7, 2002 (gmt 0)

10+ Year Member



So...google doesn't have the resources, but you do? Is that a server in your pocket, or are you just pleased to see me? Assuming you are just happy to see me, how will you decide to prioritise your spam busting?

Hemsell

11:27 am on Aug 7, 2002 (gmt 0)

10+ Year Member



I think they would love it if you did it in the proper way.
That is, through the programming contest. They have an API and other stuff available to access their DB without causing too much strain on the system.

I know they are interested in spam, and they would probably welcome any innovative ways of detecting it.

Todd

eplus

3:01 pm on Aug 7, 2002 (gmt 0)

10+ Year Member



Sorry Josk, I'm not suggesting that I have a farm of servers out there doing nothing that I could put to work finding all the spam. I was just thinking about playing with the idea of a bot that could be targeted at a particular keyword, get a list from Google and then check each site for some basic spam. Something that would make me feel better when I come up against a spammy SERP.

As I said, I'd probably try to make it a bit like Bobby, in that it fetches the page code and then marks up a few points where it's either definitely breaking the rules or could do with a user having a look at it. More for fun than anything serious, just something to make me feel a bit less powerless when competing against spam. Kind of writing a program to tell me I'm right, a bit silly really. It's probably always going to agree with me that something is spam, as all it would ever do is what I tell it.
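
The Bobby-style markup described above, with two kinds of note (definite rule-breaking versus "have a look yourself"), might be sketched roughly like this. The names `Severity`, `Finding` and `annotate` are made up for the example, and the output layout is only one guess at how such a report could look.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    VIOLATION = "definitely breaking the rules"
    REVIEW = "worth a manual look"


@dataclass
class Finding:
    line: int          # 1-based line number in the fetched page source
    severity: Severity
    note: str


def annotate(source, findings):
    """Return the page source with each finding printed under its line."""
    by_line = {}
    for finding in findings:
        by_line.setdefault(finding.line, []).append(finding)

    out = []
    for number, text in enumerate(source.splitlines(), start=1):
        out.append(f"{number:4d}  {text}")
        for finding in by_line.get(number, []):
            out.append(f"      ^ [{finding.severity.name}] {finding.note}")
    return "\n".join(out)
```

Keeping REVIEW separate from VIOLATION leaves the borderline calls to whoever reads the report instead of to the bot.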

zan_d

4:59 pm on Aug 7, 2002 (gmt 0)

10+ Year Member



Hey, sounds like a great idea. I'd love to help, as a PHP junkie myself.

eplus

8:27 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



Maybe we can go a bit open source. I've got a stag do this weekend and my wedding on the 21st, but I'll probably start playing with it around then.

korkus2000

8:36 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would only run it at off peak hours. It could be considered automated querying and they would ban your IP.

bcc1234

8:57 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



eplus, you should've entered the programming contest held by Google a while back. They provided an API and a set of indexed data.
If you could create a bot that would effectively detect spam, the next stop would be Google Labs :)

bcc1234

9:00 pm on Aug 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey, sounds like a great idea. I'd love to help, as a PHP junkie myself.

Hmm, that would have to be highly optimized C code, and not PHP in any way.

TWhalen

9:21 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



Sounds like a great idea...

Say, just out of curiosity, was it your life-long dream to grow up and become the world's biggest tattle-tale?

Why on earth would you want to waste your time building a spam(bot)filter and feed your results to Google? Why not build your own search engine, apply this "wonderful" filter on your OWN engine, and reap the benefits of owning your own "spam free" search index?

Just a thought.

eplus

10:13 pm on Aug 8, 2002 (gmt 0)

10+ Year Member



Sorry, not the world's biggest tattle-tale, I just don't like spam. Not sure what's wrong with not being keen on spam, but maybe that's just me. With regard to writing my own search engine: why? I think Google's all the search engine I need. I just thought it would be interesting to write a spam bot for the sake of it. I reckon it would be kind of interesting to poke a keyword into it and see who in the top 50 results is using naughty tactics.

Anything I'd write would be far too server-intensive to be applied to large-scale filtering, but I think it would still be an interesting project to play with. Although I have some training in C, I'd probably code it in PHP as it would be a damn sight easier. I wouldn't expect it to run more than a couple of times an hour, so I don't think it would qualify for a ban, especially if I used the API. I'd hope not, at least.
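
For the "no more than a couple of times an hour" part, one simple way to enforce it is a sliding-window throttle. This is only a sketch: the `Throttle` class, its defaults of two runs per 3600 seconds, and the injectable `clock` argument are all assumptions for the example, not anything Google specifies.

```python
import time


class Throttle:
    """Allows at most `max_runs` runs in any sliding window of `period` seconds."""

    def __init__(self, max_runs=2, period=3600.0, clock=time.monotonic):
        self.max_runs = max_runs
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self.history = []       # start times of recent runs

    def try_acquire(self):
        """Record a run and return True if allowed, otherwise return False."""
        now = self.clock()
        # Forget runs that have fallen out of the window.
        self.history = [t for t in self.history if now - t < self.period]
        if len(self.history) >= self.max_runs:
            return False
        self.history.append(now)
        return True
```

The bot would call `try_acquire()` before each keyword check and simply skip the run when it returns False.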

I think the key thing to remember is it would just be a bit of fun.

GoogleGuy

5:56 pm on Aug 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



eplus, if you want to write something like that, we'd be happy to receive the results. If you have to hit Google on a regular basis, you'd want to get a Google API key and do it that way.

But in general we're always happy to get more data on potential spam.

eplus

8:24 pm on Aug 11, 2002 (gmt 0)

10+ Year Member



Thank you, GoogleGuy. I think I'll start playing with it soon; I'll have to brush up on my SOAP first though.

matthias

9:26 pm on Aug 11, 2002 (gmt 0)

10+ Year Member



That sounds like a cool project. Let me know if you make it open source.

If nobody has the CPU/memory resources to do more spam detection, maybe someone should think about a distributed system (the way SETI@home etc. do it). Make it an option in the toolbar or something like that. THAT would be cool (in several ways). Well, one more project on my list, but one I will certainly not start alone.