

Email harvesting from Google and its cache

Considerable numbers of bots harvesting for email addresses

eraldemukian

4:37 am on Apr 30, 2003 (gmt 0)




Hello,

For a while now I have noticed many search terms like

email address guestbook of nigerian government officials

Well, not that specific one, but replace the country and profession with anything you like and you have a pretty good chance
of matching a query that I have seen. The traffic comes from a range of IPs, mostly via Google referers, and the User-Agent
string is spoofed to look like one of the main browsers.

On my side, I started generating one thousand random, non-existent email addresses whenever I see such a referer.
Spam the spammers. Not that it would decrease the amount of spam I get, but at least I did something.
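A minimal sketch of that decoy trick in Python, assuming the harvester arrives with a Google search URL in the Referer header. The keyword list, the function names, and the use of the reserved `.invalid` top-level domain are assumptions for illustration, not the setup described above. Using `.invalid` (reserved by RFC 2606) makes sure the generated addresses really are non-existent, so any spam sent to them cannot land on some innocent third-party domain.

```python
import random
import string
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical keyword list: phrases that suggest an address-harvesting query.
SUSPICIOUS_TERMS = ("email address", "e-mail address", "guestbook")


def is_harvester_referer(referer: str) -> bool:
    """True if the referer is a Google search whose q= parameter looks like harvesting."""
    parsed = urlparse(referer)
    if "google." not in parsed.netloc:
        return False
    # parse_qs decodes '+' to spaces, so "email+address" becomes "email address".
    query = parse_qs(parsed.query).get("q", [""])[0].lower()
    return any(term in query for term in SUSPICIOUS_TERMS)


def fake_addresses(count: int = 1000, seed: Optional[int] = None) -> List[str]:
    """Generate random decoy addresses under the reserved .invalid TLD."""
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    addresses = []
    for _ in range(count):
        user = "".join(rng.choice(letters) for _ in range(rng.randint(5, 10)))
        domain = "".join(rng.choice(letters) for _ in range(rng.randint(4, 8)))
        addresses.append(f"{user}@{domain}.invalid")
    return addresses
```

When `is_harvester_referer()` returns True, the page would emit the `fake_addresses()` list (for example as `mailto:` links) instead of, or in addition to, its normal content.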

This often stops working, because these queries are frequently answered from the Google cache: the harvester reads Google's copy
and never requests the page from my server, so no referer reaches me and no decoys get served. That makes sense from the
harvesters' perspective: Google is on average much faster and more reliable at serving them what they are after, email addresses.

Which got me thinking:

An email address is relatively easy to detect. Couldn't Google just filter them out? Or maybe there could be an addition to the robots.txt
file that tells search engines to filter out email addresses on a given site?
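To show how little it takes to spot addresses, here is a rough Python sketch of what such a filter might look like on the crawler or cache side. It is hypothetical: Google has not said it does anything like this, and the regular expression is a deliberate simplification rather than the full RFC 5322 address grammar.

```python
import re
from typing import List

# Simplified pattern: local part, "@", then a dotted domain ending in a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> List[str]:
    """Return every substring of text that looks like an email address."""
    return EMAIL_RE.findall(text)


def redact_emails(text: str, placeholder: str = "[email removed]") -> str:
    """Replace each detected address with a placeholder, e.g. before caching a page."""
    return EMAIL_RE.sub(placeholder, text)
```

A cache that ran page text through something like `redact_emails()` before storing it would leave harvesters nothing to collect from the cached copy.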

Email spam is bad for 99.97% of all people on the web. It takes up bandwidth, but more importantly it makes easy email
communication from websites impossible. In my experience the feedback rate dropped significantly after I changed my mailto:
tags to something that people would need to edit manually before they click the send button.
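For reference, a manually edited mailto: of the kind described above usually looks something like the following. This is a hypothetical Python sketch; the `REMOVE-THIS` marker and the helper names are illustrative choices, not anything from the thread.

```python
def munge(address: str, marker: str = "REMOVE-THIS") -> str:
    """Insert a marker after the @ that the sender has to delete by hand."""
    user, domain = address.split("@", 1)
    return f"{user}@{marker}.{domain}"


def munged_mailto(address: str, marker: str = "REMOVE-THIS") -> str:
    """Build an HTML mailto: link plus a hint telling visitors what to delete."""
    munged = munge(address, marker)
    return (f'<a href="mailto:{munged}">{munged}</a> '
            f'(delete "{marker}." before sending)')
```

That extra editing step before pressing send is the friction that cuts the feedback rate.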

I hope this has not been discussed already.

Otherwise: sorry for the waste of bandwidth.

Bradley

4:41 am on Apr 30, 2003 (gmt 0)




"In my experience the feeback rate droped significantly after I changed my mailto:
tags to something that people would need to edit manually before they click the send button."

Put your email address in JavaScript. The page still displays the email address, but the HTML source only calls the .js routine, which writes out the HTML code for the address, so a harvester scanning the page source never sees it. This 100% solves your problem.
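A sketch of that approach, written as a small Python helper that generates the .js file. The function name and the `email.js` file name are hypothetical. The idea is that the address is stored in pieces that are only joined when the browser runs the script, so a harvester reading the raw HTML or .js source finds no contiguous `user@domain` string.

```python
import json


def email_js(user: str, domain: str) -> str:
    """Return JavaScript that assembles and writes a mailto: link at page-load time."""
    # json.dumps produces a safely quoted JavaScript string literal for each piece.
    return (
        "(function () {\n"
        f"  var user = {json.dumps(user)};\n"
        f"  var domain = {json.dumps(domain)};\n"
        "  var addr = user + '@' + domain;\n"
        "  document.write('<a href=\"mailto:' + addr + '\">' + addr + '</a>');\n"
        "})();\n"
    )
```

Save the output as `email.js` and put `<script type="text/javascript" src="email.js"></script>` in the page where the address should appear.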