
Search Engine Spider and User Agent Identification Forum

EmailSiphon spider

 1:10 pm on May 2, 2001 (gmt 0)

Is anyone familiar with this spider?
It sounds like a spider that grabs email addresses from web documents. Is this true?
Should I be banning this spider from my site? I resolved the IP address and found that it originates from www.wanadoo.fr, but that is all I know.



 1:15 pm on May 2, 2001 (gmt 0)

Yes, it grabs email addresses, and yes, you should (in my opinion) ban it. It's a tool used by individuals, so it could originate from any IP.


 1:17 pm on May 2, 2001 (gmt 0)

Thank you, I appreciate the help.


 1:22 pm on May 2, 2001 (gmt 0)

How would one ban it? I've seen solutions that only work on Unix boxes, not on NT. Any solutions?


 1:30 pm on May 2, 2001 (gmt 0)

This is a good point.
Generally, I would place the following text in robots.txt:
User-agent: EmailSiphon
Disallow: /
But when I went back to my logs, I found that this UA never requested robots.txt!
Does anyone know of any other way to keep him from revisiting my site?
I have modified my robots.txt to disallow him if he ever requests it, but I don't think he ever will.
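For reference, the way a well-behaved crawler evaluates that rule can be sketched with Python's standard urllib.robotparser module. This is only an illustration with hypothetical example URLs, and it also shows the limitation described above: the rule has an effect only if the client actually fetches and honors robots.txt, which EmailSiphon apparently does not.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url.

    This reflects only what a *compliant* crawler would decide after reading
    robots.txt; it cannot stop a client that never requests the file.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("EmailSiphon", "http://www.example.com/contact.html"))  # False
    print(allowed("Googlebot", "http://www.example.com/contact.html"))    # True
```

Because there is no "User-agent: *" record, every other agent falls through to the default and is allowed.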


 1:26 pm on May 8, 2001 (gmt 0)

I too have noticed this spider running rampant on a number of my clients' sites, and I have recently noticed a barrage of email traffic bogging down our internal POP3 server. Not exactly the situation you want to be in, you know?

Has anyone had any luck banning this? Any solutions?
Any updates would be much appreciated.



 1:29 pm on May 8, 2001 (gmt 0)

I placed that Disallow statement in my robots.txt, but I haven't seen EmailSiphon back at my site since, so I can't tell whether it obeys robots.txt or not.


 2:52 pm on May 8, 2001 (gmt 0)

It may or may not request robots.txt, but one sure-fire way to beat anybody or anything that tries to get at your email addresses is to remove them from the HTML. Use GIFs with your email address in them to display to visitors, and encase your email address in a FormMail CGI or PHP script so no one, not even a snooping human, can get to it. Scour the server-side scripting forum for a hack.
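A minimal sketch of the server-side idea, written in Python rather than the CGI or PHP the post mentions: the page's form submits only a hypothetical alias (for example "webmaster"), and the real address lives in a table on the server, so it never appears in the HTML. The alias table, the /contact action path and the fixed sender address are all illustrative assumptions; the message is built but not sent (smtplib could handle delivery).

```python
from email.message import EmailMessage

# Hypothetical alias table, kept on the server only. The HTML form submits
# the alias (the dict key), never the address itself.
RECIPIENTS = {
    "webmaster": "webmaster@example.com",
    "sales": "sales@example.com",
}


def render_form() -> str:
    """Return an HTML contact form that lists aliases only, never addresses."""
    options = "".join(
        f'<option value="{alias}">{alias}</option>' for alias in RECIPIENTS
    )
    return (
        '<form method="post" action="/contact">'  # hypothetical handler path
        f'<select name="to">{options}</select>'
        '<textarea name="body"></textarea>'
        '<input type="submit" value="Send">'
        "</form>"
    )


def build_message(alias: str, sender: str, body: str) -> EmailMessage:
    """Build (but do not send) a message for one form submission.

    An unknown alias raises KeyError, so the form cannot be abused to relay
    mail to arbitrary addresses.
    """
    msg = EmailMessage()
    msg["To"] = RECIPIENTS[alias]            # lookup happens server-side only
    msg["From"] = "formmail@example.com"     # hypothetical fixed sender
    msg["Reply-To"] = sender
    msg["Subject"] = "Website contact form"
    msg.set_content(body)
    return msg
```

Restricting delivery to a fixed table of aliases is the design choice that matters here: a form that accepts a free-form recipient would simply become a spam relay.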


 4:45 pm on May 8, 2001 (gmt 0)

It seems, though, that this is an unlikely way to protect an email address from such abuse.

How, in your mind, could this be done with scripts residing on the server?


 6:12 pm on May 8, 2001 (gmt 0)

I really HATE that thing! I tried to ban it using robots.txt, and apparently it "ignored" it. I got a JS program to camouflage my email addresses here:


The jury is still out, though. Siphon has come and gone since I installed the script, but I can't really gauge whether the spam I get now is from new crawls or just left over from before.

Now if only I could figure out how to stop them from spamming me at the email address listed on my résumé at monster.com....



 8:26 pm on May 8, 2001 (gmt 0)

Does anyone know whether EmailSiphon can "read" character references? If not, you can replace every "@" sign in your HTML code with the numeric character reference &#64; (64 is the decimal code for "@").

Browsers will decode the character reference and display "@" in its place, and mailto links will work as normal, but perhaps EmailSiphon won't recognize the result as an email address?
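The substitution can be sketched in Python, using the standard html module to check that a browser-style decode gives back the original address. The address is a placeholder, and the defense rests on an assumption: it only helps against a harvester that looks for a literal "@" and does not decode character references itself.

```python
import html


def encode_at(address: str) -> str:
    """Replace each '@' with the decimal numeric character reference &#64;."""
    return address.replace("@", "&#64;")


def encode_all(address: str) -> str:
    """Stronger variant: encode every character as &#<decimal code>;."""
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    addr = "user@example.com"  # placeholder address
    print(encode_at(addr))  # user&#64;example.com
    # A browser decodes both forms back to the original text.
    print(html.unescape(encode_at(addr)) == addr)   # True
    print(html.unescape(encode_all(addr)) == addr)  # True
```

The encoded string works inside a mailto: href as well, because browsers decode character references in attribute values.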


 5:21 am on May 9, 2001 (gmt 0)

You can scan the USER_AGENT with a script, compare it to a list of unwanted visitors, and redirect them to a page with no email addresses, but that means all your pages will have to be wrapped in a script. It is better to force the redirects within the server if you have that kind of access. See Charles Brabec's site at [mosa.unity.ncsu.edu...] for more information and a list of harvesters' ID strings. If you're interested in a PHP solution, say so, and I'll post what I've written so far.
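The same user-agent check can be sketched as Python WSGI middleware, swapped in here for the PHP approach the post mentions. It is a sketch under assumptions: the blocklist holds only the one agent named in this thread, matching is a simple case-insensitive substring test, and /no-email.html is a hypothetical address-free page.

```python
from typing import Callable, Iterable

# Lower-case substrings identifying unwanted user agents. Only EmailSiphon
# comes from this thread; extend the tuple from a maintained list of IDs.
BLOCKED_AGENTS = ("emailsiphon",)


def is_harvester(user_agent: str, blocked: Iterable[str] = BLOCKED_AGENTS) -> bool:
    """Case-insensitive substring match of the User-Agent against the blocklist."""
    ua = user_agent.lower()
    return any(token in ua for token in blocked)


class HarvesterRedirect:
    """WSGI middleware that sends blocked agents to an address-free page."""

    def __init__(self, app: Callable, target: str = "/no-email.html") -> None:
        self.app = app
        self.target = target  # hypothetical page containing no addresses

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        ua = environ.get("HTTP_USER_AGENT", "")
        # Skip the check on the target page itself so the redirect cannot loop.
        if is_harvester(ua) and environ.get("PATH_INFO", "") != self.target:
            start_response("302 Found", [("Location", self.target),
                                         ("Content-Type", "text/plain")])
            return [b"Redirecting\n"]
        return self.app(environ, start_response)
```

Wrapping an existing application is one line, app = HarvesterRedirect(app). As the post notes, a server-level rule avoids running every request through a script, and any such check is defeated by a harvester that sends a browser-like User-Agent.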


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved