Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Spiders I Hate
I really HATE to see this in my logs!

 1:46 pm on Jan 9, 2001 (gmt 0)


I really would love it if someone could explain/show me how to poison these things with bogus addresses, or at least protect my own. I just despise SPAM!


han solo

 4:54 pm on Jan 9, 2001 (gmt 0)

You don't usually want to poison them with bogus addresses. From what I've heard, this usually backfires or, at the least, doesn't work.

What you can do is look at Littleman's profile, copy the cloaking script he posted, and then if the user agent matches an email siphon, or email stealing agent, give it a blank page, or send it to yahoo or something...I'm sure that would be interesting.

If you have any questions with Perl, which is the language that script is written in, there are plenty of moderators here who would trip over themselves answering your questions. And a forum for questions just like that!

Hope this helps,

Han Solo


 7:54 pm on Jan 9, 2001 (gmt 0)

Some people give email snatchers a page with the email addresses of antispam organizations and agencies.
Pretty evil thing to do ;)


 9:36 pm on Jan 9, 2001 (gmt 0)

That's a great idea! I have 9 visitors listed by WebLog as "E-mail Harvester" for user/agent...

I'm going to check my raw logs, and see if they all have something in common I can use to send them to some anti-spam groups...

Heh, heh, heh...


 10:03 pm on Jan 9, 2001 (gmt 0)

"Sugarplum is an automated spam-poisoner. Its purpose is to feed realistic and enticing, but totally useless or hazardous data to wandering address harvesters such as EmailSiphon, Cherry Picker, etc"


Doesn't that sound great: an automated spam-poisoner!


 10:37 pm on Jan 9, 2001 (gmt 0)

OK... here's the info on the visitors WebLog flagged as Email Harvesters:

- Crescent Internet ToolPak HTTP OLE Control v.1.0
- Crescent Internet ToolPak HTTP OLE Control v.1.0
- Crescent Internet ToolPak HTTP OLE Control v.1.0
- webbandit/4.35.0
- Mozilla/3.Mozilla/2.01 (Win95; I)
- Mozilla/4.0 (compatible; BullsEye; Windows 95)
- Mozilla/3.Mozilla/2.01 (Win95; I)
- EmailSiphon (did a full crawl of site)
- Microsoft URL Control - 6.00.8169
- Mozilla/3.Mozilla/2.01 (Win95; I)

The EmailSiphon visit is pretty self-explanatory, but what the heck is "Crescent Internet ToolPak HTTP OLE Control v.1.0"???

Any info on any of them?


 11:13 pm on Jan 9, 2001 (gmt 0)

On my site, since I only need one contact email, I use a "contact us" Perl script that uses a form. The visitor just types in the subject, their addy, and the body text, then hits submit. My address never appears in the page source.

This really cut down on my delete keystrokes.

Definitely not a solution for all sites (especially with many addys, or if you want to spawn the email client), but may work for you.
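
For readers who want to see the shape of the form approach described above, here is a minimal sketch. It is in Python rather than Perl and is not the script the poster uses. The address `webmaster@example.com` and the function name `build_contact_message` are made up for illustration. The point is only that the recipient is filled in on the server, so a harvester reading the HTML never sees it.

```python
from email.message import EmailMessage

# Hypothetical address. It lives only in server-side code and is never
# written into the HTML form that visitors (or harvesters) download.
CONTACT_ADDRESS = "webmaster@example.com"


def build_contact_message(subject, sender, body):
    """Turn submitted form fields into a message for the hidden recipient."""
    for field in (subject, sender):
        # Reject line breaks so a visitor cannot inject extra mail headers.
        if "\r" in field or "\n" in field:
            raise ValueError("line breaks are not allowed in subject or sender")
    msg = EmailMessage()
    msg["To"] = CONTACT_ADDRESS
    msg["From"] = CONTACT_ADDRESS
    msg["Reply-To"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

Handing the result to a mail server (for example with `smtplib.SMTP(...).send_message(msg)`) is left out, since that part depends on the host.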


 12:27 am on Jan 10, 2001 (gmt 0)

If you are running a cloaking script you can enter UAs you want to ban or redirect, or even IPs if you have unwanted spidering by an individual or company.

In the past I have banned or served up different pages to people running email siphoning software, WebZip, Teleport Pro, etc. My script can simply ban the user agent, or ban it AND add the user's IP to a blocked-IP list. It sends them a page telling them they have been blocked and that they should contact the site administrator if they want access to the site.
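
The ban-and-blacklist logic described above might look roughly like the sketch below. It is written in Python, not taken from the poster's script, and the substring list, the notice text, and the function name `check_request` are all assumptions chosen for illustration.

```python
# Hypothetical substrings to look for; matching is case-insensitive.
BANNED_UA_SUBSTRINGS = ("emailsiphon", "webbandit", "webzip", "teleport pro")

BLOCKED_PAGE = "Access blocked. Contact the site administrator to request access."


def check_request(user_agent, ip, blocked_ips, ban_ip=True):
    """Return (allowed, body); body is the block notice when not allowed."""
    # An IP that was blocked earlier stays blocked, whatever UA it sends now.
    if ip in blocked_ips:
        return False, BLOCKED_PAGE
    ua = user_agent.lower()
    if any(s in ua for s in BANNED_UA_SUBSTRINGS):
        if ban_ip:
            # Remember the address so a later change of UA does not help.
            blocked_ips.add(ip)
        return False, BLOCKED_PAGE
    return True, None
```

Here `blocked_ips` is a plain in-memory set. A real site would need to persist it somewhere, and blocking by IP can catch innocent visitors behind a shared proxy.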

In my experience most people running this software have little knowledge of TCP/IP and don't know how to fake the User Agent. Teleport Pro and others do allow this in the config, but their users probably don't realize this is how we are finding them, or else you would expect them to change it ;)


 12:37 am on Jan 10, 2001 (gmt 0)

I find it interesting that one of these is running from an .edu (mic-gws.hood.edu - webbandit/4.35.0). I wonder how easy it would be to shut that one down with a "frank" discussion with the president of Hood College about the evils of e-mail harvesting and what the bad publicity would do to the college if it ever became "a news topic". How active is this bad boy?


 12:42 am on Jan 10, 2001 (gmt 0)

The thing is, I don't know if they're all actually email harvesters or not... that's the way WebLog identified them, but I don't know how accurate WebLog is about figuring these things out...

Most of them only hit my root directory (Including the EmailSiphon one), so they're not hunting too hard...

That's why I was hoping someone had heard of any of them. I really don't know if WebLog knows what it's talking about...


 8:19 pm on Jan 10, 2001 (gmt 0)

Just an idea: do they follow the robots exclusion standard?

If so, it's easy:

User-agent: EmailSiphon
Disallow: /

in robots.txt

problem fixed.
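
One way to sanity-check a rule like this before relying on it is Python's standard `urllib.robotparser`, which applies the same matching a well-behaved crawler is supposed to use. A small sketch follows; the helper name `allowed` is made up. It only shows what a compliant bot would conclude and says nothing about whether a given harvester actually reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule under test. The name after "User-agent:" has to match the
# name the bot actually reports, so spelling matters.
ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow: /
"""


def allowed(user_agent, path="/"):
    """Return True if a compliant crawler with this name may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

With this file, `allowed("EmailSiphon")` comes back false, while a name that no rule mentions is allowed by default.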



 9:26 pm on Jan 10, 2001 (gmt 0)

None of the visitors on that list logged any hits to robots.txt... I've noticed many spiders are unlikely to bother with manners unless they're a big player OR an .edu research spider.
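
A check like the one described above, finding which user agents never asked for robots.txt, can be scripted against the raw logs. The sketch below assumes the Apache/NCSA "combined" log format, which may not be what this server writes. The regular expression and the function name `agents_skipping_robots` are illustrative only.

```python
import re
from collections import defaultdict

# Assumes Apache/NCSA "combined" format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_skipping_robots(lines):
    """Return the sorted user agents that never requested /robots.txt."""
    paths_by_agent = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            paths_by_agent[match["agent"]].add(match["path"])
    return sorted(
        agent for agent, paths in paths_by_agent.items()
        if "/robots.txt" not in paths
    )
```

Calling it as `agents_skipping_robots(open("access.log"))` (file name assumed) lists every agent that crawled without fetching robots.txt, ordinary browsers included, so the output still needs a human eye.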


 8:56 pm on Jan 14, 2001 (gmt 0)

Mivox wrote:
"what the heck is "Crescent Internet ToolPak HTTP OLE Control v.1.0"???"

That is an ActiveX control that programmers use to write their own browsers or crawlers. "Microsoft URL Control" is another ID like that. Since hundreds or thousands of programmers might use such a control, each for a different program, some of those programs might be harvesters, some might be site grabbers for offline perusal, and some might even be do-it-yourself browsers.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved