
Search Engine Spider and User Agent Identification Forum

    
Spiders I Hate
I really HATE to see this in my logs!
scott

Msg#: 269 posted 1:46 pm on Jan 9, 2001 (gmt 0)

40bc0ab8.dsl.flashcom.net (64.188.10.184) - EmailSiphon

I really would love it if someone could explain/show me how to poison these things with bogus addresses, or at least protect my own. I just despise SPAM!

 

han solo

Msg#: 269 posted 4:54 pm on Jan 9, 2001 (gmt 0)

From what I've heard, you don't usually want to poison them with bogus addresses; this tends to backfire or, at the least, doesn't work.

What you can do is look at Littleman's profile, copy the cloaking script he posted, and then, if the user agent matches an email siphon or other email-stealing agent, give it a blank page, or send it to Yahoo or something... I'm sure that would be interesting.

If you have any questions about Perl, which is the language that script is written in, there are plenty of moderators here who would trip over themselves answering them. There's even a forum for questions just like that!

Hope this helps,

Cheers,
Han Solo
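The user-agent cloaking idea above can be sketched as Python WSGI middleware. This is not Littleman's Perl script (which isn't reproduced in this thread); it is a minimal illustration assuming a Python WSGI setup, with a hypothetical pattern list built from agent names mentioned in this thread and an optional, hypothetical redirect target.

```python
import re

# Hypothetical pattern list; the names come from this thread (EmailSiphon,
# Cherry Picker). Matching is case-insensitive.
HARVESTER_PATTERNS = [
    re.compile(r"EmailSiphon", re.IGNORECASE),
    re.compile(r"Cherry ?Picker", re.IGNORECASE),
]


def is_harvester(user_agent):
    """Return True if the User-Agent matches any listed harvester pattern."""
    return any(p.search(user_agent) for p in HARVESTER_PATTERNS)


def cloaking_app(real_app, redirect_to=None):
    """Wrap a WSGI app so matching user agents never see the real pages.

    redirect_to is a hypothetical URL; when it is None, matches get a blank page.
    """
    def app(environ, start_response):
        if is_harvester(environ.get("HTTP_USER_AGENT", "")):
            if redirect_to:
                start_response("302 Found", [("Location", redirect_to)])
            else:
                start_response("200 OK", [("Content-Type", "text/html")])
            return [b""]
        return real_app(environ, start_response)
    return app


def demo_site(environ, start_response):
    """Stand-in for the real site: a page containing a mailto link."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b'<a href="mailto:me@example.com">Mail me</a>']


app = cloaking_app(demo_site)
```

Matching on the User-Agent only catches tools that announce themselves honestly; anything sending a browser-like string gets the real page.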

PeteU

Msg#: 269 posted 7:54 pm on Jan 9, 2001 (gmt 0)

Some people give email snatchers a page with the email addresses of antispam organizations and agencies.
Pretty evil thing to do ;)

mivox

Msg#: 269 posted 9:36 pm on Jan 9, 2001 (gmt 0)

That's a great idea! I have 9 visitors listed by WebLog as "E-mail Harvester" for user agent...

I'm going to check my raw logs and see if they all have something in common I can use to send them to some anti-spam groups...

Heh, heh, heh...
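One quick way to see what flagged visitors have in common is to group the hosts in the raw log by user agent. The sketch below assumes the Apache/NCSA "combined" log format, where the User-Agent is the last quoted field; WebLog or your host may use a different layout, in which case the regular expression needs adjusting.

```python
import re
from collections import defaultdict

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hosts_by_agent(lines):
    """Map each User-Agent string to the set of hosts that sent it."""
    groups = defaultdict(set)
    for line in lines:
        m = COMBINED.match(line)
        if m:  # silently skip lines that are not in combined format
            groups[m["agent"]].add(m["host"])
    return dict(groups)
```

Sorting the result by the number of distinct hosts, e.g. `sorted(groups.items(), key=lambda kv: -len(kv[1]))`, puts user agents shared by several hosts at the top of the list.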

NFFC

Msg#: 269 posted 10:03 pm on Jan 9, 2001 (gmt 0)

"Sugarplum is an automated spam-poisoner. Its purpose is to feed realistic and enticing, but totally useless or hazardous data to wandering address harvesters such as EmailSiphon, Cherry Picker, etc"

[devin.com]

Doesn't that sound great? An automated spam-poisoner!
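Sugarplum's own implementation isn't shown here. As a simpler, swapped-in illustration of the same idea, the sketch below generates a page of realistic-looking but undeliverable addresses. It assumes the reserved top-level domain `.invalid` (RFC 2606), so no real mailbox can ever receive mail sent to them; the word lists are hypothetical.

```python
import random
import string

# Reserved by RFC 2606: no real domain can exist under ".invalid",
# so these addresses never reach an innocent third party.
RESERVED_SUFFIX = ".invalid"

# Hypothetical word lists used to make the addresses look plausible.
USERS = ["john", "mary", "sales", "info", "webmaster", "kim"]
DOMAINS = ["example-shop", "acme-widgets", "mailhub-sample"]


def bogus_addresses(count, seed=None):
    """Return `count` fake addresses; pass a seed for repeatable output."""
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        digits = "".join(rng.choices(string.digits, k=rng.randint(0, 3)))
        out.append(f"{rng.choice(USERS)}{digits}@{rng.choice(DOMAINS)}{RESERVED_SUFFIX}")
    return out


def poison_page(count=20, seed=None):
    """Build an HTML page of mailto links pointing at the fake addresses."""
    links = "\n".join(
        f'<a href="mailto:{a}">{a}</a><br>' for a in bogus_addresses(count, seed)
    )
    return f"<html><body>\n{links}\n</body></html>"
```

Given the warning earlier in the thread that poisoning can backfire or simply not work, note that a harvester can drop everything ending in `.invalid` with a single filter, so this trades effectiveness for safety.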

mivox

Msg#: 269 posted 10:37 pm on Jan 9, 2001 (gmt 0)

OK... here's the info on the visitors WebLog flagged as Email Harvesters:

ip-173-161.nyc-apt.primenet.com
- Crescent Internet ToolPak HTTP OLE Control v.1.0

cx970082-a.dnpt1.occa.home.com
- Crescent Internet ToolPak HTTP OLE Control v.1.0

mail.pcguru.com
- Crescent Internet ToolPak HTTP OLE Control v.1.0

mic-gws.hood.edu
- webbandit/4.35.0

aph-aug-101-1-1-246.abo.wanadoo.fr
- Mozilla/3.Mozilla/2.01 (Win95; I)

modem249.gtepacifica.net
- Mozilla/4.0 (compatible; BullsEye; Windows 95)

as1-6-159.peaknet.net
- Mozilla/3.Mozilla/2.01 (Win95; I)

212.234.180.5
- EmailSiphon

63.210.161.34 (did a full crawl of the site)
- Microsoft URL Control - 6.00.8169

64.182.209.125
- Mozilla/3.Mozilla/2.01 (Win95; I)

The EmailSiphon visit is pretty self-explanatory, but what the heck is "Crescent Internet ToolPak HTTP OLE Control v.1.0"???

Any info on any of them?

Drastic

Msg#: 269 posted 11:13 pm on Jan 9, 2001 (gmt 0)

On my site, since I only need one contact email, I use a "contact us" Perl script that uses a form. The visitor just types in the subject, their addy, and the body text, then hits submit. The address never appears in the page source.

This really cut down on my delete keystrokes.

Definitely not a solution for all sites (especially with many addys, or if you want to spawn the email client), but may work for you.
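A minimal Python sketch of the contact-form approach, assuming the address lives only in server-side configuration (`CONTACT_ADDRESS` is a hypothetical placeholder). This is not Drastic's Perl script, just the same idea using Python's standard `email` and `smtplib` modules: the HTML form that browsers and robots download contains no address, and the script fills in the recipient when it builds the message.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical placeholder: the real address exists only on the server.
CONTACT_ADDRESS = "webmaster@example.com"

# What browsers (and harvesters) download: no email address anywhere.
FORM_HTML = """<form method="post" action="/contact">
  Subject: <input name="subject"><br>
  Your email: <input name="from_addr"><br>
  <textarea name="body"></textarea><br>
  <input type="submit" value="Send">
</form>"""


def one_line(value):
    """Collapse all whitespace, including CR/LF, so no extra headers can be injected."""
    return " ".join(value.split())


def build_message(fields):
    """Turn submitted form fields into a message addressed server-side."""
    msg = EmailMessage()
    msg["From"] = CONTACT_ADDRESS  # the site mails itself...
    msg["To"] = CONTACT_ADDRESS
    reply_to = one_line(fields.get("from_addr", ""))
    if reply_to:
        msg["Reply-To"] = reply_to  # ...and replies go to the visitor
    msg["Subject"] = one_line(fields.get("subject", "")) or "(no subject)"
    msg.set_content(fields.get("body", ""))
    return msg


def send(msg, host="localhost"):
    """Hand the message to an SMTP server; assumes one is listening on host."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Collapsing whitespace in the header fields keeps a visitor from smuggling extra headers (a Bcc line, say) into the message, which would otherwise turn the form into a spam relay.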

Fusioneer

Msg#: 269 posted 12:27 am on Jan 10, 2001 (gmt 0)


If you are running a cloaking script, you can enter UAs you want to ban or redirect, or even IPs if you have unwanted spidering by an individual or company.

In the past I have banned, or served up different pages to, people running email-siphoning software, WebZip, Teleport Pro, etc. My script can simply ban the user agent, or ban it AND add the user's IP to a blocked-IP blacklist. It sends them a page telling them they have been blocked and that, if they want access to the site, they should contact the site administrator.

In my experience, most people running this software have little knowledge of TCP/IP and don't know how to fake the User Agent. Teleport Pro and others do allow this in the config, but users probably don't realize the User Agent is how we are finding them; otherwise, you would presume they would change it ;)
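The ban-and-blacklist behavior described above might look roughly like this in Python. It is a sketch, not Fusioneer's script: the banned-agent list is hypothetical (taken from the tools named in the post), and the blacklist lives only in memory, whereas a real script would presumably save it to a file. Because the IP is remembered, switching to a browser-like User Agent after the first hit does not get the visitor back in.

```python
from dataclasses import dataclass, field

BLOCK_PAGE = (
    "<html><body><p>You have been blocked. If you want access to this site, "
    "please contact the site administrator.</p></body></html>"
)


@dataclass
class Blocker:
    """Ban listed user agents and blacklist the IPs they arrive from."""
    # Hypothetical list, based on the tools named in the post above.
    banned_agents: tuple = ("EmailSiphon", "WebZip", "Teleport Pro")
    # In-memory only; a real deployment would persist this across restarts.
    blocked_ips: set = field(default_factory=set)

    def is_blocked(self, ip, user_agent):
        """Return True if this request should get BLOCK_PAGE instead of content."""
        ua = user_agent.lower()
        if any(name.lower() in ua for name in self.banned_agents):
            self.blocked_ips.add(ip)  # ban the agent AND remember the address
        return ip in self.blocked_ips
```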

mnw

Msg#: 269 posted 12:37 am on Jan 10, 2001 (gmt 0)

I find it interesting that one of these is running from an .edu (mic-gws.hood.edu - webbandit/4.35.0). I wonder how easy it would be to shut that one down with a "frank" discussion with the president of Hood College about the evils of e-mail harvesting and what the bad publicity would do to the college if it ever became "a news topic". How active is this bad boy?

mivox

Msg#: 269 posted 12:42 am on Jan 10, 2001 (gmt 0)

The thing is, I don't know if they're all actually email harvesters or not... that's the way WebLog identified them, but I don't know how accurate WebLog is about figuring these things out...

Most of them only hit my root directory (including the EmailSiphon one), so they're not hunting too hard...

That's why I was hoping someone had heard of any of them. I really don't know if WebLog knows what it's talking about...

skirril

Msg#: 269 posted 8:19 pm on Jan 10, 2001 (gmt 0)

Just an idea: do they follow the Robots Exclusion Standard?

If so, it's easy:

User-agent: EmailSiphon
Disallow: /

in robots.txt

Problem fixed.

Skirril
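Python's standard `urllib.robotparser` module can show how a compliant robot would read that record. Parsers like this one match the record's name as a case-insensitive substring of the robot's name, so the spelling has to match what the robot actually sends (EmailSiphon, per the logs above). This only helps against robots that fetch and honor robots.txt in the first place.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A robot that honors robots.txt and identifies as EmailSiphon is shut out...
print(parser.can_fetch("EmailSiphon", "http://www.example.com/contact.html"))  # False
# ...while other agents are unaffected, because no record applies to them.
print(parser.can_fetch("Mozilla/4.0", "http://www.example.com/contact.html"))  # True
```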

mivox

Msg#: 269 posted 9:26 pm on Jan 10, 2001 (gmt 0)

None of the visitors on that list logged any hits to robots.txt... I've noticed many spiders are unlikely to bother with manners unless they're a big player OR an .edu research spider.
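To check which visitors never asked for robots.txt, you can scan the raw log for hosts that requested something but never `/robots.txt`. A minimal sketch, assuming common or combined log format; note that fetching robots.txt only shows that a robot read the file, not that it obeyed it.

```python
import re

# Host is the first field; the request line is the first quoted field
# (common and combined log formats).
LOG_LINE = re.compile(r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)"')


def hosts_ignoring_robots(lines):
    """Return hosts that requested pages but never asked for /robots.txt."""
    seen, polite = set(), set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in an unexpected format
        parts = m["request"].split()  # e.g. ["GET", "/robots.txt", "HTTP/1.0"]
        path = parts[1] if len(parts) >= 2 else ""
        seen.add(m["host"])
        if path == "/robots.txt":
            polite.add(m["host"])
    return seen - polite
```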

steward

Msg#: 269 posted 8:56 pm on Jan 14, 2001 (gmt 0)

Mivox wrote:
"What the heck is 'Crescent Internet ToolPak HTTP OLE Control v.1.0'???"

That is an ActiveX control which programmers use to write their own browsers or crawlers. Another ID like that is "Microsoft URL Control". Since hundreds or thousands of programmers might use these controls, each for a different program, some of those programs might be harvesters, some might be site grabbers for offline perusal, and some might even be do-it-yourself browsers.
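Given that, one way to triage log entries is to keep generic components in their own bucket rather than treating them as harvesters. The lists below are hypothetical and built only from strings quoted in this thread; anything landing in the middle bucket needs a look at its behavior (pages requested, robots.txt, crawl depth) before you block it.

```python
# Hypothetical triage lists, built from user agents quoted in this thread.
KNOWN_HARVESTERS = ("EmailSiphon",)
GENERIC_COMPONENTS = (
    "Crescent Internet ToolPak HTTP OLE Control",
    "Microsoft URL Control",
)


def triage(user_agent):
    """Sort a User-Agent string into a rough category for log review."""
    ua = user_agent.lower()
    if any(name.lower() in ua for name in KNOWN_HARVESTERS):
        return "known harvester"
    if any(name.lower() in ua for name in GENERIC_COMPONENTS):
        # Many unrelated programs embed these controls, so the string alone
        # says nothing about what the visitor is actually doing.
        return "generic component: check behavior"
    return "unclassified"
```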

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved