Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Mercator spider

 3:34 pm on Sep 25, 1999 (gmt 0)


This little spider just hit me today (Have not seen it before):

Agent name: MERCATOR-1.0

Says she's from:


I'm especially interested in this (from the page above):

"One important interface allows new modules to be written to fetch documents using different network protocols, such as HTTP, FTP, and Gopher"

And this:

"Although the web contains a finite number of static documents (i.e., documents that are not generated on-the-fly), there are an infinite number of retrievable URLs. Three frequent causes of this inflation are URL aliases, session IDs embedded in URLs, and crawler traps. We have developed techniques to overcome some of those problems, but more innovation will be required, especially to recognize and avoid intentional crawler traps."

Anybody getting hits for dynamic pages?
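
For anyone curious what dealing with the URL alias and session ID cases might look like on the crawler side, here is a rough Python sketch. It is not Mercator's technique (the quoted page does not spell one out), and the parameter names in `SESSION_PARAMS` are guesses for illustration only; real sites use all sorts of names.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; adjust for the sites being crawled.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def normalize_url(url):
    """Reduce common URL aliases to one canonical form (illustrative only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    is_default = (scheme == "http" and port == 80) or (scheme == "https" and port == 443)
    if port and not is_default:
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop session-style query parameters and sort what is left.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    # Drop the fragment; it is never sent to the server.
    return urlunsplit((scheme, host, path, urlencode(query), ""))
```

A crawler could keep a set of normalized URLs and skip anything already in it, so the same page fetched under two session IDs only counts once. This does nothing against intentional crawler traps, which generate URLs that really are different.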



 5:25 pm on Sep 25, 1999 (gmt 0)

About 6 months ago, they hit one of our sites completely. It was on the old ISP and I didn't have access to robots.txt there. So, I whipped up a Perl script to "toy" with them (letting stuff time out, sending them junk, that sort of thing). They came back a week later and hit everything in sight (including CGIs) with a slightly different user agent. I whipped up another script that was a dynamically created loop of .htm files (create a file, pull a file, create a file...). It sat there pulling that page loop for a full 20 hours. They've kept coming back and back. Evidently they think of me as a test site now. They routinely walk down the tree of all those dynamic graphics pages, and most are form-style posted URLs once you get two pages deep.
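
The page loop idea can be sketched in a few lines of Python. This is not the original Perl script, just a guess at the general shape; the function name `loop_page` and the `/loop/N.htm` URL scheme are made up for illustration.

```python
import re

def loop_page(path):
    """Return HTML for a generated page that links to the next page in the loop."""
    # Pull the page number out of paths like /loop/7.htm; anything else starts at 0.
    match = re.fullmatch(r"/loop/(\d+)\.htm", path)
    n = int(match.group(1)) if match else 0
    next_path = f"/loop/{n + 1}.htm"
    return (
        f"<html><head><title>Page {n}</title></head>"
        f'<body><a href="{next_path}">next</a></body></html>'
    )
```

Every page links to one that has not been requested yet, so a spider with no depth or per-site page limit never runs out of new URLs to fetch.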


 9:32 pm on Sep 25, 1999 (gmt 0)

The pages it's hitting for me are also dynamic. I had been toying around with the index.asp page at my site, letting it write out the date, referer, user_agent and stuff in a comment tag; maybe that's the thing that makes Mercator curious....
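
A rough Python equivalent of that comment-tag trick, for anyone who wants to try it outside ASP. The function name `visit_comment` and the plain dict of request headers are assumptions for the sketch, not anything from the actual index.asp page.

```python
import re
from datetime import datetime, timezone

def visit_comment(headers, now=None):
    """Build an HTML comment recording the date, referer and user agent of a request."""
    now = now or datetime.now(timezone.utc)

    def clean(value):
        # Collapse runs of dashes and escape ">" so the value cannot end the comment early.
        return re.sub(r"-{2,}", "-", value).replace(">", "&gt;")

    referer = clean(headers.get("Referer", "none"))
    agent = clean(headers.get("User-Agent", "none"))
    return f"<!-- {now:%Y-%m-%d %H:%M:%S} GMT | referer: {referer} | user_agent: {agent} -->"
```

The cleaning step matters because the referer and user agent are whatever the visitor chooses to send, and they end up inside the page source.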

I have been hit by a ton of new spiders today, dunno why but here's the bunch:

User-Agent: WWW-Collector-E/0.10970
Via: 1.0 ccuproxy1.ccu.edu.tw:3128 (Squid/2.2.STABLE4), 1.0 cache.ccu.edu.tw:3128 (Squid/2.2.STABLE4)
Cache-Control: max-age=259200


User agent: KANSMEN
No referer

No referer
resolves to: virtualpromote.com
(Now there's a funny one, eh... good ole Jim stopping by,
maybe his logs made him curious ;)

No user agent
No Referer
Resolves to: nat7.aitai.ne.jp
(this one is an old frequent visitor)
No Referer
Resolves to: oracle.vcommunities.com


 12:42 pm on Sep 27, 1999 (gmt 0)

Found out what this one was:

No user agent
No Referer
Resolves to: nat7.aitai.ne.jp

Got spammed today. Good attempt at forging headers, incl. refs to
MSN and Hotmail, but some detective work got me aitai.ne.jp

So this appears to be an email harvester...
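
The detective work is mostly reading the Received: lines. Each mail server adds its own line on top, so the topmost one comes from your own server and the ones below it are where the forging happens. A small Python sketch for pulling them out of a raw message; the function name `received_chain` is made up for the example.

```python
from email import message_from_string

def received_chain(raw_message):
    """Return the Received headers, newest hop first, with whitespace collapsed."""
    msg = message_from_string(raw_message)
    # Folded headers span several lines; join them into one line each.
    return [" ".join(header.split()) for header in msg.get_all("Received", [])]
```

Compare the host a line claims in "from" with the IP address your own server recorded for it; that recorded address is the part the sender cannot fake.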


 3:57 pm on Sep 27, 1999 (gmt 0)

I have just about given up on email harvesters. By the time you figure out who is who, it's too late. Except email siphon, it's fun to *play* with them.

Where can I find an up-to-date list of known spammer email addresses? There used to be a list circulated on Usenet, but I don't see it anymore. I like to push spammer emails at email siphon.


 8:08 pm on Sep 27, 1999 (gmt 0)

Hi Brett,

Don't know where to find it either.

Maybe you can find something at CAUCE somewhere; they had
a bunch of links to anti-spam sites last time I visited.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved