# Genealogy SE [GenDoor.com...]
User-agent: GenCrawler
Disallow: /
# [gozilla.com...]
User-agent: Go!Zilla
Disallow: /
# Bad Links
User-agent: Linkidator
Disallow: /
# Bad Links [elsop.com...]
User-agent: LinkScan
Disallow: /
# Link Checking [linkguard.com...]
User-agent: LinkGuard
Disallow: /
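A quick way to sanity-check records like the ones above is to feed them to Python's standard `urllib.robotparser` and ask whether a given agent may fetch a path. This is only a sketch: the `is_allowed` helper name is made up for illustration, and it only tells you how a well-behaved parser reads the file. Robots that ignore robots.txt are not affected by any of this.

```python
from urllib.robotparser import RobotFileParser

# The records quoted above, with the truncated URLs left out of the comments.
ROBOTS_TXT = """\
# Genealogy SE
User-agent: GenCrawler
Disallow: /
User-agent: Go!Zilla
Disallow: /
# Bad Links
User-agent: Linkidator
Disallow: /
User-agent: LinkScan
Disallow: /
# Link Checking
User-agent: LinkGuard
Disallow: /
"""


def is_allowed(user_agent, path="/", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch path (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("GenCrawler", "LinkScan", "Mozilla/4.0 (compatible; MSIE 5.0)"):
        print(agent, "->", "allowed" if is_allowed(agent) else "disallowed")
```

Since there is no `User-agent: *` record, any agent that is not named should come back as allowed.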
EmailWolf
ExtractorPro
Crescent
CherryPicker
webbandit
WebBandit
NICErsPRO
Telesoft
EmailCollector
Leech
Net Vampire
ImageCrawler
Whizbang
SearchExpress Spider
Wget
Web Magnet
WebReaper
Mister PiX version.dll
InterGet/1.39
ru-robot
Nutella/9.0
ESISmartSpider
xxxbot1
LEIA/2.90
Internet-Html-Searcher/1.15 (064)
WebStripper/1.23
eSense (Chimera); Mozzila/4.0 (Compatible); www.vigil.com/esensedisclaim.html
WebSauger 1.20b
beholder www.vigiltech.com/esensedisclaim.html)
SilentSurf/1.1x [en] (X11; I; $MyVersion)
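Assuming the names above are meant as user-agent strings to refuse, a server-side check could look roughly like the sketch below. It uses a case-insensitive substring match against a shortened copy of the list; the `BLOCKED_AGENT_TOKENS` and `is_blocked` names are illustrative, not from any particular script, and a user agent is trivially easy to fake, so treat this as a first filter only.

```python
# A shortened sample of the names listed above; extend as needed.
BLOCKED_AGENT_TOKENS = [
    "EmailWolf",
    "ExtractorPro",
    "CherryPicker",
    "WebBandit",
    "EmailCollector",
    "Wget",
    "WebReaper",
    "WebStripper",
]


def is_blocked(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if any blocked token appears in the User-Agent header.

    Matching is case-insensitive, so "webbandit" and "WebBandit" need
    only one entry in the list.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


if __name__ == "__main__":
    for ua in ("WebStripper/1.23", "webbandit",
               "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"):
        print(ua, "->", "block" if is_blocked(ua) else "pass")
```

Substring matching is deliberately loose: it catches version suffixes such as `/1.23`, at the cost of possible false positives on short tokens.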
> I notice that spider, too. I don't know where it comes from, but I would guess it spiders the live searches of other engines.
oLeon, do you think that is what is going on here?
195.121.6.106 - - [28/Mar/2001:06:35:55 -0500] "GET / HTTP/1.1" 200 9693 "http://195.121.7.86/cgi-bin/zoeken/avsearch.cgi?pg=q&q=border+terrier&kl=XX&what=web&stq=10" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
195.121.6.106 - - [28/Mar/2001:06:35:59 -0500] "GET /images/film.jpg HTTP/1.1" 200 5911 "http://www.champdogs.co.uk/html/home.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
212.78.177.71 - - [28/Mar/2001:06:36:00 -0500] "GET /images/film.jpg HTTP/1.0" 200 5911 "-" "MIIxpc/4.2"
195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.1" 200 1815 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.1" 200 824 "http://www.champdogs.co.uk/html/master_menu.htm" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
212.78.177.71 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.0" 200 824 "-" "MIIxpc/4.2"
212.78.177.70 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.0" 200 1815 "-" "MIIxpc/4.2"
It followed the surfer right round my site, taking the identical pages including the graphics.
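To spot this kind of shadowing without reading the log by eye, something like the following sketch might help. It parses combined-format lines and reports any request that repeats a path fetched shortly before by a different user agent. The 30 second window and the function names are my own assumptions, and the sample data is just the seven lines quoted above.

```python
import re
from datetime import datetime

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

MSIE = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"
SAMPLE = [
    '195.121.6.106 - - [28/Mar/2001:06:35:55 -0500] "GET / HTTP/1.1" 200 9693 "http://195.121.7.86/cgi-bin/zoeken/avsearch.cgi?pg=q&q=border+terrier&kl=XX&what=web&stq=10" "%s"' % MSIE,
    '195.121.6.106 - - [28/Mar/2001:06:35:59 -0500] "GET /images/film.jpg HTTP/1.1" 200 5911 "http://www.champdogs.co.uk/html/home.html" "%s"' % MSIE,
    '212.78.177.71 - - [28/Mar/2001:06:36:00 -0500] "GET /images/film.jpg HTTP/1.0" 200 5911 "-" "MIIxpc/4.2"',
    '195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.1" 200 1815 "-" "%s"' % MSIE,
    '195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.1" 200 824 "http://www.champdogs.co.uk/html/master_menu.htm" "%s"' % MSIE,
    '212.78.177.71 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.0" 200 824 "-" "MIIxpc/4.2"',
    '212.78.177.70 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.0" 200 1815 "-" "MIIxpc/4.2"',
]


def parse_line(line):
    """Parse one combined-format log line into a dict, or None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def find_shadow_hits(lines, window_seconds=30):
    """Return (first_ip, shadow_ip, shadow_agent, path) for each repeated fetch."""
    records = [r for r in map(parse_line, lines) if r is not None]
    records.sort(key=lambda r: r["time"])  # stable, so log order breaks ties
    hits = []
    for i, later in enumerate(records):
        for earlier in records[:i]:
            gap = (later["time"] - earlier["time"]).total_seconds()
            if (gap <= window_seconds
                    and earlier["path"] == later["path"]
                    and earlier["agent"] != later["agent"]):
                hits.append((earlier["ip"], later["ip"], later["agent"], later["path"]))
                break
    return hits


if __name__ == "__main__":
    for hit in find_shadow_hits(SAMPLE):
        print(hit)
```

The pairwise scan is quadratic, which is fine for a few thousand lines but would want an index keyed on path for a full day's log.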
For client-based tools such as email harvesters (where the IP address varies), I'd recommend either:
1) protecting your email addresses (for example, using unicode), or
2) using a good crawler detection script to block them altogether.
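For option 1, I'm reading "unicode" as HTML numeric character references (`&#64;` for `@`, and so on), which is an assumption on my part. A minimal sketch of that encoding follows; `user@example.com` is a placeholder address and the function names are made up. Browsers decode the references for visitors, while a harvester doing a naive pattern match on the raw HTML will not see an `@`. A harvester that decodes entities first will still find the address.

```python
import html


def obfuscate_email(address):
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def mailto_link(address):
    """Build a mailto link whose href and text are both encoded."""
    encoded = obfuscate_email(address)
    return f'<a href="mailto:{encoded}">{encoded}</a>'


if __name__ == "__main__":
    sample = "user@example.com"  # placeholder address
    encoded = obfuscate_email(sample)
    print(encoded)
    print(mailto_link(sample))
    # Round trip: decoding the references gives back the original address.
    print(html.unescape(encoded) == sample)
```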