Anybody care to add any bots to this?

         

BoneHeadicus

6:53 pm on Mar 2, 2001 (gmt 0)

10+ Year Member



User-agent: *
Disallow: /css/
Disallow: /js/
Disallow: /logs/

User-agent: asterias
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: DIIbot
Disallow: /

User-agent: LinkWalker
Disallow: /

Son_House

8:17 pm on Mar 2, 2001 (gmt 0)

10+ Year Member



# Copyright Infringement [baytsp.com...]
User-agent: BaySpider
Disallow: /

# Genealogy SE [GenDoor.com...]
User-agent: GenCrawler
Disallow: /

# [gozilla.com...]
User-agent: Go!Zilla
Disallow: /

# Bad Links
User-agent: Linkidator
Disallow: /

# Bad Links [elsop.com...]
User-agent: LinkScan
Disallow: /

# Link Checking [linkguard.com...]
User-agent: LinkGuard
Disallow: /

luckynh

8:41 pm on Mar 2, 2001 (gmt 0)

10+ Year Member



User-agent: googlebot
Disallow: /cgi-bin/

User-agent: Gulliver
Disallow: /cgi-bin/

User-agent: Scooter
Disallow: /cgi-bin/

User-agent: EmailSiphon
Disallow: /
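
A quick way to sanity-check rules like the ones in this thread is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name allowed(), the sample rules, and the example.com URLs are made up for the demonstration. It shows the difference between "Disallow: /" (the robot is shut out of everything) and an empty "Disallow:" (nothing is blocked for that robot).

```python
from urllib.robotparser import RobotFileParser

# Sample rules for the demonstration only (not anyone's real robots.txt).
SAMPLE_RULES = """\
User-agent: *
Disallow: /logs/

User-agent: LinkWalker
Disallow: /

User-agent: EmailSiphon
Disallow:
"""


def allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    checks = [
        ("LinkWalker", "http://example.com/index.html"),
        ("EmailSiphon", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/logs/access.log"),
        ("SomeOtherBot", "http://example.com/index.html"),
    ]
    for agent, url in checks:
        print(agent, url, allowed(SAMPLE_RULES, agent, url))
```

With these sample rules, LinkWalker is refused everywhere, while the EmailSiphon record with the empty Disallow line permits everything. This only tells you what a well-behaved robot would do; it says nothing about robots that never read the file.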

Son_House

8:36 am on Mar 3, 2001 (gmt 0)

10+ Year Member



# Pictures [picsearch.org...]
User-agent: psbot
Disallow: /

chi

3:30 pm on Mar 8, 2001 (gmt 0)



This is my first post here, so hi all!
Sorry if this is a stupid question: why do you want to ban these spiders from whole sites or from site directories?

BoneHeadicus

3:51 pm on Mar 8, 2001 (gmt 0)

10+ Year Member



Hi chi.

These robots are what you might call "special purpose" robots that serve functions other than indexing for usable search engines. Banning these spiders in no way affects your ranking in legitimate search engines.

msgraph

3:55 pm on Mar 8, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



EmailSiphon

EmailWolf

ExtractorPro

Crescent

CherryPicker

webbandit

WebBandit

NICErsPRO

Telesoft

EmailCollector

Leech

Net Vampire

ImageCrawler

Whizbang

SearchExpress Spider

Wget

Web Magnet

WebReaper

Mister PiX version.dll

InterGet/1.39

ru-robot

Nutella/9.0

ESISmartSpider

xxxbot1

LEIA/2.90

Internet-Html-Searcher/1.15 (064)

WebStripper/1.23

eSense (Chimera); Mozzila/4.0 (Compatible); www.vigil.com/esensedisclaim.html

WebSauger 1.20b

beholder www.vigiltech.com/esensedisclaim.html)

SilentSurf/1.1x [en] (X11; I; $MyVersion)
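
Names like these show up in the User-Agent header of each request, so they can also be checked on the server side. Below is a minimal sketch of a case-insensitive substring match, assuming you already have the header value as a string. The function name is_blocked_agent() and the short token list (a selection from the names above) are illustrative choices, not a complete or recommended blocklist.

```python
# A small selection of the names listed above; extend as needed.
BLOCKED_TOKENS = (
    "EmailSiphon",
    "EmailWolf",
    "ExtractorPro",
    "CherryPicker",
    "WebBandit",
    "EmailCollector",
    "WebReaper",
    "WebStripper",
    "WebSauger",
)


def is_blocked_agent(user_agent: str, tokens=BLOCKED_TOKENS) -> bool:
    """Return True if any blocked token appears in the User-Agent string.

    The comparison ignores case, so "webbandit" and "WebBandit" are
    treated as the same robot.
    """
    lowered = user_agent.lower()
    return any(token.lower() in lowered for token in tokens)


if __name__ == "__main__":
    for ua in ("WebStripper/1.23", "webbandit",
               "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"):
        print(ua, "->", is_blocked_agent(ua))
```

Substring matching is deliberately loose, so short tokens can catch legitimate agents by accident. A robot that fakes a browser User-Agent will also pass straight through.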

icehousedesigns

6:48 pm on Mar 9, 2001 (gmt 0)



WebZip

mark_roach

9:32 pm on Mar 13, 2001 (gmt 0)

10+ Year Member



I don't currently block anyone. However, after blowing my bandwidth limit last month, I intend to start doing so now.

Does anyone know why I shouldn't add these three to the list?

JennyBot
MIIxpc
teoma_agent3

Machiavelli

10:09 pm on Mar 13, 2001 (gmt 0)



User-agent: Googlebot
Disallow: /

mivox

10:34 pm on Mar 13, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why would you want to ban Google from your entire site?

Air

10:01 pm on Mar 15, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>MIIxpc

I'm not absolutely sure, but there is some indication that this may be altavista.nl or altavista.de. Can anyone confirm?

oLeon

3:51 pm on Mar 16, 2001 (gmt 0)

10+ Year Member



Air,
I notice that spider, too. I don't know where it comes from, but I would guess it spiders the live searches of other engines.

volatilegx

11:26 pm on Mar 27, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Isn't this for the robots.txt file? Am I missing something? Most of these robots would never check the robots.txt file, right?

Dan

mark_roach

1:42 pm on Mar 29, 2001 (gmt 0)

10+ Year Member



>MIIxpc

>I notice that spider, too. I don't know where it comes from, but I would guess it spiders the live searches of other engines.

oLeon, do you think that is what is going on here?

195.121.6.106 - - [28/Mar/2001:06:35:55 -0500] "GET / HTTP/1.1" 200 9693 "http://195.121.7.86/cgi-bin/zoeken/avsearch.cgi?pg=q&q=border+terrier&kl=XX&what=web&stq=10" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

195.121.6.106 - - [28/Mar/2001:06:35:59 -0500] "GET /images/film.jpg HTTP/1.1" 200 5911 "http://www.champdogs.co.uk/html/home.html" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

212.78.177.71 - - [28/Mar/2001:06:36:00 -0500] "GET /images/film.jpg HTTP/1.0" 200 5911 "-" "MIIxpc/4.2"

195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.1" 200 1815 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

195.121.6.106 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.1" 200 824 "http://www.champdogs.co.uk/html/master_menu.htm" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

212.78.177.71 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search.htm HTTP/1.0" 200 824 "-" "MIIxpc/4.2"

212.78.177.70 - - [28/Mar/2001:06:36:12 -0500] "GET /html/search_menu.htm HTTP/1.0" 200 1815 "-" "MIIxpc/4.2"

It followed the surfer right round my site, taking the identical pages including the graphics.
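
To look for this "shadowing" pattern across a whole log rather than by eye, something like the following might help. It is a rough sketch that assumes Apache combined log format, as in the lines above. The function names and the 30-second window are arbitrary choices for illustration. It reports each request from a MIIxpc-style agent that repeats a path fetched shortly before by a different IP address.

```python
import re
from datetime import datetime

# Apache "combined" log format: ip, timestamp, request, status, size,
# referer, and user agent.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "ts": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": m.group("path"),
        "agent": m.group("agent"),
    }


def shadowed_requests(lines, bot_token="MIIxpc", window_seconds=30):
    """List (path, visitor_ip, bot_ip, delay_seconds) for each bot request
    that repeats a path fetched by another IP within the time window."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    visitors = [e for e in entries if bot_token not in e["agent"]]
    hits = []
    for bot in (e for e in entries if bot_token in e["agent"]):
        for v in visitors:
            delay = (bot["ts"] - v["ts"]).total_seconds()
            if (v["path"] == bot["path"] and v["ip"] != bot["ip"]
                    and 0 <= delay <= window_seconds):
                hits.append((bot["path"], v["ip"], bot["ip"], delay))
                break  # one match per bot request is enough
    return hits
```

Call it with something like shadowed_requests(open("access.log")), where the file name is whatever your server uses. A long run of hits with delays of a second or two would support the "following the surfer" theory, though it does not identify who operates the robot.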

ulstrup

12:27 pm on Oct 22, 2001 (gmt 0)

10+ Year Member



I know most of them are listed in previous posts, just thought I would share this:
[home.tvd.be...]

Will

2:27 pm on Oct 22, 2001 (gmt 0)



I'd look at blocking access based on IP address/hostname for many of the above, because many crawlers (especially the lesser-known bots) do not look at robots.txt.
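
As a rough sketch of the IP-based approach, the check below uses Python's standard ipaddress module. The function name is_blocked_ip() is made up, and the networks in the list are reserved documentation ranges used as placeholders, not real crawler addresses. You would fill in ranges taken from your own logs.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real crawler IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole range
    ipaddress.ip_network("198.51.100.7/32"),   # a single address
]


def is_blocked_ip(ip: str, networks=BLOCKED_NETWORKS) -> bool:
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.7", "198.51.100.8"):
        print(ip, "->", is_blocked_ip(ip))
```

In practice this kind of rule usually lives in the web server or firewall configuration rather than in a script, but the matching logic is the same.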

In the case of client-based stuff such as email harvesters (where the IP address varies), I'd recommend either

1) protecting your email addresses (for example, using Unicode)
2) using a good crawler detection script to block them altogether
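
For option 1, one common reading of "using unicode" is to write the address in the page as HTML numeric character references, so the raw text never contains a literal "@". That reading is an assumption, and the helper name encode_email() and the example address are made up. A minimal sketch:

```python
import html


def encode_email(address: str) -> str:
    """Encode every character as an HTML numeric character reference.

    Browsers render the result as the normal address, but the page
    source no longer contains the plain text.
    """
    return "".join(f"&#{ord(ch)};" for ch in address)


if __name__ == "__main__":
    encoded = encode_email("webmaster@example.com")
    print(encoded)
    # Decoding gives the original address back, as a browser would show it.
    print(html.unescape(encoded))
```

This only stops harvesters that scan the raw source for address-shaped text. A harvester that decodes character references first will still find the address.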