Mediapartners-Google? MimeLive Client? and others

What do you say about these spiders?

         

silverbytes

6:26 pm on Oct 15, 2003 (gmt 0)

I've been hit by some spiders lately.
Suspicious list:

abot
Almaden.ibm.com/cs/crawler (IBM looking at my site?)
Enterprise Search (?)
MimeLive Client (?)
[MSIE 6.0] (Why are so many spiders claiming to be Microsoft?)
MSNIA
MSIECrawler
NutchOrg (seems to be an open-source search engine; friendly?)
Openfind data gatherer (Asian bot, but friendly too?)
Pompos (pompos@iliad.fr) (from France; nice, or ban?)
libwww-perl (?)

I'm thinking of putting something like this in .htaccess:

RewriteCond %{HTTP_USER_AGENT} ^abot [NC,OR]
and so on...
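
Spelled out, a complete rule set along those lines might look like the sketch below. It assumes mod_rewrite is enabled and permitted in .htaccess. The second pattern is only an illustrative pick from the list above, and real User-Agent strings should be checked against the logs before anchoring a pattern with ^.

```apache
# Sketch only: patterns are illustrative, not a vetted ban list.
RewriteEngine On

# [NC] makes the match case-insensitive; [OR] chains to the next condition.
RewriteCond %{HTTP_USER_AGENT} ^abot [NC,OR]
# The last condition in the chain carries no [OR].
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [NC]

# Any request matching one of the conditions gets 403 Forbidden.
RewriteRule .* - [F]
```

Each additional bot gets its own RewriteCond line with [NC,OR]; only the final condition before the RewriteRule drops the OR flag.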

Comments please?

Jenstar

6:29 pm on Oct 15, 2003 (gmt 0)

Mediapartners-Google

That is the bot for targeting AdSense ads to your content. If you aren't running AdSense, it could have been from an Opera user visiting your site.

runarb

7:07 pm on Oct 15, 2003 (gmt 0)

libwww-perl (?)

A Perl library widely used in spider applications. See [lwp.linpro.no...]

WebJoe

7:24 pm on Oct 15, 2003 (gmt 0)

Some of the listed bots are mentioned in the WebmasterWorld Forum 11 thread Updated and Collated Bot List [webmasterworld.com].

Here is some reading:
abot: 241 [webmasterworld.com], 1297 [webmasterworld.com], 1450 [webmasterworld.com], 1616 [webmasterworld.com], 1762 [webmasterworld.com]
Almaden.ibm.com/cs/crawler: 128 [webmasterworld.com], 665 [webmasterworld.com], 759 [webmasterworld.com], 777 [webmasterworld.com], 800 [webmasterworld.com], 814 [webmasterworld.com], 904 [webmasterworld.com], 1442 [webmasterworld.com], 1536 [webmasterworld.com], 1705 [webmasterworld.com], 1992 [webmasterworld.com], 2197 [webmasterworld.com], 2276 [webmasterworld.com] (and those are just the Forum 11 posts)
MimeLive Client: 2137 [webmasterworld.com], 2147 [webmasterworld.com]
MSIE 6.0: 2354 [webmasterworld.com]
MSIECrawler: 749 [webmasterworld.com], 1010 [webmasterworld.com], 1398 [webmasterworld.com], 2270 [webmasterworld.com]
NutchOrg: 1667 [webmasterworld.com], 2200 [webmasterworld.com], 4987 [webmasterworld.com]
Openfind data gatherer: 30 [webmasterworld.com], 877 [webmasterworld.com], 1054 [webmasterworld.com]
Pompos: 1448 [webmasterworld.com], 1604 [webmasterworld.com]
libwww-perl: 1885 [webmasterworld.com], 2160 [webmasterworld.com]

And my personal conclusion:
abot: I banned it, like Thomson&Thomson and Cyveillance. Services like that can't do their job seriously if they obey robots.txt, and I don't allow spiders that ignore it.

Almaden.ibm.com/cs/crawler: I don't see a reason to ban a well-behaved bot from Big Blue's research center.

MimeLive Client: Belongs to Exalead (?); I wouldn't know.

MSIE 6.0: Banned it. It has nothing to do with Internet Explorer; it is some Romanian directory that doesn't obey robots.txt.

MSIECrawler: Internet Explorer's offline-browsing crawler. Obeys robots.txt and is therefore not banned.

NutchOrg: Caught it not obeying robots.txt more than once and banned it subsequently.

Openfind data gatherer: I haven't banned the UA, but I have banned pretty much all of the 211. and 205. IP ranges that are registered with APNIC. There's just too much e-crap coming from that part of the world.

Pompos: Wouldn't know

libwww-perl: Banned it, with a few exceptions, as I did "Microsoft URL Control" and "Indy Library". All of them are part of one dev tool or another with which pretty much anyone can "create" a bot (that of course ignores robots.txt).
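
The UA bans and IP-range bans described above could also be expressed with mod_setenvif and the Order/Allow/Deny directives rather than mod_rewrite. What follows is a rough sketch under stated assumptions, not anyone's actual configuration: bad_bot is a made-up variable name, the patterns assume each User-Agent string begins as shown, and 203.0.113.0/24 is a documentation placeholder, not one of the APNIC ranges mentioned.

```apache
# Flag requests by User-Agent (mod_setenvif). "bad_bot" is a hypothetical name.
SetEnvIfNoCase User-Agent "^abot" bad_bot
SetEnvIfNoCase User-Agent "^NutchOrg" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot

# Access syntax for Apache 1.3 / 2.0 / 2.2.
Order Allow,Deny
Allow from all
# Refuse anything flagged above.
Deny from env=bad_bot
# Placeholder range standing in for a real banned network block.
Deny from 203.0.113.0/24
```

With Order Allow,Deny the Allow lines are evaluated first and the Deny lines second, so any matching Deny wins. On Apache 2.4 the same idea is written with Require directives instead.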

BlueSky

8:15 pm on Oct 15, 2003 (gmt 0)

Enterprise Search: This one looks like a commercial search engine ($500) by Innerprise.net that can spider websites, intranets, and local drives. It handles HTML, TXT, Word, Excel, PowerPoint, PDF, WordPerfect, and RTF documents. It supposedly supports robots exclusion. Can you post the full UA? I'd like to ban this one.

MSNIA: MSN Internet Access perhaps? This one may be human.

wilderness

10:29 pm on Oct 15, 2003 (gmt 0)

Almaden.ibm.com/cs/crawler: I don't see a reason to ban a well-behaved bot from Big Blue's research center.

Because they are using your resources and bandwidth to generate fees from third-party customers.

silverbytes

11:20 pm on Oct 16, 2003 (gmt 0)

My conclusion: Pompos survives (for now); all the rest are off my site.

Thanks!