Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
What is Indy library and can I get them to go away
korkus2000




msg:407228
 5:38 pm on Jun 10, 2002 (gmt 0)

Mozilla/3.0 (compatible; Indy Library) hits all my servers all the time. What is it, and does it respect robots.txt? This thing is a bandwidth hog!

I found on another site that it is "Internet Direct Library for Borland (used as E-Mail collector)." So what can I do other than banning the IP?

[edited by: korkus2000 at 5:41 pm (utc) on June 10, 2002]

 

toolman




msg:407229
 5:41 pm on Jun 10, 2002 (gmt 0)

Lemme guess... it's coming from an IP in Asia and it voraciously eats everything it can find on the server?

[webmasterworld.com...] this is the best answer to that problem that I know of.

misosoph




msg:407230
 3:32 am on Jun 11, 2002 (gmt 0)

toolman, does adding this to an .htaccess file block Indy Library -- that is, is the ^Mozilla.*Indy part correct?

SetEnvIf User-Agent ^Mozilla.*Indy keep_out
order allow,deny
allow from all
deny from env=keep_out

korkus2000, below is explicitly the answer you don't want. It has worked for me so far, however. And I have never had any requests from these two IP ranges that did not involve Indy Library (I watch my access logs for 403s very carefully on this account).

deny from 210.82.
deny from 211.101.

Both of these are Mozilla/3.0 (compatible; Indy Library) from Beijing, China

toolman




msg:407231
 5:53 am on Jun 13, 2002 (gmt 0)

>>>toolman, does adding this to an .htaccess file block Indy Library -- that is, is the ^Mozilla.*Indy part correct?

Ahhh. You're asking the wrong dude. I just collected a bunch of UAs from other threads and stuck them together. Littleman or Air would be the ones to help on mod_rewrite questions.

Key_Master




msg:407232
 6:19 am on Jun 13, 2002 (gmt 0)

misosoph,

This will ban that agent. No need for a Mozilla prefix. It bans any agent whose name contains the phrase Indy Library, in upper case, lower case, or any mix of the two (I like to be on the safe side).

SetEnvIfNoCase User-Agent "indy library" keep_out
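
To see how the two patterns in this thread differ, here is a minimal Python sketch. It is only an approximation: it assumes that SetEnvIf looks for the regular expression anywhere in the header value, which is what Python's re.search does. The helper name matches() and the two variant user-agent strings are made up for illustration.

```python
import re

# Pattern from misosoph's SetEnvIf line: case-sensitive, must start with "Mozilla".
PREFIX_PATTERN = re.compile(r"^Mozilla.*Indy")

# Pattern from Key_Master's SetEnvIfNoCase line: case-insensitive, matches anywhere.
NOCASE_PATTERN = re.compile(r"indy library", re.IGNORECASE)


def matches(pattern, user_agent):
    """Return True if the pattern is found anywhere in the User-Agent string."""
    return pattern.search(user_agent) is not None


SAMPLES = [
    "Mozilla/3.0 (compatible; Indy Library)",  # the agent quoted in this thread
    "mozilla/3.0 (compatible; indy library)",  # hypothetical lower-case variant
    "Indy Library",                            # hypothetical variant without the prefix
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)",  # ordinary browser
]

# Columns: prefix pattern result, no-case pattern result, user agent.
for ua in SAMPLES:
    print(f"{matches(PREFIX_PATTERN, ua)!s:5} {matches(NOCASE_PATTERN, ua)!s:5} {ua}")
```

Both patterns catch the agent as it appears in the logs here. Only the case-insensitive one would still catch it if the capitalization changed or the Mozilla prefix were dropped, and neither touches the ordinary browser string.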

misosoph




msg:407233
 7:21 am on Jun 13, 2002 (gmt 0)

Thank you. The part about NoCase is especially useful to me.

(I never know whether to write "Thank you" notes. On the one hand, no one learns anything from reading them. And on the other hand, it looks as if you are unappreciative or ignoring the responder if you don't write one. So what is to be done?)

misosoph




msg:407234
 10:42 am on Jul 14, 2002 (gmt 0)

For the record: I wrote above that I had never received a request from the 210.82. range that did not have Indy Library as the user agent. That is no longer true:

210.82.42.242 - - [13/Jul/2002:23:54:36 -0700] "GET /folder/filename.html HTTP/1.1" 403 302 "http://www.google.com/search?hl=iw&inlang=iw&ie=ISO-8859-8-I&q=searchword+searchword+searchword+searchword&btnG=%E7%E9%F4%E5%F9+%E1%E2%E5%E2%EC" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"

APNIC Whois says 210.82.0.0 - 210.82.127.255 is registered to the "beijing branch" of china-netcom.com

I forgot to remove " deny from 210.82. " from my .htaccess file when I added " SetEnvIfNoCase User-Agent "indy library" keep_out ", and now I've blocked someone through carelessness.
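
One way to check whether a range ban would catch anything other than Indy Library is to tally the user agents seen per IP prefix in the access log. The following is a rough Python sketch, assuming the log is in Apache's combined format as in the excerpts posted in this thread. The function name agents_by_prefix() and the regular expression are illustrative, not taken from any tool.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log format: the last quoted field is the User-Agent.
LOG_LINE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ \[[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def agents_by_prefix(lines, octets=2):
    """Group the User-Agent strings seen in log lines by leading IP octets."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # skip lines that are not in the assumed format
        prefix = ".".join(m.group("ip").split(".")[:octets]) + "."
        seen[prefix].add(m.group("agent"))
    return dict(seen)


# Sample lines copied from log excerpts posted in this thread
# (the long Google referrer is shortened to "-").
SAMPLE_LOG = [
    '210.82.42.242 - - [13/Jul/2002:23:54:36 -0700] "GET /folder/filename.html HTTP/1.1" 403 302 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"',
    '210.82.124.87 - - [14/Jul/2002:06:06:09 -0700] "GET / HTTP/1.0" 403 - "-" "Mozilla/3.0 (compatible; Indy Library)"',
    '211.101.236.91 - - [14/Jul/2002:06:06:09 -0700] "GET / HTTP/1.0" 403 - "-" "Mozilla/3.0 (compatible; Indy Library)"',
]

for prefix, agents in sorted(agents_by_prefix(SAMPLE_LOG).items()):
    # More than one agent under a prefix means a range ban hits non-Indy visitors too.
    print(prefix, len(agents), sorted(agents))
```

On these sample lines the 210.82. prefix shows two different agents, which is exactly the case where a blanket "deny from 210.82." turns away an ordinary browser.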

wilderness




msg:407235
 12:47 pm on Jul 14, 2002 (gmt 0)

misosoph,
I've been using 210.82.124. for that Indy, which has been effective, at least between that and the other Indy range.

If it's any comfort?
We are after all only human!

The visitor was looking for the frequency of keywords, although they specified "searchword."
Keywords, with only a few exceptions, are ineffective with today's SEs. Though that doesn't stop me from creating them off the content of each page.

wilderness




msg:407236
 1:20 pm on Jul 14, 2002 (gmt 0)

It seems as though the Indy Library user monitors this group :-(

210.82.124.87 - - [14/Jul/2002:06:06:09 -0700] "GET / HTTP/1.0" 403 - "-" "Mozilla/3.0 (compatible; Indy Library)"
210.82.124.85 - - [14/Jul/2002:06:06:09 -0700] "GET / HTTP/1.0" 403 - "-" "Mozilla/3.0 (compatible; Indy Library)"
210.82.124.86 - - [14/Jul/2002:06:06:09 -0700] "GET / HTTP/1.0" 403 - "-" "Mozilla/3.0 (compatible; Indy Library)"
211.101.236.91 - - [14/Jul/2002:06:06:09 -0700] "GET / HTTP/1.0" 403 - "-" "Mozilla/3.0 (compatible; Indy Library)"

misosoph




msg:407237
 1:54 pm on Jul 14, 2002 (gmt 0)

Thank you, wilderness.

Maybe I misled by substituting "searchword" for the actual words? It was a real search, of the form "map+of+north+dakota".

<quote> We are after all only human! </quote> Are you sure? Remember, this is the Internet. I might be a human or I might not be. :) But thank you for the thought!

bird




msg:407238
 2:03 pm on Jul 14, 2002 (gmt 0)

I get Indy Library requests from addresses all around the world, so I would assume that it is pointless to block them by IP. You will only hurt countless innocent bystanders if you do this.

Since the address harvesting tool using that library doesn't seem to be written with the ability to change the UA, blocking the UA looks like the preferable method.
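
The trade-off between the two approaches can be sketched in a few lines of Python. This is a simplified model, not how Apache evaluates its rules: the function names are invented, the 192.0.2.10 address is a hypothetical one from a documentation range, and the prefix check only mimics the partial-address form of "deny from" used earlier in the thread.

```python
import re

INDY = re.compile(r"indy library", re.IGNORECASE)  # Key_Master's pattern
BANNED_PREFIXES = ("210.82.", "211.101.")           # misosoph's "deny from" lines


def blocked_by_agent(ip, user_agent):
    """Deny only when the User-Agent names Indy Library, whatever the address."""
    return INDY.search(user_agent) is not None


def blocked_by_ip(ip, user_agent):
    """Deny every request from the banned ranges, whatever the User-Agent."""
    return ip.startswith(BANNED_PREFIXES)


REQUESTS = [
    # Indy Library from a banned range: both rules block it.
    ("210.82.124.87", "Mozilla/3.0 (compatible; Indy Library)"),
    # Ordinary browser from a banned range: only the IP rule blocks it.
    ("210.82.42.242", "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)"),
    # Indy Library from a hypothetical address elsewhere: only the agent rule blocks it.
    ("192.0.2.10", "Mozilla/3.0 (compatible; Indy Library)"),
]

for ip, ua in REQUESTS:
    print(ip, "agent rule:", blocked_by_agent(ip, ua), "ip rule:", blocked_by_ip(ip, ua))
```

The second request is the innocent bystander, and the third is the Indy Library visit from elsewhere in the world that an IP ban never sees.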

wilderness




msg:407239
 2:37 pm on Jul 14, 2002 (gmt 0)

<snip>I might be a human or I might not be.</snip>

Misosoph,
The computer industry has for some decades been trying to inject a "human personal instinct" into computers.
Although they have come a long way in analyzing situations, nothing replaces or even compares to the logic and feeling of another mind and heart.
Unless it's a similarly weak mind or heart ;-)[TIC]

hey bird,
As I've made clear on more than one occasion, the methods I use are specific to my sites and should be determined by the market each website serves.
EX:
This past week I put up an equine sale in Michigan online. Yesterday a visitor from Yugoslavia wasted a lot of bandwidth unnecessarily by viewing every pedigree (family tree) for a Michigan sale. 403.
I did, however, confine my 403 to xxx.xxx.xxx. which somewhat limits the harm to innocent visitors.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved