
Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Brett's IP lists
maybe some corrections needed..

 10:21 pm on Sep 6, 2000 (gmt 0)

Hi Brett, I was looking at your IP list pages and comparing them to some of my notes. Very complete, excellent job!
There are a few things, however, that may need to be looked at.
Here you go:

Altavista: and
These two IPs are highly suspect; I don't think they belong on the list.

Excite: and
Suspect numbers again; one is the same IP as in the AV entry above.
You also did not mention that Excite spiders use the generic User-Agent names libwww-perl/5.47 and libwww-perl/5.10.

Lycos: lycosinc.NorthRoyalton.cw.net - hmm???

Fast: definitely suspect. ah-ha.com is powered by Fast, but... still suspect.
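Spotting those generic libwww-perl User-Agent names in a log is a simple pattern match. A minimal Python sketch, assuming you already have the User-Agent string for each request (the function and constant names are hypothetical). Since any script built on the libwww-perl library can send the same string, a match is only a hint and should be paired with an IP check rather than trusted on its own.

```python
import re

# Matches a User-Agent that is exactly "libwww-perl/<version>",
# e.g. "libwww-perl/5.47". A match says "generic libwww-perl client",
# not "this is definitely Excite".
GENERIC_LIBWWW = re.compile(r"^libwww-perl/\d+(\.\d+)*$")


def is_generic_libwww(user_agent: str) -> bool:
    """Return True if the UA string is the bare libwww-perl name plus version."""
    return bool(GENERIC_LIBWWW.match(user_agent.strip()))


if __name__ == "__main__":
    for ua in ["libwww-perl/5.47", "libwww-perl/5.10", "Mozilla/4.0 (compatible)"]:
        print(ua, is_generic_libwww(ua))
```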




 10:36 pm on Sep 6, 2000 (gmt 0)

Thanks. I hadn't uploaded the list I rechecked this morning yet. The Alta and Excite ones are fixed and uploaded. The Lycos one I am pretty sure is Lycos; some of the other stray Lycos IPs follow an almost identical trace route. Thanks on the Fast ones. I'd thought those were all correct, but obviously a couple of stray ones got in there.
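Comparing trace routes is one way to judge a host like lycosinc.NorthRoyalton.cw.net. A different, commonly used check is forward-confirmed reverse DNS: reverse-resolve the IP, require the host name to sit under a domain you trust, then resolve that name forward and require the original IP to come back. A minimal Python sketch follows; the function name and the example.com domains are hypothetical placeholders. Note its limit: a host named under the carrier's domain (cw.net here) would fail a check against the engine's own domain, so cases like that still need the kind of judgment described above.

```python
import socket
from typing import Callable, Iterable


def ip_belongs_to(ip: str,
                  domains: Iterable[str],
                  reverse: Callable = socket.gethostbyaddr,
                  forward: Callable = socket.gethostbyname_ex) -> bool:
    """Forward-confirmed reverse DNS check.

    The resolvers are parameters so they can be swapped out (for example
    in tests); by default they are the standard socket lookups.
    """
    try:
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False

    wanted = [d.lower().strip(".") for d in domains]
    # Match whole labels only, so "badexample.com" is not under "example.com".
    if not any(host == d or host.endswith("." + d) for d in wanted):
        return False

    try:
        addresses = forward(host)[2]  # third item is the list of IPs
    except OSError:
        return False
    return ip in addresses
```

Calling `ip_belongs_to("192.0.2.10", ["example.com"])` performs live DNS lookups, so results depend on the resolver you run it against.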

The Looksmart ones were thrown in there because 'they were there'. That really is Looksmart, which uses Fast's spider and search engine programs.

updated lists uploaded...


 3:07 am on Sep 8, 2000 (gmt 0)

Any URL where I can find this magical list? Thanks


 5:08 am on Sep 8, 2000 (gmt 0)



 2:05 pm on Sep 8, 2000 (gmt 0)


Thanks for that.

Smokin Joe

 7:57 pm on Sep 8, 2000 (gmt 0)

I have a list that my company bought from Fantomaster which is insanely big compared to the list you posted, Brett.

Are some of those entries useless... redundant... obsolete?

I'm confused, as the size of my database keeps swelling.

In short, I'd rather keep my database small, and if your list is accurate enough to keep me off the SEs' poo-poo list I'd be ecstatic.
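One way to shrink a list without losing coverage is to merge duplicate IPs and overlapping or adjacent ranges. A minimal Python sketch using the standard-library `ipaddress` module, assuming the list holds IPv4 addresses or CIDR blocks, one per entry (the function name is hypothetical, and the sample addresses come from the documentation-only ranges, not real spiders). It removes redundancy only; deciding which entries are obsolete still needs current data.

```python
import ipaddress


def collapse_ip_list(entries):
    """Merge single IPs and CIDR blocks into the smallest equivalent set.

    Duplicates disappear and adjacent or overlapping ranges are combined,
    so the result matches exactly the same addresses as the input.
    IPv4 only in this sketch.
    """
    networks = [ipaddress.IPv4Network(e.strip(), strict=False) for e in entries]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    raw = ["192.0.2.5", "192.0.2.5", "192.0.2.0/25", "192.0.2.128/25", "198.51.100.7"]
    print(collapse_ip_list(raw))  # ['192.0.2.0/24', '198.51.100.7/32']
```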


 8:04 pm on Sep 8, 2000 (gmt 0)

Ralph does a great job with that list from what I've seen and heard. He is into it, and is covering EVERYTHING, though. I'm finely targeted there, on the majors only. If you go into some of the 'machine name' links under my lists, you will also find some HUGE lists (Inktomi) that list everything under the sun. The primary, smaller lists are the ones that have been confirmed 100% as running spiders from that host.

I'd just shorten the list down to the IPs of engines you know you want to target. My own list is a bit bigger than the one I put online: there are some Alta boxes of questionable origin not listed, and I won't cloak for Excite or Fast. So for me, that leaves primarily Ink, Alta, and some tricky link work with Google. That comes out to a pretty small list (100-125 IPs, I think).
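Trimming a list down to the engines you care about, then asking which of those engines (if any) an incoming IP belongs to, can be sketched in Python with the standard `ipaddress` module. The engine names mirror the thread, but the ranges below are hypothetical placeholders from the documentation-only blocks, not real spider addresses; substitute your own verified entries. The function names are also hypothetical.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT real spider IPs.
SPIDER_RANGES = {
    "inktomi": ["192.0.2.0/25"],
    "altavista": ["198.51.100.0/24"],
    "excite": ["203.0.113.0/24"],
}


def build_target_list(ranges_by_engine, targets):
    """Keep only the engines in `targets`, with ranges parsed once."""
    return {
        engine: [ipaddress.ip_network(r) for r in ranges]
        for engine, ranges in ranges_by_engine.items()
        if engine in targets
    }


def engine_for(ip, target_list):
    """Return the targeted engine whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for engine, networks in target_list.items():
        if any(addr in net for net in networks):
            return engine
    return None


if __name__ == "__main__":
    targets = build_target_list(SPIDER_RANGES, {"inktomi", "altavista"})
    print(engine_for("192.0.2.10", targets))   # inktomi
    print(engine_for("203.0.113.5", targets))  # None (excite was trimmed)
```

A linear scan is fine at the 100-125 entry scale mentioned above; a much larger list might warrant a sorted structure instead.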

I tell you what, you just send me that fancy list and I'll trim it right up for you (lol - only kidding).


 3:50 am on Sep 18, 2000 (gmt 0)

Dumb question from an amateur:
How can I view my log files?
Thanks in advance to anyone who can help.


 4:36 am on Sep 19, 2000 (gmt 0)

Normally the web host gives you access to the raw server logs and/or to stats derived from your logs. It is best to have access to the raw logs; that way you can decide what you want to see, rather than relying on whatever the host decided to set up in its log stats program.

Not all hosts offer access to your logs, or only do so on certain price plans. Ask them about this.
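Raw logs are usually plain text, one request per line. Many Apache hosts use the Combined Log Format; assuming that format (check your host's configuration, since others differ), a minimal Python sketch can pull out the client IP and User-Agent, the two fields the spider lists in this thread are matched against. The function names are hypothetical and the sample line is made up.

```python
import re
from collections import Counter

# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for a Combined Log Format line, else None."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def count_agents(lines):
    """Count requests per User-Agent, skipping lines that do not parse."""
    return Counter(p["agent"] for p in map(parse_line, lines) if p)


if __name__ == "__main__":
    sample = ('192.0.2.10 - - [08/Sep/2000:05:08:00 +0000] '
              '"GET /index.html HTTP/1.0" 200 1043 "-" "libwww-perl/5.47"')
    print(parse_line(sample)["host"], parse_line(sample)["agent"])
```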



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved