
Search Engine Spider and User Agent Identification Forum

    
ldspider
code.google.com
Pfui




msg:4411606
 7:37 pm on Jan 27, 2012 (gmt 0)

Just what we need -- another Google-related robot:

swsesrv16.deri.ie [projecthoneypot.org...]
ldspider (http://code.google.com/p/ldspider/wiki/Robots)

robots.txt? Yes

Courtesy of the Project Honey Pot (PHP) link for the IP:

140.203.154.197's User Agent Strings
multicrawler (+http://sw.deri.org/2006/04/multicrawler/robots.html)

The bot-runner is a.k.a. deri.org, a.k.a. Semantic Web Search Engine, a.k.a. SWSE, etc. Here's info about that multicrawler bot: [webmasterworld.com...]
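
For anyone who wants to flag both names in their logs, here is a minimal sketch that matches the two user-agent strings quoted above. The function name and the pattern are my own illustration, not an official list, and it assumes the bot token is the first word of the UA string, as it is in both samples.

```python
import re

# Tokens taken from the two user-agent strings quoted in this post.
# Assumption: the bot name is the first word of the UA string.
BOT_TOKENS = re.compile(r"^(ldspider|multicrawler)\b", re.IGNORECASE)

def identify_bot(user_agent: str):
    """Return the lowercase bot token if the UA starts with one, else None."""
    match = BOT_TOKENS.match(user_agent.strip())
    return match.group(1).lower() if match else None

if __name__ == "__main__":
    samples = [
        "ldspider (http://code.google.com/p/ldspider/wiki/Robots)",
        "multicrawler (+http://sw.deri.org/2006/04/multicrawler/robots.html)",
        "Mozilla/5.0 (compatible; ExampleBrowser)",  # hypothetical non-bot UA
    ]
    for ua in samples:
        print(identify_bot(ua), "<-", ua)
```

A UA string is trivially spoofed, so this is only good for sorting log lines, not for trusting a visitor.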

 

dstiles




msg:4411700
 10:51 pm on Jan 27, 2012 (gmt 0)

University of Ireland - University College, Galway. I have the IP range registered here as Multicrawler: 140.203.154.100 - 140.203.154.199.

I currently have the bot enabled (probably as a result of the thread you linked to) but haven't seen it this month. How is it behaving? Badly?
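
A quick way to test an address against the range quoted above, as a rough sketch. The function name is made up for illustration, and the range bounds are simply the ones in this post, not anything verified against a registry.

```python
import ipaddress

# Range bounds as quoted in this thread (not checked against WHOIS).
RANGE_START = ipaddress.IPv4Address("140.203.154.100")
RANGE_END = ipaddress.IPv4Address("140.203.154.199")

def in_multicrawler_range(ip: str) -> bool:
    """True if ip falls inside the inclusive range quoted in the thread."""
    addr = ipaddress.IPv4Address(ip)
    return RANGE_START <= addr <= RANGE_END

if __name__ == "__main__":
    # 140.203.154.197 is the IP reported in the first post.
    print(in_multicrawler_range("140.203.154.197"))
    print(in_multicrawler_range("140.203.154.200"))
```

The range is not a clean CIDR block, which is why the sketch compares against the two endpoints instead of using a network mask.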

Pfui




msg:4411748
 11:53 pm on Jan 27, 2012 (gmt 0)

Long story short: Jury's still out. It behaved in that it asked for robots.txt and when that was 403'd, it went for root.

Long-winded details: I usually generate a generic (blanket Disallow) robots.txt and make it available to all but a chosen few major SEs. As it turns out, I recently 403'd that step re some hosts (e.g., code.google.com) that rarely ask for it, in order to cut down on needless rerererewriting. Now, if/when ldspider comes back, it'll be able to read the no-frills file, after which I'll know if it heeds it.

(Sorry you asked?:)
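
For reference, this is roughly what a blanket Disallow file asks of a compliant bot, sketched with Python's standard-library parser. The file contents, the helper name and the example.com URL are assumptions for illustration; the actual generated robots.txt is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed shape of a generic blanket-Disallow robots.txt.
BLANKET_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots_txt and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # example.com is a placeholder host.
    print(is_allowed(BLANKET_ROBOTS_TXT, "ldspider", "http://example.com/"))
    print(is_allowed("", "ldspider", "http://example.com/"))
```

This only models what a well-behaved parser would conclude; whether ldspider itself heeds the file is the open question above. For comparison, Python's own parser treats a 401 or 403 response for robots.txt as "disallow everything" when it fetches the file itself.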

