ldspider

code.google.com

   
7:37 pm on Jan 27, 2012 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Just what we need -- another Google-related robot:

swsesrv16.deri.ie [projecthoneypot.org...]
ldspider (http://code.google.com/p/ldspider/wiki/Robots)

robots.txt? Yes

Courtesy of the Project Honey Pot link for the IP:

140.203.154.197's User Agent Strings
multicrawler (+http://sw.deri.org/2006/04/multicrawler/robots.html)

The bot-runner is a.k.a. deri.org, a.k.a. Semantic Web Search Engine a.k.a. SWSE, etc. Here's info about that multicrawler bot. [webmasterworld.com...]
10:51 pm on Jan 27, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



University of Ireland - University College, Galway. I have the IP registered here as Multicrawler: 140.203.154.100 - 140.203.154.199.
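A rough sketch of that kind of IP bookkeeping, in Python, assuming a hand-kept table of inclusive address ranges. The `KNOWN_RANGES` table and `label_for_ip` helper are hypothetical names, not part of any logging tool. The only entry is the 140.203.154.100 - 140.203.154.199 range quoted above. That range is not a single CIDR block, so the check compares addresses numerically instead:

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table: (first address, last address, label), inclusive.
KNOWN_RANGES = [
    (ipaddress.ip_address("140.203.154.100"),
     ipaddress.ip_address("140.203.154.199"),
     "Multicrawler"),
]


def label_for_ip(addr: str) -> Optional[str]:
    """Return the registered label for addr, or None if it is not listed."""
    ip = ipaddress.ip_address(addr)
    for first, last, label in KNOWN_RANGES:
        # IPv4 and IPv6 addresses cannot be ordered against each other.
        if ip.version != first.version:
            continue
        # Address objects compare numerically, so a non-CIDR range works.
        if first <= ip <= last:
            return label
    return None


print(label_for_ip("140.203.154.197"))  # Multicrawler
print(label_for_ip("140.203.154.200"))  # None
```

Using `ipaddress` objects avoids hand-splitting dotted quads. Mixed IPv4/IPv6 entries are skipped rather than compared, because Python refuses to order them.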

I currently have the bot enabled (probably as a result of the thread you linked to) but haven't seen it this month. How is it behaving? Badly?
11:53 pm on Jan 27, 2012 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Long story short: the jury's still out. It did behave, in the sense that it asked for robots.txt and, when that was 403'd, went for the root.
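Whether fetching the root after a 403 on robots.txt counts as behaving depends on the convention the crawler follows. Here is a minimal Python sketch, assuming the status-code rules of RFC 9309 (the Robots Exclusion Protocol, standardized years after this thread). Under those rules a 4xx response means robots.txt is "unavailable" and the crawler may access anything. `RobotsPolicy` and `policy_for_status` are hypothetical names:

```python
from enum import Enum


class RobotsPolicy(Enum):
    PARSE_BODY = "parse the returned rules"
    FOLLOW_REDIRECT = "follow the redirect (RFC 9309 suggests at least five hops)"
    ALLOW_ALL = "crawl without restrictions"
    DISALLOW_ALL = "treat the whole site as disallowed"


def policy_for_status(status: int) -> RobotsPolicy:
    """Map the HTTP status of a robots.txt fetch to a crawl policy,
    loosely following RFC 9309 section 2.3.1."""
    if 200 <= status < 300:
        return RobotsPolicy.PARSE_BODY
    if 300 <= status < 400:
        return RobotsPolicy.FOLLOW_REDIRECT
    if 400 <= status < 500:
        # "Unavailable": the crawler MAY access any resource on the server,
        # so requesting / right after a 403 is within the rules.
        return RobotsPolicy.ALLOW_ALL
    # 5xx (and anything unexpected): "unreachable", assume complete disallow.
    return RobotsPolicy.DISALLOW_ALL


print(policy_for_status(403).name)  # ALLOW_ALL
```

Not every client agrees. Python's own `urllib.robotparser` treats a 401 or 403 as disallow-all and other 4xx codes as allow-all. A 403 on robots.txt is therefore an ambiguous signal, while a readable blanket `Disallow: /` is not.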

Long-winded details: I usually generate a generic (blanket Disallow) robots.txt and make it available to all but a chosen few major SEs. As it turns out, I recently 403'd that step for some hosts (e.g., code.google.com) that rarely ask for it, in order to cut down on needless rerererewriting. Now, if/when ldspider comes back, it'll be able to read the no-frills file, after which I'll know whether it heeds it.
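The per-requester robots.txt described above might be sketched like this in Python. Everything here is an assumption for illustration: the `MAJOR_SE_TOKENS` and `BLOCKED_MARKERS` lists, the placeholder `FULL_ROBOTS` rules, and the `robots_response` helper, which returns a status code and body for a /robots.txt request. Requesters matching a blocked marker get a 403, a chosen few major SEs get the real file, and everyone else gets the blanket Disallow:

```python
from typing import Tuple

# Hypothetical policy lists; substitute your own.
MAJOR_SE_TOKENS = ("Googlebot", "bingbot")  # these receive the real robots.txt
BLOCKED_MARKERS = ("code.google.com",)      # these receive a 403

# Placeholder rules for the favoured engines.
FULL_ROBOTS = "User-agent: *\nDisallow: /cgi-bin/\n"
# Generic no-frills file for everyone else.
BLANKET_DISALLOW = "User-agent: *\nDisallow: /\n"


def robots_response(user_agent: str, remote_host: str) -> Tuple[int, str]:
    """Choose the HTTP status and body to serve for /robots.txt."""
    # Match blocked markers against both the UA string and the reverse-DNS host.
    requester = f"{user_agent} {remote_host}"
    if any(marker in requester for marker in BLOCKED_MARKERS):
        return 403, "Forbidden\n"
    if any(token in user_agent for token in MAJOR_SE_TOKENS):
        return 200, FULL_ROBOTS
    return 200, BLANKET_DISALLOW


status, body = robots_response(
    "ldspider (http://code.google.com/p/ldspider/wiki/Robots)",
    "swsesrv16.deri.ie",
)
print(status)  # 403
```

Dropping `code.google.com` from `BLOCKED_MARKERS` is all it takes to let ldspider read the blanket file on its next visit. UA tokens are easily spoofed, so a real setup would confirm major-SE claims with a reverse-and-forward DNS check before serving the full file.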

(Sorry you asked?:)
 
