

Wire

(a "Bot,Robot,Spider,Crawler")

Pfui

10:04 am on Sep 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



LOL. Why use one word when four will do?

crawler.ceptro.br
WIRE/1.0 (Linux;i686;Bot,Robot,Spider,Crawler)

robots.txt? NO

Actually, the Bot,Robot,Spider,Crawler UA is a new (or newly named) one from "crawler.ceptro.br". Until now it's usually been bare nekkid, with no UA at all:

crawler.ceptro.br - - [10/Sep/2010:18:59:16 -0n00] "HEAD / HTTP/1.1" 403 0 "-" "-"
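For anyone who wants to catch both variants when scanning logs, here's a minimal Python sketch. It assumes Apache's "combined" log format, as in the line above, and a hypothetical policy of flagging any blank UA ("-" or empty) plus anything starting with "WIRE/". The timestamp is matched loosely, so an odd offset like the one in that line doesn't break parsing. Actual blocking would still happen at the server; this only flags log entries.

```python
import re

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent".
# The timestamp is matched as "anything up to ]" so unusual offsets still parse.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line isn't in combined format."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def should_block(entry):
    """Hypothetical policy: flag a blank UA ("-" or empty) or any UA starting with WIRE/."""
    ua = entry["ua"].strip()
    if ua in ("", "-"):
        return True
    return ua.startswith("WIRE/")

line = 'crawler.ceptro.br - - [10/Sep/2010:18:59:16 -0n00] "HEAD / HTTP/1.1" 403 0 "-" "-"'
entry = parse_line(line)
print(entry["host"], should_block(entry))
```

Matching on the "WIRE/" prefix rather than the full string means a version bump or another shuffle of the "Bot,Robot,Spider,Crawler" suffix would still be caught.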

dstiles

7:07 pm on Sep 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



According to user-agents.org, WIRE is the "Web Information Retrieval Environment" crawler, run from different IPs for different purposes. Details are at cwr.cl/projects/WIRE/ (looks like it's from a Chilean university).

Looks like an alternative to Nutch?

GaryK

7:59 pm on Sep 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, but with a UA like that it's begging to be blocked! :)

keyplyr

8:15 pm on Sep 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Doesn't even need to beg :)

caribguy

7:37 am on Sep 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



+1 keyplyr

.br is on the shortlist anyway, right after .cn .kr .ru .ua and .pn
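A TLD shortlist like that boils down to checking the last label of the visitor's reverse-DNS hostname. The following is a minimal Python sketch of that check, assuming the hostname has already been looked up (e.g. with Python's `socket.gethostbyaddr`); the function names are hypothetical. Reverse DNS is controlled by whoever owns the IP block, so it's worth forward-confirming the name (resolving it back and checking it matches the IP) before trusting it.

```python
# TLDs taken from the shortlist in the post above.
SHORTLIST = {"br", "cn", "kr", "ru", "ua", "pn"}

def tld_of(hostname):
    """Last dot-separated label of a hostname, lowercased; a trailing root dot is ignored."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()

def on_shortlist(hostname, shortlist=SHORTLIST):
    """True if the hostname's TLD is on the shortlist (hypothetical helper)."""
    return tld_of(hostname) in shortlist

for host in ("crawler.ceptro.br", "crawlers.example.com"):
    print(host, on_shortlist(host))
```

A bare IP with no reverse DNS (e.g. "1.2.3.4") yields a last label of "4", so it simply isn't on the shortlist; catching those needs a separate rule.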