Sadly, many of the users don't further identify the purpose of their usage by modifying the user agent details. Personally, I wish the author would alter Nutch so it won't even run unless you change the user agent.
You'll note a few actual business names in the list that you might find interesting, so take a look.
124.32.246.36 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
124.32.246.45 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
128.208.6.200 NutchCVS/0.7.1 (Nutch running at UW; [crawlers.cs.washington.edu...] sycrawl@cs.washington.edu)
128.208.6.226 NutchCVS/0.8-dev (Nutch running at UW; [nutch.org...] sycrawl@cs.washington.edu)
128.208.6.227 NutchCVS/0.8-dev (Nutch running at UW; [nutch.org...] sycrawl@cs.washington.edu)
128.208.6.77 NutchCVS/0.8-dev (Nutch running at UW; [nutch.org...] sycrawl@cs.washington.edu)
129.242.19.138 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
129.34.20.19 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
129.78.64.106 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
131.112.16.140 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
131.112.16.220 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
131.211.84.21 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
136.165.45.122 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
137.43.154.203 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
147.202.90.2 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
164.67.195.24 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
164.67.195.245 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
164.67.195.26 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
164.67.195.27 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
164.67.195.68 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
164.67.195.85 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
166.214.93.76 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.203.240.117 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.203.240.118 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.203.240.119 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.203.240.120 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.203.240.121 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.203.240.122 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
193.252.148.51 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
203.113.130.205 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
203.131.194.84 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
203.147.0.44 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
203.244.218.1 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
209.131.61.1 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
210.174.3.130 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
210.196.73.193 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
210.245.31.15 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
210.245.31.18 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
212.12.114.238 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
212.127.226.60 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
212.137.33.140 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
212.156.230.210 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
212.58.116.72 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
213.132.175.101 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
213.186.36.107 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
213.251.133.12 Misterbot-Nutch/0.7.1 (Misterbot-Nutch; [misterbot.fr;...] nutch at misterbot.fr)
216.93.185.12 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
220.218.159.50 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
221.114.253.210 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
221.116.237.114 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
221.221.237.35 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
24.222.153.250 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
24.224.226.18 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
58.186.61.164 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
58.187.12.236 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
58.87.139.90 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
59.160.240.115 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
60.248.9.114 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
61.135.151.175 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
62.129.132.47 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
62.168.188.151 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
62.40.36.87 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
63.133.162.98 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.105.36.210 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
64.151.112.44 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.241.242.18 NutchCVS/0.05 (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
64.242.88.10 NutchCVS/0.05 (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
64.242.88.60 NutchCVS/0.05 (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
64.34.172.78 BurstFind Crawler 1.0/0.7.1 (Nutch; [lucene.apache.org...] crawler@burstfind.com)
64.34.180.167 Nokia6620/2.0 (4.22.1) SymbianOS/7.0s Series60/2.1 Profile/MIDP-2.0 Configuration/CLDC-1.0/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.38.10.26 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.71.164.103 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.71.164.107 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.71.164.108 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
64.71.164.125 Krugle/Krugle,Nutch/0.8+ (Krugle web crawler; [krugle.com...] webcrawler@krugle.com)
65.220.67.9 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
65.9.20.49 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
65.91.114.3 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
66.108.32.4 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
66.15.68.234 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
66.162.5.43 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
66.207.120.226 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
66.243.31.34 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
67.111.28.139 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
67.52.101.242 NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
68.205.124.164 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
68.205.127.94 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
69.248.26.83 Comrite/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
69.55.233.28 Argus/1.1 (Nutch; [simpy.com...] feedback at simpy dot com)
70.197.81.79 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
70.30.97.106 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
70.56.66.216 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
70.96.99.254 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
71.241.153.125 NutchCVS/0.7 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
71.35.163.79 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
72.0.207.162 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
72.2.25.67 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
72.5.173.12 sdcresearchlabs-testbot/0.8-dev (www.shopping.com/bot.html; [lucene.apache.org...] researchbot@shopping.com)
72.51.37.148 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
81.203.142.109 NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
83.246.79.28 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
84.191.111.92 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
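For anyone who wants to separate the stock agents from the ones that were actually customized, here is a minimal Python sketch. The sample lines are copied from the listing above; the function names and the rule for what counts as "default" (stock `NutchCVS/` name plus one of the two stock contact addresses) are my own assumptions, not anything Nutch defines.

```python
from collections import Counter

# A few lines copied from the listing above, used only as sample input.
SAMPLE = """\
124.32.246.36 NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)
128.208.6.200 NutchCVS/0.7.1 (Nutch running at UW; [crawlers.cs.washington.edu...] sycrawl@cs.washington.edu)
213.251.133.12 Misterbot-Nutch/0.7.1 (Misterbot-Nutch; [misterbot.fr;...] nutch at misterbot.fr)
64.34.172.78 BurstFind Crawler 1.0/0.7.1 (Nutch; [lucene.apache.org...] crawler@burstfind.com)
129.242.19.138 NutchCVS/0.06-dev (Nutch; [nutch.org...] nutch-agent@lists.sourceforge.net)
"""

# Assumption: these two addresses are the unmodified contacts seen in the listing.
DEFAULT_CONTACTS = (
    "nutch-agent@lucene.apache.org)",
    "nutch-agent@lists.sourceforge.net)",
)


def split_line(line):
    """Split 'IP user-agent' at the first space."""
    ip, _, agent = line.strip().partition(" ")
    return ip, agent


def is_default_agent(agent):
    """True when both the stock name and a stock contact address are still present."""
    return agent.startswith("NutchCVS/") and agent.endswith(DEFAULT_CONTACTS)


def summarize(text):
    """Count default vs customized agents and collect the customized ones."""
    counts = Counter()
    customized = []
    for line in text.splitlines():
        if not line.strip():
            continue
        ip, agent = split_line(line)
        if is_default_agent(agent):
            counts["default"] += 1
        else:
            counts["customized"] += 1
            customized.append((ip, agent))
    return counts, customized


if __name__ == "__main__":
    counts, customized = summarize(SAMPLE)
    print(dict(counts))
    for ip, agent in customized:
        print(ip, agent)
```

Feeding it the whole listing instead of the five sample lines should surface the handful of entries that name a business or a university.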
I stopped getting overly irked after deciding to 403 every Nutch user, but I know they're still ringing the bell. It would be SO NICE if bot-coders programmed their spawn to GO AWAY after, say, three 403s.
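The "go away after three 403s" behavior is not hard to build into a crawler. Below is a rough Python sketch of the bookkeeping only; the class and method names are hypothetical, nothing here is taken from Nutch, and resetting the count on any non-403 response is my own design choice.

```python
from urllib.parse import urlsplit

MAX_FORBIDDEN = 3  # the "say, three 403s" threshold suggested in the post


class ForbiddenTracker:
    """Counts consecutive 403 responses per host and stops visiting hosts that keep refusing."""

    def __init__(self, limit=MAX_FORBIDDEN):
        self.limit = limit
        self.forbidden = {}  # host -> current streak of 403 responses
        self.banned = set()  # hosts the crawler should leave alone

    def should_fetch(self, url):
        """False once the URL's host has reached the 403 limit."""
        return urlsplit(url).hostname not in self.banned

    def record(self, url, status):
        """Update the streak for the URL's host after a response arrives."""
        host = urlsplit(url).hostname
        if status == 403:
            self.forbidden[host] = self.forbidden.get(host, 0) + 1
            if self.forbidden[host] >= self.limit:
                self.banned.add(host)
        else:
            # Any other status resets the streak (assumption, see lead-in).
            self.forbidden.pop(host, None)
```

A fetch loop would call `should_fetch(url)` before each request and `record(url, status)` after it, so a host that answers 403 three times in a row simply drops out of the queue.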
203.199.83.162
pro3.rediffmailpro.com
"NutchCVS/0.7.2 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
RewriteCond %{HTTP_USER_AGENT} Nutch [NC,OR]
in the .htaccess file - has worked for me for years now :)
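The condition pattern is an unanchored regular expression and `[NC]` makes it case-insensitive, so it matches "Nutch" anywhere in the User-Agent header. The Python sketch below only mimics that check so it can be tried against agents from this thread; it is not how Apache evaluates the rule, and the function name is made up for illustration.

```python
import re

# Mimics: RewriteCond %{HTTP_USER_AGENT} Nutch [NC]
NUTCH = re.compile(r"Nutch", re.IGNORECASE)


def would_block(user_agent):
    """True when the case-insensitive pattern matches anywhere in the header."""
    return NUTCH.search(user_agent) is not None


# Agents copied from the listing earlier in the thread.
AGENTS = [
    "NutchCVS/0.7.1 (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)",
    "Krugle/Krugle,Nutch/0.8+ (Krugle web crawler; [krugle.com...] webcrawler@krugle.com)",
    "Argus/1.1 (Nutch; [simpy.com...] feedback at simpy dot com)",
    "sdcresearchlabs-testbot/0.8-dev (www.shopping.com/bot.html; [lucene.apache.org...] researchbot@shopping.com)",
]

if __name__ == "__main__":
    for agent in AGENTS:
        print(would_block(agent), agent)
```

The first three agents match. The shopping.com test bot does not, because the string "Nutch" no longer appears anywhere in its user agent, so a name-based rule on its own would let it through.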
It's so easy to block Nutch right now with just one line of code. I'd hate to have to start playing the IP Address game with yet another user agent.
Though if we had a more private forum... we'd have less of a chance for 'them' to see:
[webmasterworld.com...]
I just see what's trying to get on, which is amusing to say the least.
One of these days I might even spot something useful and let it thru my automatic firewall but to date nothing has motivated me in that direction.
Just reporting what's out there, as this is the "Search Engine Spider Identification" forum ;)
Just reporting what's out there, as this is the "Search Engine Spider Identification" forum
...like trying to drum-up business for your Blog? :(
Have I been duped again?
I've read his blog as well... when it's secondary news (originally reported on WebmasterWorld)... your guess is as good as mine.
I, too, am a reader (when Blogspot cooperates), and now-twice Anon poster because I've run into related info. Actually, it appears a bunch of us tend to hang out in both places.
There's a different feeling over there -- aside from Bill's cuss words (if I had a dollar...) -- although that might be nausea from its current Mid-Century Mod puke-green decor;)
Now if only we could find a place to exchange data privately, away from the eyes of those we're trying to stop.
Bill doesn't need defending, but bobo, methinks you've got the wrong impression.
I'll be happy to agree... but can't help thinking there's a bit of sampling going on.
But I do agree about a private forum. :)
...like trying to drum-up business for your Blog?
I don't care if people read that or not really, stopping the bots is all that's important.
I will admit I've double posted more than a couple of times when I thought the data should reach a wider audience, like the 100+ recent instances of new bots using the same core crawler.
The problem with the moderated forum is that, although the signal-to-noise ratio is better, the speed to publish isn't, which can be a tad frustrating when new things show up.
But I do agree about a private forum.
No arguments here either, as I've noticed the wrong people read when I post no matter where it is, and I see them "upgrade" their wares, fixing the things I've used to identify them.
Have I been duped again?
Not unless you fell into a copying machine ;)
[edited by: incrediBILL at 11:09 pm (utc) on June 18, 2006]
No hard feelings... guess this is what it sounds like when doves cry :)
Stay tuned for a still-pending, four-bot post with details about the following, NONE of which ask for robots.txt:
- Dawang Version
- g3.pl (leaseweb)
- yoono
- SPIP (jujuscript)
Too bad mod Dan has a life when most of us, m'self included, might benefit from more of one... Weekend? Whazzat?
Anyway, and bringing this back around to the topic of this thread (heh):
No new Nutch sightings.