
similarpages dot com

New Nutch?

tangor

4:07 am on Mar 11, 2009 (gmt 0)

SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; [similarpages.com;...] info at similarpages dot com)

The web page has nothing on it. The IP is 75.101.224.xx, which is another Amazon...

I wonder just how many of these we'll eventually see.

I've pretty much given up on Nutch and banned 'em all, because half respect robots.txt, the other half and two-thirds do not.
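
For comparison, this is what honouring robots.txt looks like from the crawler's side. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content and the example.com URL are hypothetical, chosen only to show a record that shuts out SimilarPages while leaving other bots alone.

```python
from urllib.robotparser import RobotFileParser

# User-Agent string as quoted at the top of this thread.
SIMILARPAGES_UA = ("SimilarPages/Nutch-1.0-dev (SimilarPages Nutch Crawler; "
                   "[similarpages.com;...] info at similarpages dot com)")

# Hypothetical robots.txt: refuse SimilarPages everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: SimilarPages
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this User-Agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, SIMILARPAGES_UA, "http://example.com/"))  # False
    print(allowed(ROBOTS_TXT, "Googlebot/2.1", "http://example.com/"))  # True
```

One wrinkle worth knowing: Python's parser matches a record's User-agent name against the product token before the first "/", which here is SimilarPages, so a record naming only Nutch would not apply to this bot under that parser. Other crawlers may match names differently.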

GaryK

10:22 pm on Mar 11, 2009 (gmt 0)

It first visited me on 2/21/09 and I turned it away. It came back every day until 3/7; after that it seems to have finally given up.

Like you, I ban anything with Nutch in it. It's weird how they always seem to read robots.txt but then promptly ignore it.
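
A blanket "anything with Nutch in it" ban boils down to a substring test on the User-Agent header. The sketch below is hypothetical WSGI middleware (the names block_nutch, is_banned_agent and BANNED_TOKENS are made up for illustration); matching case-insensitively is an assumption, and banned agents get 403 Forbidden without the wrapped application ever running.

```python
from typing import Callable, Iterable

# Hypothetical deny list: any User-Agent containing one of these
# substrings (compared case-insensitively) is refused.
BANNED_TOKENS = ("nutch",)


def is_banned_agent(user_agent: str) -> bool:
    """True if the User-Agent contains any banned token."""
    ua = user_agent.lower()
    return any(token in ua for token in BANNED_TOKENS)


def block_nutch(app: Callable) -> Callable:
    """Wrap a WSGI app so banned agents get a 403 before the app runs."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        ua = environ.get("HTTP_USER_AGENT", "")
        if is_banned_agent(ua):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return wrapper
```

This only catches bots that announce Nutch in their User-Agent; one that sends a different string walks straight past it.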

dstiles

3:50 am on Mar 12, 2009 (gmt 0)

Just logged this bot from the following IP blocks: 30 hits in 30 minutes across 20 sites, half of them to the home page and the rest to other pages. All IPs are Amazon.

174.129.111.nnn (2 IPs)
174.129.112.nnn
174.129.118.nnn
174.129.181.nnn
174.129.81.nnn
174.129.85.nnn
174.129.86.nnn
174.129.87.nnn
67.202.33.nnn
67.202.56.nnn
67.202.58.nnn
67.202.61.nnn
72.44.36.nnn
72.44.37.nnn
72.44.38.nnn
72.44.59.nnn
75.101.200.nnn
75.101.215.nnn
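
To check incoming requests against the blocks above, each masked address can be treated as a /24. That is an assumption: the "nnn" only hides the last octet, and the real allocations may well be wider. A minimal sketch using Python's standard ipaddress module; the names LOGGED, BLOCKS, in_logged_blocks and compact_blocks are hypothetical.

```python
import ipaddress

# Prefixes from the list above. Treating each masked "nnn" address as a
# whole /24 is an assumption, not a known Amazon allocation.
LOGGED = [
    "174.129.111", "174.129.112", "174.129.118", "174.129.181",
    "174.129.81", "174.129.85", "174.129.86", "174.129.87",
    "67.202.33", "67.202.56", "67.202.58", "67.202.61",
    "72.44.36", "72.44.37", "72.44.38", "72.44.59",
    "75.101.200", "75.101.215",
]

BLOCKS = [ipaddress.ip_network(f"{prefix}.0/24") for prefix in LOGGED]


def in_logged_blocks(addr: str) -> bool:
    """True if addr falls inside any of the assumed /24 blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKS)


def compact_blocks() -> list:
    """Merge adjacent /24s into larger networks for a shorter deny list."""
    return list(ipaddress.collapse_addresses(BLOCKS))
```

collapse_addresses merges adjacent ranges, so 174.129.86/87 and 72.44.36/37 each fold into a /23 and the 18 entries shrink to 16.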