NetcraftSurveyAgent from new Amazon EC2 range

didn't read robots.txt

         

thetrasher

12:23 pm on Dec 12, 2008 (gmt 0)

10+ Year Member



IP: 174.129.134.nn
User-Agent: Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0; +info@netcraft.com)

174.129.0.0/16 = AMAZON-EC2-5
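
For anyone who wants to test log entries against that range, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `in_ec2_range` and the sample addresses are made up for illustration (the last octet in the post above is masked), so treat this as one possible approach rather than a finished filter.

```python
import ipaddress

# Range quoted in the post above: 174.129.0.0/16 (AMAZON-EC2-5).
EC2_RANGE = ipaddress.ip_network("174.129.0.0/16")

def in_ec2_range(ip: str) -> bool:
    """Return True if the address falls inside the quoted /16 (hypothetical helper)."""
    return ipaddress.ip_address(ip) in EC2_RANGE

# Sample addresses are invented for the example.
print(in_ec2_range("174.129.134.10"))
print(in_ec2_range("174.130.0.1"))
```

A /16 covers every address that shares the first two octets, so anything starting with 174.129. matches.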

GaryK

4:10 pm on Dec 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It paid one of my sites a visit last week. It hit the default root page an excessive number of times in one minute before it got the boot.
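
A rough sketch of that kind of per-minute check, in Python. This is not the setup actually running on the site; the `HitCounter` class is hypothetical and the threshold of 30 hits is an assumed number, since no figure is given here.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # "in one minute"
MAX_HITS = 30         # assumed threshold; no number is given in the thread

class HitCounter:
    """Sliding-window hit counter that bans an IP once it exceeds max_hits."""

    def __init__(self, max_hits=MAX_HITS, window=WINDOW_SECONDS):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)   # ip -> timestamps of recent hits
        self.banned = set()

    def allow(self, ip, now):
        """Record a hit at time `now` (seconds); return False if the IP is banned."""
        if ip in self.banned:
            return False
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) > self.max_hits:
            self.banned.add(ip)
            return False
        return True
```

Each request calls `allow(ip, time.time())`; once it returns False the IP stays banned until the set is cleared.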

I'm getting ready to track down and ban the entire EC2 netrange. Would any of you guys happen to know that information already?

GaryK

4:38 pm on Dec 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, Don. ;)

blend27

5:13 pm on Dec 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[ws.arin.net...]

Look for the ones that have EC2 in the NETNAME.

incrediBILL

10:39 pm on Dec 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Has anything other than nutch that originates from the Amazon range ever read robots.txt except to ignore it?

Just wondering out loud...

GaryK

10:46 pm on Dec 14, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



blend, that's the same link Don sent me via StickyMail. :)

Bill, are you implying Nutch reads robots.txt?

wilderness

12:15 am on Dec 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"blend, that's the same link Don sent me via StickyMail."

Our site host and/or moderators get touchy about such broad links, so I thought a sticky was the best protocol.

incrediBILL

4:54 am on Dec 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Arg, you pranksters got me.