80legs IPs

to post or not to post

incrediBILL

5:46 pm on Nov 24, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've collected more than 2K unique IPs from 80legs, and I'm wondering whether or not I should post them. They are primarily residential IPs hijacked by this software, and removing the software removes the IP from the 80legs network.

Obviously people can block it by user agent:
Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/webcrawler.html;) Gecko/2008032620

But that assumes you trust something hijacking bandwidth from customers to never cheat on the user agent being sent.

Thoughts?

wilderness

6:02 pm on Nov 24, 2011 (gmt 0)

black-listed "crawler" kills the current UA.

Your own "http:" white-list also kills the UA.
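For anyone who wants to see the blacklist idea spelled out, here is a minimal sketch of a substring match against the user agent. The "crawler" token is the one mentioned above; the "80legs" token, the function name, and the constant names are assumptions added for illustration, not anyone's actual ruleset.

```python
# Hypothetical blacklist of lowercase substrings to look for in a user agent.
# "crawler" comes from the post above; "80legs" is an added assumption.
BLOCKED_UA_TOKENS = ("crawler", "80legs")

# The user agent string quoted at the top of the thread.
UA_80LEGS = (
    "Mozilla/5.0 (compatible; 008/0.83; "
    "http://www.80legs.com/webcrawler.html;) Gecko/2008032620"
)


def is_blocked_user_agent(user_agent: str) -> bool:
    """Return True if any blacklisted token appears in the user agent."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    print(is_blocked_user_agent(UA_80LEGS))
```

The current UA trips the "crawler" token only because its info URL ends in webcrawler.html, so this kind of match stops working the moment the sender changes the string.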

incrediBILL

6:20 pm on Nov 24, 2011 (gmt 0)

That's true, but I'm thinking about blocking the IPs directly because we can never be sure this beast won't decide to cheat and not use that UA.
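A rough sketch of what blocking by IP could look like, independent of whatever UA is sent. The addresses below are placeholders from the reserved documentation ranges, not real 80legs IPs, and the function names and list format (one IP or CIDR range per line, with optional # comments) are assumptions.

```python
import ipaddress


def load_blocklist(lines):
    """Parse one IP or CIDR range per line, skipping blanks and # comments."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # A bare address becomes a single-host network (/32 for IPv4).
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_blocked_ip(addr, networks):
    """Return True if addr falls inside any network on the blocklist."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Placeholder entries only (documentation address ranges).
SAMPLE_LIST = [
    "192.0.2.17",
    "198.51.100.0/24  # example range",
    "",
]
BLOCKLIST = load_blocklist(SAMPLE_LIST)

if __name__ == "__main__":
    print(is_blocked_ip("198.51.100.200", BLOCKLIST))
```

Since these are mostly residential addresses that leave the network once the software is removed, a list like this would need regular pruning to avoid blocking ordinary visitors.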

dstiles

10:11 pm on Nov 24, 2011 (gmt 0)

I put a disallow in robots.txt for the real crawler and, to give it credit, it has so far obeyed it. I haven't consciously seen any rogues that disobey robots.txt, but I do have a block in the system just in case. It's possible any rogues are using IP ranges (e.g. server farms) that I already block, so I would only notice that if it was noticeable, if you see what I mean. :)
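A small sketch for sanity-checking a disallow rule like the one described, using Python's standard urllib.robotparser. The "008" token is taken from the UA string quoted earlier in the thread; that the crawler matches on exactly that token is an assumption here, as are the example URL and the helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt rule; "008" is the product token seen in the 80legs UA.
ROBOTS_TXT = """\
User-agent: 008
Disallow: /
"""


def allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if robots_txt lets agent_token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "008", "http://example.com/page.html"))
    print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/page.html"))
```

This only confirms what the file asks for. Whether a given client honours it is something only the logs can show.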

keyplyr

11:17 pm on Nov 24, 2011 (gmt 0)

I had a deny in robots.txt for a couple years and 80legs always obeyed it until a couple months ago.

80legs is also blocked in htaccess so the weekly attempts are all 403'd but it's interesting to see this change in its behavior. It never requests robots.txt any longer.

incrediBILL

7:44 am on Nov 25, 2011 (gmt 0)

"It never requests robots.txt any longer."

80legs still requests robots.txt from me all the time. I wonder if you're already seeing trojan versions, like all the fake Majestics we had a couple of years ago.
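One way to check this against raw logs is to count, per IP, how many requests carried the 80legs UA and how many of those were for robots.txt. The sketch below assumes Apache combined log format; the sample lines are invented (documentation addresses, made-up timestamps and sizes) and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log format.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

UA = ("Mozilla/5.0 (compatible; 008/0.83; "
      "http://www.80legs.com/webcrawler.html;) Gecko/2008032620")

# Invented sample lines for illustration only.
SAMPLE_LOG = [
    f'192.0.2.17 - - [25/Nov/2011:12:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "{UA}"',
    f'192.0.2.17 - - [25/Nov/2011:12:01:05 +0000] "GET /page.html HTTP/1.1" 403 199 "-" "{UA}"',
    f'198.51.100.9 - - [25/Nov/2011:12:02:00 +0000] "GET /index.html HTTP/1.1" 403 199 "-" "{UA}"',
    '203.0.113.5 - - [25/Nov/2011:12:03:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 Firefox/8.0"',
]


def robots_fetch_summary(lines, ua_token="80legs"):
    """Per IP, count requests whose UA contains ua_token and robots.txt hits."""
    summary = defaultdict(lambda: {"requests": 0, "robots": 0})
    for line in lines:
        m = LOG_RE.match(line)
        if not m or ua_token not in m.group("ua").lower():
            continue
        stats = summary[m.group("ip")]
        stats["requests"] += 1
        # Ignore any query string when comparing the path.
        if m.group("path").split("?", 1)[0] == "/robots.txt":
            stats["robots"] += 1
    return dict(summary)


if __name__ == "__main__":
    for ip, stats in robots_fetch_summary(SAMPLE_LOG).items():
        print(ip, stats)
```

An IP that sends the UA but shows zero robots.txt fetches is not proof of a fake, since the network is distributed and another node may have fetched the file, but it is a reasonable place to start looking.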

keyplyr

12:01 pm on Nov 25, 2011 (gmt 0)

Well wouldn't ya know it - as soon as I opened my mouth... along came 80legs requesting robots.txt :)

lucy24

7:07 am on Nov 27, 2011 (gmt 0)

The most recent previous thread [webmasterworld.com] (started exactly a year ago!) led me to, I guess, the definitive 80legs thread [webmasterworld.com] (started July 2009).

:: wandering off to raw logs to re-check "no skin off my nose" verdict ::