
SheenBot

         

Pfui

10:52 pm on Nov 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ec2-174-129-75-209.compute-1.amazonaws.com
SheenBot/SheenBot-1.0.0 (Sheen web crawler)

robots.txt? Yes

(See also: amazonaws.com plays host to wide variety of bad bots [webmasterworld.com])

GaryK

11:22 pm on Nov 19, 2009 (gmt 0)




I saw this one last week. Thought the best place to share it would have been that damned thread that just won't die [webmasterworld.com]! But I refuse to post in it anymore. I'm so sorry I ever posted in it. Wish there were some way to unsubscribe from a thread.

Anyway, it behaved as all bots from Amazon behave: badly!

Staffa

9:26 am on Nov 20, 2009 (gmt 0)




I saw it twice yesterday, but all amazonaws hosts are banned here, so it got nothing.

mcneely

1:03 am on Nov 26, 2009 (gmt 0)




ec2-75-101-242-106.compute-1.amazonaws.com

SheenBot/SheenBot-1.0.1 (Sheen web crawler)

Grabbed robots.txt and the main index.php, then left.

Still not trusting the cloud that hangs over Amazon

keyplyr

9:30 am on Nov 26, 2009 (gmt 0)




SheenBot seems to be cloud hopping:

ec2-72-44-39-186.compute-1.amazonaws.com

same UA, same behavior

GaryK

6:17 pm on Nov 29, 2009 (gmt 0)




I saw the UA mcneely posted just now:
SheenBot/SheenBot-1.0.1 (Sheen web crawler)
Also from Amazon.
Grabbed robots.txt plus default root page then left.

Pfui

8:18 am on Dec 4, 2009 (gmt 0)




@keyplyr: From what I understand, AWS/EC2 customers' cloud addresses are neither static nor unique. For example, even the same UAs that hit the same files several times a day (Python, etc.) always hail from different Hosts. That's why I prefer blocking amazonaws.com, with additional, belt-and-suspenders blocking of the worst AWS CIDRs reported by y'all.

FWIW:

"Customers of Amazon EC2 may launch and terminate multiple instances over the course of a few hours which means that customers may occupy the same host at different times of the same day. ..." -Report Abuse of AWS Services [aws.amazon.com]
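The blocking approach Pfui describes (deny by rDNS domain rather than by individual IP, since EC2 addresses get reused) can be sketched as a simple hostname-suffix check. This is an illustrative sketch, not anyone's actual configuration; the blocklist contents and hostnames are examples, and in practice this logic usually lives in .htaccess or a firewall rule rather than application code.

```python
# Sketch of domain-based blocking: given the visitor's reverse-DNS
# hostname, deny it when the name falls under a blocked domain such as
# amazonaws.com. The suffix list here is illustrative only.

BLOCKED_SUFFIXES = ("amazonaws.com",)  # extend with other cloud domains as needed

def is_blocked(rdns_hostname: str) -> bool:
    """True if the reverse-DNS hostname belongs to a blocked domain."""
    host = rdns_hostname.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in BLOCKED_SUFFIXES)

print(is_blocked("ec2-174-129-75-209.compute-1.amazonaws.com"))  # True
print(is_blocked("crawl-66-249-66-1.googlebot.com"))             # False
```

Matching on the domain suffix instead of specific IPs is what makes this robust to EC2 instances hopping between addresses, at the cost of blocking every AWS customer at once.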

keyplyr

10:11 am on Dec 4, 2009 (gmt 0)




@Pfui thanks. Compared to some, I'm conservative when it comes to blocking. However, I eventually saw no alternative but to block the ranges assigned to AWS.

jdMorgan

3:21 pm on Dec 4, 2009 (gmt 0)




... and Microsoft Azure, and all the rest of these 'cloud' server services.

I see a train wreck coming unless these services get with it and implement some method so that real-time rDNS immediately identifies the "organization" currently using a specific cloud IP address, in a simple, programmatically-accessible form, and so that 'historic' IP-assignment info stays available for at least a week in both program-accessible and human-accessible (e.g. browser-accessible) forms.

Otherwise, legitimate and non-problematic spiders and services that we might want to allow to access our sites will be blocked because we have to 'defend' our sites against cloud-based activity that the cloud provider might consider legitimate, but that Webmasters consider to be scraping or abuse.

Right now, I'm blocking AWS and Azure, but I don't know what I'll do if some service that I want to allow to access my sites "moves into the cloud" and uses a non-validatable user-agent string. Unless they implement UA authentication as Majestic12 has recently done, this forces a really uncomfortable choice.

I hear the distant whistle and feel the vibration of the tracks, I see that the bridge trestle is broken, but I have no idea what to do about it...

Jim
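
The crawler validation Jim alludes to (the kind Majestic12 had recently adopted) is commonly done with forward-confirmed reverse DNS: resolve the IP to a hostname, check that the hostname falls under the domain the user-agent claims, then resolve that hostname forward and confirm it maps back to the same IP. A hedged sketch follows; the lookup functions are passed in as parameters so the logic can run without network access, but in real use they would wrap `socket.gethostbyaddr` and a forward resolver. The host and domain names are examples only.

```python
# Sketch of forward-confirmed reverse DNS (FCrDNS) crawler validation.
# reverse_lookup(ip) -> hostname; forward_lookup(host) -> list of IPs.

def fcrdns_ok(ip, claimed_domain, reverse_lookup, forward_lookup):
    """True only if ip's rDNS name is under claimed_domain AND that
    name resolves forward to the same ip (defeats spoofed UAs)."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot validate
    domain = claimed_domain.lower()
    if not (host == domain or host.endswith("." + domain)):
        return False  # rDNS name is not in the claimed domain
    try:
        return ip in forward_lookup(host)  # forward must confirm reverse
    except OSError:
        return False

# Example with canned lookups (no network): an EC2 host claiming AWS.
rev = lambda ip: {"72.44.39.186": "ec2-72-44-39-186.compute-1.amazonaws.com"}[ip]
fwd = lambda host: ["72.44.39.186"]
print(fcrdns_ok("72.44.39.186", "amazonaws.com", rev, fwd))  # True
print(fcrdns_ok("72.44.39.186", "googlebot.com", rev, fwd))  # False
```

The forward-confirmation step is what makes this trustworthy: a scraper can fake its user-agent string, and can even control the PTR record for its own IP, but it cannot make the legitimate domain's forward DNS point back at its address.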