ec2-[yada-yada].compute-1.amazonaws.com
LargeSmall Crawler (LargeSmall; [onespot.com;...] info@onespot.com)
robots.txt? Yes
(See also: "amazonaws.com plays host to wide variety of bad bots [webmasterworld.com]")
Formerly:
prod-crawler-1.largesmall.com
LargeSmall Crawler
dev-app-1.largesmall.com
LargeSmall Crawler
Prior versions/hosts did not ask for robots.txt. Nice that it does now, seeing as OneSpot aggregates and sells what it crawls.
In the following still-active, just-hit-me version crawling from its own host, LargeSmall still does NOT request robots.txt:
prod-crawler-1.largesmall.com
LargeSmall Crawler
robots.txt? NO
Whoa. onespot.com and largesmall.com have numerous same-content pages. Can you say, "Duplicate content [google.com]"?
I guess its response to not liking the fact that it's Disallowed (or to not understanding a particular robots.txt file) is to simply fetch robots.txt again... and again... and again... Another badly-broken 'bot.
Jim
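For anyone who wants to check their own logs for either behavior (never asking for robots.txt, or asking for it over and over), here is a rough Python sketch. It assumes the Apache/Nginx "combined" log format, which may not match your server; the function name robots_tally and the pattern are purely illustrative and not taken from any tool mentioned in this thread.

```python
import re
from collections import defaultdict

# ASSUMPTION: access log lines use the Apache/Nginx "combined" format.
# Adjust this pattern if your server logs differently.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def robots_tally(lines):
    """Return {user_agent: [total_requests, robots_txt_requests]}.

    robots_tally is a hypothetical helper name used only for this sketch.
    """
    tally = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        counts = tally[m.group("agent")]
        counts[0] += 1
        # Ignore any query string when checking for robots.txt
        if m.group("path").split("?", 1)[0] == "/robots.txt":
            counts[1] += 1
    return dict(tally)
```

Feed it the lines of an access log, e.g. robots_tally(open("access.log")), where the file name is only a placeholder. A user-agent with plenty of requests and zero robots.txt fetches never asked; one whose robots.txt count is out of proportion to its other requests is doing the again-and-again routine.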