Managing my own domains plus several domains that are independent of mine gives me an advantage in spotting new, questionable, or bad bots.
Recently I found that my site and several other (isolated and independent) domains had log entries with the same IP number and user agent. On each site, this bot grabbed every page.
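For anyone who wants to run the same cross-site check, here is a minimal sketch in Python. It assumes the logs are in the Apache/Nginx "combined" format; the function name `shared_clients`, the site names, and the sample log lines are all made up for illustration (the IPs are documentation addresses, not the bot's).

```python
import re
from collections import defaultdict

# Assumes the "combined" access log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def shared_clients(logs_by_site, min_sites=2):
    """Map (ip, user_agent) -> set of sites, for clients seen on min_sites or more sites."""
    seen = defaultdict(set)
    for site, lines in logs_by_site.items():
        for line in lines:
            m = LINE_RE.match(line)
            if m:  # silently skip lines that do not fit the assumed format
                seen[(m.group("ip"), m.group("ua"))].add(site)
    return {key: sites for key, sites in seen.items() if len(sites) >= min_sites}

# Hypothetical sample data; real use would read each site's access log file.
sample = {
    "site-a.example": [
        '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Java/1.5.0_06"',
        '198.51.100.9 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ],
    "site-b.example": [
        '203.0.113.7 - - [01/Jan/2024:00:05:00 +0000] "GET /about HTTP/1.1" 200 900 "-" "Java/1.5.0_06"',
    ],
}

for (ip, ua), sites in shared_clients(sample).items():
    print(ip, ua, sorted(sites))
```

A pair that shows up on sites with nothing in common is worth a closer look; counting distinct URLs per pair would be the next step to confirm a full-site grab.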
My research into the IP and its group (and provider) revealed what I consider a "concealed identity": the registrar did not give owner names, and a lookup by address did not give any company or individual name. I also went to the domain name(s) associated with the IP and found only a Flash page with no alternative content (I deliberately keep Flash uninstalled... never use it, for many reasons).
Because of the "lack of information" on the owner of the IP's CIDR group (the provider of service to the bot's IP), no reverse DNS on the individual IP, and not much else, and because it grabbed all pages, I blocked the entire CIDR group plus the user agent.
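The "CIDR group plus user agent" rule can be sketched in Python as below. This is only an illustration of the logic, not the actual server config I used: `is_blocked` is a hypothetical helper, and 203.0.113.0/24 is a documentation placeholder range, so substitute the real block.

```python
import ipaddress

# Placeholder network (documentation range), NOT the range discussed in this post.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
# User agent strings to refuse, matched as prefixes.
BLOCKED_UA_PREFIXES = ("Java/1.5.0_06",)

def is_blocked(ip, user_agent):
    """Return True if the IP falls in a blocked network OR the user agent matches."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    return user_agent.startswith(BLOCKED_UA_PREFIXES)

print(is_blocked("203.0.113.50", "Mozilla/5.0"))    # inside the blocked range
print(is_blocked("198.51.100.1", "Java/1.5.0_06"))  # blocked user agent
print(is_blocked("198.51.100.1", "Mozilla/5.0"))    # neither rule matches
```

Blocking on either condition catches the bot if it changes IP within the same provider, and also if it keeps the user agent but moves elsewhere.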
The IP number was 18.104.22.168 and the user agent was Java/1.5.0_06.
For the user agent, Google and other search engines showed it was a plugin for some browsers.
The CIDR range 22.214.171.124/18 (126.96.36.199 - 188.8.131.52) belongs to slfiber.com in Alabama. A Google search by address shows Harbor Communications LLC in Mobile, AL, and since I have found that a huge amount of spam originates from southern states, I would sooner block the entire group (16k addresses). It could be a valid new search engine, but I did not submit to them and would sooner protect my site and those I manage.
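For reference, the "16k" figure follows from the prefix length: a /18 leaves 32 - 18 = 14 host bits, so 2^14 = 16384 addresses. A quick check in Python, using a private placeholder network rather than the range above:

```python
import ipaddress

# Placeholder /18 from private address space, used only to show the size of such a block.
net = ipaddress.ip_network("10.0.64.0/18")

print(net.num_addresses)  # 16384 addresses, i.e. 2 ** (32 - 18)
print(net[0], net[-1])    # first and last address: 10.0.64.0 10.0.127.255
```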
Grabbing every page from multiple independent domains is NOT right.
Just a heads-up to watch for the IP and UA.