... Well, returns to me, anyhow.
Thread from winter 2011/2012 [webmasterworld.com]
Thread from 2013 [webmasterworld.com]
Some interesting changes, though obviously still the same critter.
IP: 206.253.226.12
UA: Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)
Behavior: visit began with robots.txt, immediately followed by front page, brief intermission, and then 60 further requests in 30 seconds for a steady average of 2 per second. (Like the Googlebot, they clearly do not know how to read the Crawl-Delay directive, but really, isn't no-more-than-one-per-second a pretty good rule of thumb?)
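For anyone wondering what honoring the directive would look like, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text is hypothetical (the real file isn't shown here), and the one-second delay is just the rule of thumb mentioned above, not a known setting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the directory name and delay value are assumptions.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds to wait between requests, or None if no Crawl-delay applies.
delay = parser.crawl_delay("oBot")

# Rate observed in this visit: 60 requests in 30 seconds.
observed_rate = 60 / 30
allowed_rate = 1 / delay if delay else None
too_fast = allowed_rate is not None and observed_rate > allowed_rate

print(delay, observed_rate, too_fast)
```

With these assumed values the observed rate comes out at twice what the directive allows.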
Now the fun begins. Nothing was requested from either of the roboted-out directories whose content is linked from the front page. So that's good. But they must be going for some kind of Robotic Stupidity prize, because for each of the six permitted directories, requests went:
/directory
(without trailing slash, leading to mod_dir redirect to)
/directory/
(i.e. the form the front page linked to in the first place)
and then
/images/something
/images/somethingelse
et cetera, each time referring to images whose actual URL is
/directory/images/etcetera
linked from directory-index pages in the form
images/etcetera
With all those misread subdirectory-images, it seems to have escaped their notice that there are also images linked directly from the front page, so the only /images/ they didn't ask for were the ones that do exist.*
I don't know if this is a domino effect whereby they think they're in the root (page at "/directory") when in fact they're in a subdirectory, or whether they just don't understand how relative links work. Net result: 62 requests, of which just 9 were successful. There was even a bonus redirect to an entirely different site, thanks to one subdirectory having the same name as a top-level directory in that other site.**
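The domino-effect theory is easy to reproduce. The sketch below uses Python's standard urljoin with a hypothetical host (example.com) and the placeholder paths from above; it only shows that resolving the relative link against the pre-redirect URL would produce the requests seen in the logs, not that this is what the crawler actually does.

```python
from urllib.parse import urljoin

# Hypothetical host; the paths are the placeholders used in this post.
BASE_BEFORE_REDIRECT = "http://example.com/directory"   # what they requested
BASE_AFTER_REDIRECT = "http://example.com/directory/"   # where mod_dir sent them
LINK = "images/etcetera"                                # link as written in the page

# Without the trailing slash, "directory" is treated as a file name
# and dropped, so the link resolves against the site root.
wrong = urljoin(BASE_BEFORE_REDIRECT, LINK)

# With the trailing slash, the link resolves inside the directory.
right = urljoin(BASE_AFTER_REDIRECT, LINK)

print(wrong)  # http://example.com/images/etcetera
print(right)  # http://example.com/directory/images/etcetera
```

So a crawler that follows the redirect but keeps the original URL as its base would ask for exactly the nonexistent /images/ files.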
No, wait, I take it back: there was one robots.txt violation. Apparently you only have to read robots.txt for the site that contains the HTML; they also requested two piwik files which live in a roboted-out directory on a different site. (This, in turn, tells me that they paid a brief visit to yet a third site, but were apparently so baffled by its link structure that they just grabbed a couple of images and quickly left.)
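A well-behaved crawler would check each URL against the robots.txt of the host that serves it, not the host of the page that linked to it. Here is a rough sketch of that idea; the host names, directory names, and file name are all made up for illustration, and the helpers build_rules and allowed are hypothetical, not anything from a real crawler.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, keyed by host.
ROBOTS = {
    "site-a.example": "User-agent: *\nDisallow: /private/\n",
    "site-b.example": "User-agent: *\nDisallow: /stats/\n",
}

def build_rules(robots_by_host):
    """Parse one robots.txt per host."""
    rules = {}
    for host, text in robots_by_host.items():
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        rules[host] = parser
    return rules

def allowed(rules, user_agent, url):
    """Check a URL against the rules of the host that serves it."""
    host = urlsplit(url).netloc
    parser = rules.get(host)
    if parser is None:
        # No robots.txt seen for this host yet: refuse until it is fetched.
        return False
    return parser.can_fetch(user_agent, url)

rules = build_rules(ROBOTS)

# A page on site A links to a stats file on site B.
# Site A's rules say nothing about /stats/, but site B's rules forbid it.
target = "http://site-b.example/stats/piwik.js"
print(allowed(rules, "oBot", target))
```

Keying the lookup on the target URL's host is the whole point: the linking page's robots.txt never enters into it.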
* "We carried away all that we did not catch, and all that we caught, we left behind."
** In fact I had to pore over my htaccess to find the redirect, since there's no earthly reason any human would ever request a file from Site B that has never existed anywhere but Site A. I just put the redirect there for insurance.