My website has just been hit by some kind of distributed spider. It doesn't identify itself as a spider; in fact, it uses several different user agents and tries to pass itself off as several people browsing (a form of cloaking!? ;), each at a different IP, and not all the IPs are from the same block. It didn't honour my robots.txt either.
So why do I think it's a spider...?
Well, in the space of 20 minutes the IPs in question systematically requested all of my top-level categories, in perfect sequential order, and then went on through the sub-cats and a whole load of product pages. They even hit most of the 'email a friend' pages for those (which kind of gives away the fact that it's a spider).
I imagine a networked cluster of PCs running on multiple dial-up accounts, ravaging the web for email addresses... or something.
Anyway, the IPs and user agents I observed were:
220.127.116.11 Mozilla/4.77 [en] (X11; U; Linux 2.2.19 i686)
18.104.22.168 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
22.214.171.124 Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
126.96.36.199 Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)
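For anyone wanting to spot the same pattern in their own logs, the giveaway described above (several IPs and user agents, but one strictly ordered walk through the categories) can be turned into a rough check. This is a minimal Python sketch under some assumptions of mine: log lines have already been parsed into the hypothetical `Hit` records, category pages share a hypothetical `/category/` prefix, and "sequential" means the paths come in lexical order. It's a heuristic, not a reliable bot detector.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # One parsed access-log line (hypothetical record, not a real log format).
    ip: str
    timestamp: float  # seconds since epoch
    user_agent: str
    path: str


def looks_like_distributed_crawl(hits, prefix="/category/",
                                 window_seconds=20 * 60, min_ips=2):
    """Flag hits under `prefix` as one crawler if, merged across all IPs in
    time order, the requested paths within the window are already sorted."""
    # Keep only category requests (assumed prefix), ordered by time.
    matching = sorted((h for h in hits if h.path.startswith(prefix)),
                      key=lambda h: h.timestamp)
    if len(matching) < 2:
        return False
    # Only consider requests within the window that starts at the first hit.
    start = matching[0].timestamp
    in_window = [h for h in matching if h.timestamp - start <= window_seconds]
    ips = {h.ip for h in in_window}
    paths = [h.path for h in in_window]
    # Several IPs, yet together they walk the categories in lexical order.
    return len(ips) >= min_ips and len(paths) >= 2 and paths == sorted(paths)


# Usage with the IPs/user agents from my log; the paths are made up.
hits = [
    Hit("220.127.116.11", 0, "Mozilla/4.77 [en] (X11; U; Linux 2.2.19 i686)", "/category/a"),
    Hit("18.104.22.168", 30, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)", "/category/b"),
    Hit("22.214.171.124", 60, "Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)", "/category/c"),
    Hit("126.96.36.199", 90, "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)", "/category/d"),
]
print(looks_like_distributed_crawl(hits))  # True
```

Real visitors from different IPs would rarely request the categories in exactly sorted order, which is why the check looks at the merged sequence rather than at each IP on its own.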
I'd post some of my log so you could see the pattern of requests, but I know posting URLs (partial or otherwise) for commercial sites isn't allowed.
[Shrug] Just thought I'd pass the info on, even if it's not really of much use to anyone.