[Wed Aug 28 08:13:39 2002] [error] [client 63.148.99.229] client denied by server configuration: /usr/local/etc/httpd/htdocs/http&http&http&
The key was that every attempt added another http& to the previous line, forming a kind of pyramid in my log files that went on for - oh - a few hundred lines and spanned acres and acres to the right of my screen. (On every domain, of course.) Needless to say, it became tedious after the first few hundred. After a few thousand I actually got kinda p*ssed off.
I don't think it's healthy for me to internalize such feelings of hostility, so I guess I might "reach out and touch someone" at Cyveillance. By email, of course.
I blocked by .htaccess, which still logged all the garbage lines in the error log. Logs, plural, actually.
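For anyone who wants to measure this kind of mess rather than scroll through it, here is a minimal Python sketch. It assumes error-log lines shaped exactly like the sample above; the function name `tally_denied` and the regular expression are illustrative, not part of any Apache tooling. It counts denied requests per client IP and records how many times `http&` was stacked onto the longest path.

```python
import re
from collections import Counter

# Matches error-log lines shaped like the sample line quoted in this thread.
DENIED = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[error\] \[client (?P<ip>[^\]]+)\] "
    r"client denied by server configuration: (?P<path>.*)$"
)

def tally_denied(lines):
    """Return (denied hits per client IP, deepest 'http&' stack per IP)."""
    hits = Counter()
    deepest = {}
    for line in lines:
        m = DENIED.match(line.strip())
        if not m:
            continue  # not a "client denied" entry, skip it
        ip = m.group("ip")
        hits[ip] += 1
        depth = m.group("path").count("http&")
        deepest[ip] = max(deepest.get(ip, 0), depth)
    return hits, deepest

# The only real log line available here is the one quoted above.
sample = [
    "[Wed Aug 28 08:13:39 2002] [error] [client 63.148.99.229] "
    "client denied by server configuration: "
    "/usr/local/etc/httpd/htdocs/http&http&http&",
]
hits, deepest = tally_denied(sample)
```

In practice you would pass it an open log file instead of `sample`, since a file object yields one line at a time.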
Cyveillance was back the day after I sent them a letter notifying them of what their spider was doing. Do they care? Of course not. After all, they have information to parse and sell. So what if they're getting it from me at my trouble and expense, right? Gee, I dunno - last time I checked a few thousand 403's in the face pretty much meant, "Get out and don't come back. Ever. And you better check under your spider's hood because it can't understand GET OUT."
Alas, subtlety is lost on them.
I asked my server techs to "firewall them out", and they were more than happy to do so. No more 403's, no more data miners from Cyveillance looking up my skirt to see what horrible things I might be up to. (Just another drop in the bucket and they're on to harass the next IP who HASN'T blocked them.)
It probably would not be a good idea for me to ever attend a conference where people from Cyveillance discussed "ideas" and droned on about "information sharing" and the necessity of their "service". I'd incite a riot. Yup, there'd be me with my "idiotgirl" name tag on being hauled out by the big Samoan guys.
That was the way I helped set up a system. The problem is that, from what I recall, they have (or had) a group of IPs and will change their user agent.
So if you deny them by robots.txt, which they will only respect until they return with a new user agent, it becomes a never-ending project for you.
Do something at the server level, .htaccess, or similar. Firewall sounds good too.
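To illustrate why matching on address ranges holds up better than matching on user agent, here is a small Python sketch using the standard `ipaddress` module. The network in `BLOCKED_NETWORKS` is purely hypothetical: it is a /24 guessed around the one address quoted in the error log earlier in this thread, not a verified list of their ranges. The name `is_blocked` is also just for illustration.

```python
import ipaddress

# HYPOTHETICAL block list: this /24 is only a guess built around the single
# client address shown in the error log earlier in the thread.
BLOCKED_NETWORKS = [ipaddress.ip_network("63.148.99.0/24")]

def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """True if client_ip falls inside any blocked network.

    The user agent is deliberately ignored, so a changed UA string
    makes no difference to the result.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

The same test is what a server-level or firewall rule performs; the sketch only shows the logic, not how to wire it into either one.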
During its last drive-by the UA was a nondescript:
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
which isn't necessarily always the calling card of a ruthless intruder. It's one reason I check my log files closely. Things aren't always what they appear. I think it's the combination of UA and IP behavior that indicates a pattern to be cautious about with any bot. In this case, it was quite obvious who was visiting because of the hundreds and hundreds of attempts (thousands) over a couple of days. (They weren't exactly in "stealth" mode. Mo-rons.)
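As a rough sketch of the "UA plus IP behavior" idea, the Python below flags clients that keep coming back after being refused, and lists every user agent each one presented. It assumes the access log has already been parsed into `(ip, status, user_agent)` tuples; the function name `suspicious_clients` and the `min_forbidden` threshold are made up for the example and would need tuning for a real site.

```python
from collections import defaultdict

def suspicious_clients(records, min_forbidden=100):
    """records: iterable of (ip, status, user_agent) tuples.

    Returns {ip: (forbidden_count, sorted_user_agents)} for every IP that
    was answered with 403 at least min_forbidden times.
    """
    forbidden = defaultdict(int)
    agents = defaultdict(set)
    for ip, status, ua in records:
        agents[ip].add(ua)  # remember every UA this address has used
        if status == 403:
            forbidden[ip] += 1
    return {
        ip: (count, sorted(agents[ip]))
        for ip, count in forbidden.items()
        if count >= min_forbidden
    }
```

An ordinary browser UA attached to an IP with thousands of 403s stands out immediately in the result, which is the pattern described above.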
Since they won't give up after my server responds repeatedly with 403-Forbidden, I was thinking about finding the largest, slowest-loading, most CPU-time-intensive page on their corporate web site, and redirecting all their requests to that page. :)
I wonder if their UA would follow a 301? Probably not, and probably against my hosting service's TOS, but it was an entertaining thought anyway...
Yeah, mo-rons is right!
Jim
But I have written a simple little script for trapping (choking) spiders that follow forbidden links seeking email addresses :) I named it after my pet bird, since she used to stay up late at night with me while I parsed logfiles. In her memory, of course. I thought she'd appreciate it.
Be nice if I could think of a way to get the spider monkeys off my back in a similar fashion.