-- Search Engine Spider and User Agent Identification
---- Don't stop fearing the webreaper
lucy24 - 5:26 am on May 3, 2013 (gmt 0)
I have seldom been so angry in my life.
When you take a quick look at your logs to make sure the latest htaccess edit hasn't had ::cough-cough:: unintended consequences, and find the access log at midday fully five times as fat as normal, it might be a good sign.
:: insert chorus of "I'm always a cockeyed optimist" here ::
When the accompanying error log is itself almost as fat as a normal day's entire access log, it is definitely not a good sign.
:: cut details of long and exhaustive investigation ::
User-Agent for three occurrences of collecting robots.txt: WebReaper v10.0 - www.webreaper.net
User-Agent for two thousand, seven hundred forty-nine occurrences of ignoring robots.txt: WebReaper [firstname.lastname@example.org]
:: further detour to offending robot's www site, using Safari with fake UA ::
Q: Some sites result in an "Access denied" error in WebReaper. How can I download them?
A: Unfortunately, you can't. WebReaper obeys the internet Robots Exclusion Standard
This is a, um, uh... Dang! Can't think of the word, although I'm pretty sure it's only three letters.
So why am I so angry? Above and beyond the fact that this is the single biggest robot attack I have ever sustained-- I don't believe I even have 2749 files (of all kinds) on my site-- close study of the beginning of the visit reveals that it was triggered by someone I know. Not face-to-face personally, or I'd go over and tear their head off, but online. And their actual target-- this is where the personal knowledge comes in-- consisted of not 2700+ but seven HTML files plus CSS and images.
First stop: Forum I share with nameless offender, where I post a curt character assassination benefiting from forum's inexplicable lack of word censoring.
Second stop: Here to remind everyone. If you haven't met WebReaper in a while and are thinking of commenting-out the block-- don't do it just yet.
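For anyone who never had the block in the first place, here is a minimal sketch of one way to write it. It assumes Apache with mod_rewrite enabled and rules living in the site's root htaccess. It is not necessarily the rule I use myself, and the choice to leave robots.txt reachable is an assumption of this sketch, not something WebReaper has earned.

```apache
# Sketch only: assumes mod_rewrite is available and this file is the
# htaccess in the site's document root.
RewriteEngine On

# Match any User-Agent containing "WebReaper", case-insensitively.
# That covers both UA strings seen in the logs above.
RewriteCond %{HTTP_USER_AGENT} WebReaper [NC]

# Assumption: still let the robot fetch robots.txt.
RewriteCond %{REQUEST_URI} !^/robots\.txt$

# Refuse everything else with 403 Forbidden.
RewriteRule ^ - [F]
```

The match is on the bare name rather than on either full UA string, since the two strings differ between the robots.txt fetches and everything else.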