keyplyr

msg:4498636 | 5:46 pm on Sep 22, 2012 (gmt 0) |
It may not have "done anything to cause serious annoyance" in your logs, but what is it doing with the data it mines from your site? And what do the guys who purchase your data do with it?
|
lucy24

msg:4498692 | 11:29 pm on Sep 22, 2012 (gmt 0) |
"My data" may be putting it strongly ;) As of yesterday they've requested:
-- two copies of robots.txt
-- two directories in the form /name/index.html with no follow-up of the resulting 301
-- two .sit (StuffIt) files of Mac games, datestamped 2004 but really at least 5 years older
-- two ditto, only these are patches for game files that they haven't got
-- one homemade MiSTing of similar vintage
-- two further pages also dating from around 2005
-- one random gallery page
-- one full-size jpg linked from a different gallery page
:: further detour to previous batch of visits in June ::
-- three requests for robots.txt
-- one for front page
-- two for one of the same directories as above, only this time called correctly as /name/ even though, ahem, I wasn't redirecting "index.html" at the time
-- three requests for different directory, three of them stopping short at directory-slash redirect for form /name
-- three for a different MiSTing, probably left over from when I had a very large file with this name
Before that, an even longer gap. Patterns like this make me think they've got to have collaborators: other robots with different UAs operating from different IPs (I checked both ways) who tell them what files to ask for. The alternative is that they're working through shopping lists from 2007. :: insert "noidea" emoticon here :: I can block 38.something, but not the whole aaa.
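A tally like the one above can be pulled out of raw access logs mechanically. Here is a minimal sketch, assuming Apache-style Common/Combined Log Format lines; the function name `tally_requests`, the sample lines, and the `38.0.0.10` address are hypothetical illustrations, not real log data.

```python
import re
from collections import Counter

# Matches the start of a Common/Combined Log Format line:
# client IP, identity, user, [timestamp], "METHOD /path PROTOCOL", status
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def tally_requests(lines, ip_prefix):
    """Count (path, status) pairs for clients whose IP starts with ip_prefix.

    A plain string prefix such as "38." is a crude stand-in for a real CIDR
    check; good enough for eyeballing logs, not for access control.
    """
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("ip").startswith(ip_prefix):
            counts[(m.group("path"), m.group("status"))] += 1
    return counts

# Hypothetical sample lines, for illustration only
sample = [
    '38.0.0.10 - - [21/Sep/2012:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120',
    '38.0.0.10 - - [21/Sep/2012:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 120',
    '38.0.0.10 - - [21/Sep/2012:10:01:00 +0000] "GET /name/index.html HTTP/1.1" 301 250',
    '192.0.2.7 - - [21/Sep/2012:10:02:00 +0000] "GET / HTTP/1.1" 200 5120',
]

# Print the most frequent requests first
for (path, status), n in tally_requests(sample, "38.").most_common():
    print(f"{n} x {path} ({status})")
```

Keeping the status code in the key makes the "asked for /name/index.html, got a 301, never followed it" pattern stand out on its own line.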
|
g1smd

msg:4498693 | 11:34 pm on Sep 22, 2012 (gmt 0) |
I wish I could be as motivated to look at logs in that much detail; they just get a cursory scan from time to time here. :)
|
keyplyr

msg:4498731 | 4:00 am on Sep 23, 2012 (gmt 0) |
| I realize a lot of people avoid the issue by locking out the whole 38.0.0.0/8 range |
| I was frustrated enough to block the entire range some time ago, then quickly realized that was a big mistake :) Cogent pretty much covers a huge part of the Northeastern US and Canada. Lots of server farms, but also municipal agencies, schools, small ISPs, private companies...
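To put a number on how much a single /8 rule sweeps up, Python's standard `ipaddress` module can size the block and test individual visitors against it. This is a minimal sketch; the narrower `38.0.0.0/24` block list and the `38.100.20.5` visitor are hypothetical placeholders, not known-bad or known-good addresses.

```python
import ipaddress

whole_range = ipaddress.ip_network("38.0.0.0/8")
print(whole_range.num_addresses)  # 16777216 addresses caught by one rule

# Hypothetical narrower block list: only the specific ranges you have
# actually seen misbehaving in your own logs.
narrow_blocks = [ipaddress.ip_network("38.0.0.0/24")]

def is_blocked(ip, blocks):
    """Return True if ip falls inside any network in blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocks)

visitor = "38.100.20.5"  # hypothetical school or ISP customer on the same /8
print(is_blocked(visitor, [whole_range]))  # True: caught by the /8
print(is_blocked(visitor, narrow_blocks))  # False: untouched by the /24
```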
|
lucy24

msg:4498754 | 8:05 am on Sep 23, 2012 (gmt 0) |
| I wish I could be as motivated to look at logs in that much detail |
| The Regular Expression Is Your Friend :) Spotlight to bring up which log files might contain what I'm looking for; a quick RegEx search within the relevant files to pull out the desired lines; a bit of cut & paste and global replaces to bring everything into focus.

I think the single happiest discovery I ever made about SubEthaEdit was that the content of its Find All window can itself be selected and pasted. Invaluable when making links in HTML versions of EETS publications with a Glossarial Index at the end. Other uses revealed themselves later.

It helps when you're so small* that you can process your logs in javascript. Throw in some color coding, and any unexpected slabs of robotic green are bound to catch my attention sooner or later. It's been a quiet year overall; I haven't met anything truly outrageous since February. But, ahem, the line | three requests for different directory, three of them {etc} |
| would probably have made more sense if my fingers had typed "two of them" as my brain clearly told them to do. | but also municipal agencies, schools, small ISPs, private companies |
| Yes, it's the Canadian schools that I keep getting.

* www variant of the old joke formula, "Yo momma so fat, she..." {etc.} "Your web site's so small, it..."
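The regex-plus-color-coding approach described above can be sketched roughly as follows. This is written in Python rather than javascript, and the "looks robotic" heuristic (user-agent mentions bot/crawl/spider/slurp, or the request is for robots.txt) is a hypothetical rule of thumb, not the actual rules used here. It assumes Combined Log Format, where the user-agent is the last double-quoted field.

```python
import html
import re

# Combined Log Format: the user-agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
# The request line is the first quoted field: "METHOD /path PROTOCOL"
REQUEST = re.compile(r'"[A-Z]+ (\S+) [^"]*"')
# Hypothetical heuristic; tune it against what your own logs show.
ROBOT_UA = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

def looks_robotic(line):
    """Guess whether a log line came from a robot."""
    ua = UA_FIELD.search(line)
    if ua and ROBOT_UA.search(ua.group(1)):
        return True
    # Humans rarely ask for robots.txt by hand.
    req = REQUEST.search(line)
    return bool(req and req.group(1) == "/robots.txt")

def colorize(lines):
    """Render log lines as HTML, with suspected robots in green."""
    rows = []
    for line in lines:
        color = "green" if looks_robotic(line) else "black"
        rows.append(f'<div style="color:{color}">{html.escape(line)}</div>')
    return "\n".join(rows)

# Hypothetical sample lines, for illustration only
human = '192.0.2.7 - - [22/Sep/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh)"'
robot_ua = '192.0.2.8 - - [22/Sep/2012:10:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"'
robot_txt = '192.0.2.9 - - [22/Sep/2012:10:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0"'

print(colorize([human, robot_ua, robot_txt]))
```

A robot with an honest UA is caught by the first test; one with a browser UA still gives itself away the moment it asks for robots.txt.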
|
blend27

msg:4542355 | 9:54 pm on Feb 4, 2013 (gmt 0) |
I knew this would byte me in the @$$ sooner or later; I've been blocking the whole /8 range since 03/2007. Today I had a phone interview with the guy. After I finished, he asked me to send some URLs (of work I've done in the past) so he could forward them to the Boss & Team who will be doing the second-round interview. And BAM, the Boss is on PSInet/Cogent Range, got 403'd with the nice message displayed: Sorry You are not on the list! on 3 URLs that I sent. Got a call from the guy asking if this was a practical joke. :)
|
wilderness

msg:4542358 | 10:03 pm on Feb 4, 2013 (gmt 0) |
| And BAM, the Boss is on PSInet/Cogent Range, got 403'd with the nice message displayed: Sorry You are not on the list! on 3 URLs that I sent. Got a call from the guy asking if this was a practical joke. |
| What's the big deal? Modify the range to allow him access and explain it was a server configuration error.
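One way to "modify the range" is to split the blocked network so it no longer covers the one address you want to let through. Python's standard `ipaddress` module does the splitting with `address_exclude()`. This is a minimal sketch; `38.104.0.1` is a hypothetical stand-in for the visitor's address, and turning the result into actual deny rules is left to whatever server config you use.

```python
import ipaddress

blocked = ipaddress.ip_network("38.0.0.0/8")
# Hypothetical address standing in for the one visitor to let through.
allowed = ipaddress.ip_network("38.104.0.1/32")

# address_exclude() yields the networks that together cover `blocked`
# minus `allowed`; it raises ValueError if `allowed` isn't inside `blocked`.
remaining = sorted(blocked.address_exclude(allowed))

print(len(remaining))  # 24 networks, one per host bit of the /8
for net in remaining[:3]:
    print(net.with_prefixlen)
```

Punching a single-address hole in a /8 this way costs 24 separate ranges, which is why a single explicit allow exception for that address, checked ahead of the block, is usually the simpler fix when the server config supports it.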
|
blend27

msg:4542359 | 10:10 pm on Feb 4, 2013 (gmt 0) |
Did that on the spot; he was very impressed, and everything is peachy. BTW, here is a thread from 2011 with some ranges/htaccess by caribguy: [webmasterworld.com...]
|
wilderness

msg:4542369 | 10:48 pm on Feb 4, 2013 (gmt 0) |
I don't recall the range; however, Jim was emphatic about leaving a portion of it open for a specific bot.
|