69.84.207.yyy - - [07/Aug/2008:05:18:51 -0400] "GET / HTTP/1.1" 200 3020 "-" "Mozilla/4.0 (compatible; MSIE 7.0;Windows NT 5.1;.NET CLR 1.1.4322;.NET CLR 2.0.50727;.NET CLR 3.0.04506.30)"
69.84.207.yyy - - [07/Aug/2008:05:18:52 -0400] "GET /blackhole HTTP/1.1" 301 260 "-" "Mozilla/4.0 (compatible; MSIE 7.0;Windows NT 5.1;.NET CLR 1.1.4322;.NET CLR 2.0.50727;.NET CLR 3.0.04506.30)"
However, when you visit the IP, the page there states that it is a crawler performing a very important function: downloading your entire web site... without your permission, of course.
"Because of this job we have to download and evaluate the content of every website on the Internet that children can reach. To keep an accurate database, we download and evaluate each website several times a year. We try to download web content without overly burdening any given web server.
This is not a hacking site, or a denial of service attack, or anything of that sort."
No, of course it isn't. Just a rude walk-through of my web site, grabbing whatever it can get its grubby hands on. (Webmasters love that kind of stuff.)
Bot-trapped, banned, and kicked to the curb.
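For anyone unfamiliar with the "bot-trapped" step: a common bot-trap setup links a hidden URL from your pages (here /blackhole, as in the log above), disallows that URL in robots.txt so well-behaved crawlers never fetch it, and bans any client that requests it anyway. The following is a minimal Python WSGI sketch of that idea, not the poster's actual setup. The middleware name BotTrap, the in-memory ban set and the 403 response are all assumptions; a real site would more likely do this in its server configuration and persist the ban list.

```python
"""Minimal bot-trap sketch (hypothetical, not the poster's actual setup).

Assumed robots.txt entry, so only robots that ignore robots.txt get caught:

    User-agent: *
    Disallow: /blackhole
"""
from typing import Callable, Iterable, Optional, Set

TRAP_PATH = "/blackhole"  # hypothetical trap URL, linked invisibly from pages


class BotTrap:
    """WSGI middleware that bans any client IP requesting the trap URL."""

    def __init__(self, app: Callable, banned: Optional[Set[str]] = None) -> None:
        self.app = app
        # In-memory ban list; it is lost on restart (a real setup would persist it).
        self.banned: Set[str] = banned if banned is not None else set()

    def _forbidden(self, start_response: Callable) -> Iterable[bytes]:
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        ip = environ.get("REMOTE_ADDR", "")
        path = environ.get("PATH_INFO", "")
        if ip in self.banned:
            return self._forbidden(start_response)
        # Match /blackhole and anything below it, e.g. /blackhole/
        if path == TRAP_PATH or path.startswith(TRAP_PATH + "/"):
            self.banned.add(ip)  # trapped: refuse this IP from now on
            return self._forbidden(start_response)
        return self.app(environ, start_response)


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"normal page\n"]


app = BotTrap(hello_app)
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it; request /blackhole once and every later request from the same address gets a 403.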
2005:
66.17.15.yyy - - [28/Mar/2005:20:02:03 -0800] "GET /MyFolder/MyPage.html HTTP/1.1" 206 10097 "-" "Schmozilla/v9.14 Platinum"
2006:
66.17.15.zzz - - [06/Aug/2005:13:37:36 -0700] "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)"
Bot-trapped, banned, and kicked to the curb.
When Lightspeed first attracted my attention a couple of years ago, I did the same as you and looked them up, and found that a satirical YouTube spoof already existed (it is still there today).
After giving it some thought, I reckoned that they probably analyze sites automatically by searching for "trigger" keywords and the like, and that no human check is likely to be involved.
So rather than blocking them outright, I match them by IP and serve them a low-bandwidth "robots policy" file instead.
I don't really know if this works, but I now do it for all known content filters.
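The approach described in the last two paragraphs can be sketched as follows, assuming the content filters are recognised by their source IP ranges. This is a minimal Python WSGI illustration, not the poster's configuration: the ranges below are RFC 5737 documentation placeholders (the real addresses in the logs above are masked), and PolicyForFilters, is_content_filter and the policy wording are all hypothetical.

```python
"""Serve a small policy file to known content-filter crawlers (sketch)."""
import ipaddress
from typing import Callable, Iterable

# Hypothetical placeholder ranges (RFC 5737 documentation blocks).
# Replace with the ranges you have actually seen each filter crawl from.
CONTENT_FILTER_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical wording; the poster's actual policy text is not given.
POLICY_TEXT = (
    b"Robots policy: automated crawling, downloading or classification of\n"
    b"this site's content requires the site owner's prior permission.\n"
)


def is_content_filter(ip: str) -> bool:
    """True if ip falls inside one of the listed content-filter ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # missing or malformed address
    # An IPv6 address is never "in" an IPv4 network (and vice versa).
    return any(addr in net for net in CONTENT_FILTER_NETS)


class PolicyForFilters:
    """WSGI middleware: filters get only POLICY_TEXT, everyone else the site."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        if is_content_filter(environ.get("REMOTE_ADDR", "")):
            start_response("200 OK", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(POLICY_TEXT))),
            ])
            return [POLICY_TEXT]  # a few hundred bytes instead of the whole page
        return self.app(environ, start_response)


def site_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>full page</html>"]


app = PolicyForFilters(site_app)
```

Returning the policy text with a 200 rather than a 403 means the filter's crawler receives a small, innocuous page instead of an error; whether that changes how the site gets classified is, as noted above, unknown.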
...
This crawler fits that definition perfectly.