Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification

Search Engine Spider and User Agent Identification Forum

    
Lightspeed Systems
sneaky creepy greedy crawler
idiotgirl

10+ Year Member



 
Msg#: 3717466 posted 11:20 am on Aug 7, 2008 (gmt 0)

Came by for a visit tonight and immediately got its feet stuck in my bot trap. Does not check for robots.txt, nor does it call itself a bot or crawler.

69.84.207.yyy - - [07/Aug/2008:05:18:51 -0400] "GET / HTTP/1.1" 200 3020 "-" "Mozilla/4.0 (compatible; MSIE 7.0;Windows NT 5.1;.NET CLR 1.1.4322;.NET CLR 2.0.50727;.NET CLR 3.0.04506.30)"
69.84.207.yyy - - [07/Aug/2008:05:18:52 -0400] "GET /blackhole HTTP/1.1" 301 260 "-" "Mozilla/4.0 (compatible; MSIE 7.0;Windows NT 5.1;.NET CLR 1.1.4322;.NET CLR 2.0.50727;.NET CLR 3.0.04506.30)"
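The trap check described here can also be run offline against an access log. Below is a minimal Python sketch. It assumes Apache's combined log format (as in the lines above) and the common bot-trap setup in which /blackhole is disallowed in robots.txt and linked only in a way human visitors won't follow. The function name `find_trapped` and the "same log" rule are illustrative choices, not idiotgirl's actual setup:

```python
import re

# Apache "combined" format, as in the excerpt above:
# IP identd user [timestamp] "METHOD PATH PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

TRAP_PATH = "/blackhole"  # assumed trap URL, taken from the log lines above


def find_trapped(lines, trap_path=TRAP_PATH):
    """Return {ip: user_agent} for clients that requested the trap path
    but never fetched /robots.txt anywhere in the same log."""
    fetched_robots = set()
    trapped = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ip, path = m.group("ip"), m.group("path")
        if path == "/robots.txt":
            fetched_robots.add(ip)
        elif path == trap_path or path.startswith(trap_path + "/"):
            trapped[ip] = m.group("agent")
    # A client that read robots.txt and still hit the trap is a separate
    # (arguably worse) case; this sketch only reports the ones that never looked.
    return {ip: ua for ip, ua in trapped.items() if ip not in fetched_robots}
```

Feeding it the two lines above flags 69.84.207.yyy with its MSIE 7.0 user agent, since that client went straight from / to /blackhole without ever requesting /robots.txt. Any iterable of lines works, so an open log file can be passed directly.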

However, when you visit the IP address in a browser, the page there states that it is a crawler performing a very important function: downloading your entire web site... without your permission, of course.

"Because of this job we have to download and evaluate the content of every website on the Internet that children can reach. To keep an accurate database, we download and evaluate each website several times a year. We try to download web content without overly burdening any given web server.

This is not a hacking site, or a denial of service attack, or anything of that sort."

No, of course it isn't. Just a rude walk-through of my web site, taking whatever you can get your grubby hands on. (Webmasters love that kind of stuff.)

Bot-trapped, banned, and kicked to the curb.

 

wilderness

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 3717466 posted 4:52 pm on Aug 7, 2008 (gmt 0)

They've been around a while.

2005:
66.17.15.yyy - - [28/Mar/2005:20:02:03 -0800] "GET /MyFolder/MyPage.html HTTP/1.1" 206 10097 "-" "Schmozilla/v9.14 Platinum"

Later in 2005:

66.17.15.zzz - - [06/Aug/2005:13:37:36 -0700] "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50215)"
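These two lines come from the same 66.17.15 network but carry unrelated user-agent strings, which is why the UA alone is a poor way to recognize this visitor. One way to spot that pattern in your own logs is to group requests by IPv4 /24 prefix and list the user agents each prefix sent. Below is a minimal Python sketch. It assumes combined-format lines whose last quoted field is the user agent. `agents_by_prefix` is a hypothetical helper name. The prefix is taken as text (last octet dropped), so it also works on masked addresses like the ones above:

```python
import re
from collections import defaultdict

# First field is the client IP; the last quoted field is the user agent.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')


def agents_by_prefix(lines):
    """Map each IPv4 /24 prefix ("a.b.c") to the set of user agents seen from it."""
    groups = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a combined-format line
        prefix = m.group("ip").rsplit(".", 1)[0]  # drop the last octet
        groups[prefix].add(m.group("agent"))
    return dict(groups)
```

Run over the two lines above, it returns a single "66.17.15" entry holding both the "Schmozilla/v9.14 Platinum" and the MSIE 6.0 strings. A prefix with many distinct agents is worth a closer look, though shared proxies and NAT can produce the same pattern innocently.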

Samizdata

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3717466 posted 12:06 am on Aug 10, 2008 (gmt 0)

"Bot-trapped, banned, and kicked to the curb."

When Lightspeed first attracted my attention a couple of years ago, I did the same as you and looked them up, and found that a satirical YouTube spoof already existed (it is still there today).

After giving it some thought, I reckoned that they probably analyze sites automatically by searching for "trigger" keywords and the like, and that no human check is likely to be involved.

So rather than blocking them by IP, I serve them a low-bandwidth "robots policy" file.

I don't really know if this works, but I now do it for all known content filters.
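The approach described here (recognize the content filter, then hand it a tiny policy file instead of the real pages) can be sketched as WSGI middleware in Python. Samizdata doesn't say how it is done, so everything here is an assumption. Clients are recognized by source IP. The two /24 networks are simply the prefixes seen in this thread's logs, not a verified list of anyone's ranges. `POLICY`, `is_content_filter` and `policy_middleware` are hypothetical names:

```python
import ipaddress

# Illustrative only: the /24 prefixes seen in the logs quoted in this thread.
# Which ranges a given content filter really uses must be checked yourself.
FILTER_NETWORKS = [
    ipaddress.ip_network("66.17.15.0/24"),
    ipaddress.ip_network("69.84.207.0/24"),
]

# Hypothetical policy text; kept short so every hit costs very little bandwidth.
POLICY = (
    b"Robots policy: automated copying of this site is not permitted.\n"
    b"Contact the webmaster for permission.\n"
)


def is_content_filter(remote_addr):
    """True if remote_addr is a valid IP inside one of FILTER_NETWORKS."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # missing or malformed address
    # IPv6 addresses never match IPv4 networks (membership test returns False).
    return any(addr in net for net in FILTER_NETWORKS)


def policy_middleware(app):
    """Wrap a WSGI app: listed clients get POLICY, everyone else the real site.
    Note: behind a reverse proxy REMOTE_ADDR is the proxy, not the visitor."""
    def wrapper(environ, start_response):
        if is_content_filter(environ.get("REMOTE_ADDR", "")):
            start_response("200 OK", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(POLICY))),
            ])
            return [POLICY]
        return app(environ, start_response)
    return wrapper
```

An existing WSGI application would be wrapped as `application = policy_middleware(application)`. Answering 200 with a small body, rather than 403, follows the reasoning above that an automated classifier will evaluate whatever it receives. Whether that actually helps is, as noted, unknown.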

...

idiotgirl

10+ Year Member



 
Msg#: 3717466 posted 9:31 am on Aug 11, 2008 (gmt 0)

When any crawler, particularly one that charges people for the privilege of looking at my widgets, comes tromping in like a bull in a china shop, posing as a visitor rather than a bot, and immediately gets nailed in a bot trap, I'm going to send it packing, no matter what noble cause it claims to be serving.

This crawler fits that definition perfectly.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved