Search Engine Spider and User Agent Identification Forum

intelium bot

 12:14 pm on Oct 30, 2012 (gmt 0)

Reported content scraper.

67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET /robots.txt HTTP/1.1" 200 403 "http://www.example.com/robots.txt" "intelium_bot"
67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET / HTTP/1.1" 200 1985 "http://www.example.com" "intelium_bot"

It took about half of my HTML files, but no images, in about a second. New bot and IP range for me.
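In both log lines above, the Referer is the same URL that is being requested. A minimal Python sketch for flagging that pattern in Apache combined-format logs might look like the following. The names `LOG_RE`, `parse_line` and `is_auto_referer` are made up for illustration, and the check compares path and query only, since the log line itself does not record the requested host.

```python
import re
from urllib.parse import urlsplit

# Combined Log Format:
# ip ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None

def is_auto_referer(entry):
    """True when the Referer points at the very URL being requested.

    Assumption: only path and query are compared, because the host being
    requested is not part of a combined-format log line.
    """
    ref = urlsplit(entry["referer"])
    if not ref.netloc:
        # Covers "-" and empty referers.
        return False
    req = urlsplit(entry["path"])
    # A bare "http://www.example.com" referer is treated as a request for "/".
    return (ref.path or "/", ref.query) == (req.path or "/", req.query)
```

Run over a whole log, counting how many of a visitor's requests return True gives a rough signal; a single match on its own proves little.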



 12:11 am on Oct 31, 2012 (gmt 0)

:: detour to raw logs ::

Did yours use auto-referers throughout, as in your examples? Mine did. Same IP you met, couple of weeks back. I've got the group as

They only asked for the top level of directory files-- that is
and then
--but they must be talking to some other robot, because there was one interesting 404. (Only meaningful if you know the exact site configuration and filenames.) This detail always worries me.

I also turned up an apparent human from 67.217.32.nnn a couple of weeks before that one. By "apparent" I mean that they scooped up a single page with all associated files in a reasonable time period. In retrospect they were slower than you would expect for a human: ten tiny images totaling 66K shouldn't span two seconds. I don't think a lot of people run robots on satellite internet. (Caching from sites in northern Canada doesn't count.)
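The "how long did one page load take" check described above can be roughed out in Python. This is only a sketch: `burst_summary` is a hypothetical helper, and it assumes you have already pulled one visitor's timestamps and byte counts out of the log.

```python
from datetime import datetime

# Timestamp layout used inside the [...] field of Apache-style logs,
# e.g. "29/Oct/2012:22:53:08 -0700".
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def burst_summary(requests):
    """Summarize one visitor's requests.

    requests: iterable of (timestamp_string, byte_count) pairs.
    Returns the request count, total bytes, and seconds between the
    first and last request.
    """
    times = []
    total_bytes = 0
    for stamp, size in requests:
        times.append(datetime.strptime(stamp, TIME_FORMAT))
        total_bytes += size
    if not times:
        return {"count": 0, "bytes": 0, "span_seconds": 0.0}
    span = (max(times) - min(times)).total_seconds()
    return {"count": len(times), "bytes": total_bytes, "span_seconds": span}
```

Log timestamps only have one-second resolution, so the span is coarse; what counts as "too slow for a human" or "too fast" is a judgment call that this sketch does not make.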

:: further detour here to check whether any legitimate UA contains no spaces at all (answer: yes) ::
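For anyone who wants to repeat that check on their own logs, a tiny Python sketch is below. `has_no_spaces` is a made-up name, and as noted above some legitimate UAs also contain no spaces, so treat a match as a weak hint and not as grounds for a block.

```python
def has_no_spaces(user_agent):
    """True for a non-empty User-Agent string containing no whitespace."""
    return bool(user_agent) and not any(ch.isspace() for ch in user_agent)
```

For example, `has_no_spaces("intelium_bot")` is True, while a typical browser string starting with `Mozilla/5.0 (` followed by a space is False.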


 1:28 am on Oct 31, 2012 (gmt 0)

I'd rather be lucky than good any day ;)

I've got something that caught this bot; unfortunately, I'm unable to determine WHAT.

It's not the IP or the UA, so it must be one of the UA-Browser rules a friend provided long ago.


 9:50 pm on Oct 31, 2012 (gmt 0)

I have the range - blocked as Netsource, USA.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved