Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
intelium bot
Ken_S




msg:4513744
 12:14 pm on Oct 30, 2012 (gmt 0)

reported content scraper -- 67.217.35.0 - 67.217.35.255

67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET /robots.txt HTTP/1.1" 200 403 "http://www.example.com/robots.txt" "intelium_bot"
67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET / HTTP/1.1" 200 1985 "http://www.example.com" "intelium_bot"

It took about half of my HTML files, but no images, all within a second. Both the bot and the IP range are new to me.
Ken

 

lucy24




msg:4513984
 12:11 am on Oct 31, 2012 (gmt 0)

:: detour to raw logs ::

Did yours use auto-referers throughout, as in your examples? Mine did. Same IP you met, couple of weeks back. I've got the group as 67.217.32.0/20.
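For anyone who wants to check their own raw logs for the same auto-referer pattern, here is a minimal sketch. It assumes Apache-style combined log lines like the two quoted above; the regex and the function name `is_auto_referer` are illustrative, not anything the bot or the server provides. Since the log line carries no Host field, only the path part of the referer is compared against the requested path.

```python
import re
from urllib.parse import urlsplit

# Combined Log Format (assumed):
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

def is_auto_referer(line):
    """Return True when the referer points at the very URL being requested."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return False  # not a combined-format line
    ref = urlsplit(m.group("referer"))
    if not ref.netloc:
        return False  # empty or "-" referer
    # A bare "http://www.example.com" referer means the root path "/".
    ref_target = ref.path or "/"
    if ref.query:
        ref_target += "?" + ref.query
    return ref_target == m.group("path")

sample = (
    '67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET /robots.txt HTTP/1.1" '
    '200 403 "http://www.example.com/robots.txt" "intelium_bot"'
)
print(is_auto_referer(sample))
```

A hit on one line proves little by itself, because a page reload can produce the same thing. A visitor whose every request refers to itself is the pattern worth flagging.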

They only asked for the top level of directory files-- that is
www.example.com/
and then
www.example.com/directoryname/
--but they must be talking to some other robot, because there was one interesting 404. (Only meaningful if you know the exact site configuration and filenames.) This detail always worries me.

I also turned up an apparent human from 67.217.32.nnn a couple of weeks before that one. By "apparent" I mean that they scooped up a single page with all associated files in a reasonable time period. In retrospect they were slower than you would expect for a human: ten tiny images totaling 66K shouldn't span two seconds. I don't think a lot of people run robots on satellite internet. (Caching from sites in northern Canada doesn't count.)

:: further detour here to check whether any legitimate UA contains no spaces at all (answer: yes) ::

wilderness




msg:4514007
 1:28 am on Oct 31, 2012 (gmt 0)

I'd rather be lucky than good any day ;)

I've got something that caught this bot; unfortunately, I'm unable to determine WHAT.

It's not the IP or the UA, so it must be one of the UA-Browser rules a friend provided long ago.

dstiles




msg:4514391
 9:50 pm on Oct 31, 2012 (gmt 0)

I have the range 67.217.32.0 - 67.217.47.255 blocked as Netsource, USA.
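The three ranges mentioned in this thread line up: 67.217.32.0 - 67.217.47.255 is the same block as 67.217.32.0/20, and the originally reported 67.217.35.0 - 67.217.35.255 sits inside it. A small sketch with Python's standard `ipaddress` module confirms this; the helper name `in_block` and the test address are illustrative only.

```python
import ipaddress

# The /20 quoted earlier in the thread.
block = ipaddress.ip_network("67.217.32.0/20")
first, last = block[0], block[-1]
print(first, "-", last)

# The originally reported range, written as a /24.
reported = ipaddress.ip_network("67.217.35.0/24")
print(reported.subnet_of(block))

# Going the other way: collapse a start-end range into CIDR notation.
summarized = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("67.217.32.0"),
    ipaddress.ip_address("67.217.47.255"),
))
print(summarized)

def in_block(addr):
    """Return True when addr falls inside the /20 above."""
    return ipaddress.ip_address(addr) in block

print(in_block("67.217.35.10"))
```

The same check works for any visitor address pulled from a log, as long as the last octet is a real number rather than the masked "xx" used in the quoted lines.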


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved