Search Engine Spider and User Agent Identification Forum

intelium bot
Ken_S

Msg#: 4513742 posted 12:14 pm on Oct 30, 2012 (gmt 0)

Reported content scraper: 67.217.35.0 - 67.217.35.255

67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET /robots.txt HTTP/1.1" 200 403 "http://www.example.com/robots.txt" "intelium_bot"
67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET / HTTP/1.1" 200 1985 "http://www.example.com" "intelium_bot"

Took about half of my HTML files, but no images, in about a second. New bot and IP range for me.
Ken
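To check your own logs for the same pattern (pages fetched, no images), here is a minimal Python sketch. It assumes Apache's "combined" log format, as in the two lines quoted above. The image-extension list, the "robots" bucket and the function names are illustrative choices, not anything defined by Apache or by the bot.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident authuser [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical list of extensions counted as "image" requests.
IMAGE_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".webp", ".ico")


def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def pages_vs_images(lines):
    """Count robots.txt, image and other (page) requests per host."""
    counts = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines in some other format
        c = counts.setdefault(rec["host"], Counter())
        path = rec["path"].split("?", 1)[0].lower()  # ignore query strings
        if path == "/robots.txt":
            c["robots"] += 1
        elif path.endswith(IMAGE_EXTS):
            c["image"] += 1
        else:
            c["page"] += 1
    return counts
```

A host whose counter shows many "page" hits and zero "image" hits in a short window matches the behaviour described above. On its own, though, that pattern is also what many legitimate crawlers produce.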

lucy24

Msg#: 4513742 posted 12:11 am on Oct 31, 2012 (gmt 0)

:: detour to raw logs ::

Did yours use auto-referers throughout, as in your examples? Mine did. Same IP you met, a couple of weeks back. I've got the group as 67.217.32.0/20.
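The two checks mentioned here, a Referer that simply repeats the requested URL and an address inside 67.217.32.0/20, can be sketched in Python as below. This is only an illustrative sketch. The function names are hypothetical, the referer comparison looks only at the URL path (so it ignores which host the referer names), and the addresses in the quoted logs are masked, so the tests use stand-ins such as 67.217.35.10.

```python
import ipaddress
from urllib.parse import urlsplit

# The range given in this post for the group.
SUSPECT_NET = ipaddress.ip_network("67.217.32.0/20")


def in_suspect_range(ip):
    """True if ip (a dotted-quad string) falls inside SUSPECT_NET."""
    try:
        return ipaddress.ip_address(ip) in SUSPECT_NET
    except ValueError:
        return False  # masked or malformed, e.g. "67.217.35.xx"


def is_auto_referer(referer, request_path):
    """True if the Referer path equals the path being requested.

    An empty referer path (e.g. "http://www.example.com") counts as "/".
    A missing referer ("-" in Apache logs) is never an auto-referer.
    """
    if not referer or referer == "-":
        return False
    ref_path = urlsplit(referer).path or "/"
    return ref_path == request_path.split("?", 1)[0]
```

Either check alone is weak evidence. The auto-referer pattern is the more distinctive of the two, because a real browser only sends a Referer for the page the link was actually on.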

They only asked for the top-level directory index pages, that is
www.example.com/
and then
www.example.com/directoryname/
But they must be talking to some other robot, because there was one interesting 404. (It's only meaningful if you know the exact site configuration and filenames.) This detail always worries me.

I also turned up an apparent human from 67.217.32.nnn a couple of weeks before that one. By "apparent" I mean that they scooped up a single page with all its associated files in a reasonable amount of time. In retrospect, they were slower than you would expect for a human: ten tiny images totaling 66K shouldn't take two seconds. I don't think a lot of people run robots on satellite internet. (Caching from sites in northern Canada doesn't count.)
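The timing argument is simple arithmetic: 66K spread over two seconds averages about 33 KB/s. A small Python sketch of the same calculation from Apache timestamps follows. The function names are hypothetical. Apache's default %t field only has one-second resolution, so a logged span of two seconds could really be anything from just over one second to just under three.

```python
from datetime import datetime

# Format of the bracketed time field in Apache logs,
# e.g. "29/Oct/2012:22:53:08 -0700".
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def span_seconds(timestamps):
    """Seconds between the earliest and latest Apache-format timestamps."""
    times = [datetime.strptime(t, APACHE_TIME) for t in timestamps]
    return (max(times) - min(times)).total_seconds()


def throughput_kb_per_s(total_kb, seconds):
    """Average transfer rate, or None if everything landed in one logged second."""
    return total_kb / seconds if seconds > 0 else None
```

For example, `throughput_kb_per_s(66, 2)` gives 33.0, which is slow for ten tiny files on an ordinary connection and consistent with a high-latency link such as satellite.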

:: further detour here to check whether any legitimate UA contains no spaces at all (answer: yes) ::
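Because some legitimate UAs contain no spaces, a spaceless UA such as intelium_bot works better as one signal among several than as a block rule on its own. Here is a hypothetical Python helper that buckets logged UA strings this way. The bucket names are illustrative only.

```python
def classify_ua(ua):
    """Rough bucket for a logged User-Agent string (illustrative only)."""
    if ua in ("", "-"):
        return "empty"      # Apache logs a missing UA as "-"
    if " " not in ua:
        return "no-spaces"  # e.g. "intelium_bot"; a signal, not proof
    return "normal"
```

You would typically combine this with the other signals in the thread (auto-referers, address range, no image requests) rather than act on it alone.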

wilderness

Msg#: 4513742 posted 1:28 am on Oct 31, 2012 (gmt 0)

I'd rather be lucky than good any day ;)

I have something that caught this bot; unfortunately, I'm unable to determine WHAT.

It's not the IP or the UA, so it must be one of the UA-browser rules a friend provided long ago.

dstiles

Msg#: 4513742 posted 9:50 pm on Oct 31, 2012 (gmt 0)

I have the range 67.217.32.0 - 67.217.47.255 blocked as Netsource, USA.
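This inclusive range collapses to a single CIDR block, the same 67.217.32.0/20 given earlier in the thread. The range Ken reported, 67.217.35.0 - 67.217.35.255, is one /24 inside it. Python's standard ipaddress module can do the conversion. The helper name below is hypothetical.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Collapse an inclusive IPv4 start-end range into CIDR block strings."""
    blocks = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    )
    return [str(net) for net in blocks]
```

A range that doesn't line up on a power-of-two boundary comes back as several blocks, which is what you would list in a firewall or server deny rule.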

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved