
Forum Moderators: Ocean10000 & incrediBILL


intelium bot

     
12:14 pm on Oct 30, 2012 (gmt 0)

New User

joined:Oct 9, 2012
posts:34
votes: 0


Reported content scraper: 67.217.35.0 - 67.217.35.255

67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET /robots.txt HTTP/1.1" 200 403 "http://www.example.com/robots.txt" "intelium_bot"
67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET / HTTP/1.1" 200 1985 "http://www.example.com" "intelium_bot"
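Both lines above are in Apache's combined log format, and in each one the Referer simply echoes the URL being requested (the "auto-referer" pattern discussed below). A minimal sketch for flagging that pattern in a log, assuming combined format; the function names `parse_line` and `is_auto_referer` are hypothetical, and the check compares only the URL path, ignoring host and query string:

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return the log fields as a dict, or None if the line is not combined format."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def is_auto_referer(entry):
    """Hypothetical heuristic: True when the Referer path equals the requested path."""
    ref = entry["referer"]
    if ref in ("", "-"):
        return False
    ref_path = urlsplit(ref).path or "/"  # "http://www.example.com" has an empty path
    return ref_path == entry["path"]

# The two sample lines from the post (last octet masked as in the original).
SAMPLE = [
    '67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET /robots.txt HTTP/1.1" 200 403 '
    '"http://www.example.com/robots.txt" "intelium_bot"',
    '67.217.35.xx - - [29/Oct/2012:22:53:08 -0700] "GET / HTTP/1.1" 200 1985 '
    '"http://www.example.com" "intelium_bot"',
]

for line in SAMPLE:
    e = parse_line(line)
    print(e["ip"], e["path"], e["ua"], is_auto_referer(e))
```

A real scraper could of course send a plausible Referer instead, so this is one signal among several rather than a detection rule.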

In about a second it took roughly half of my HTML files, but no images. Both the bot and the IP range are new to me.
Ken
12:11 am on Oct 31, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month

joined:Apr 9, 2011
posts:12716
votes: 244


:: detour to raw logs ::

Did yours use auto-referers throughout, as in your examples? Mine did: same IP you met, a couple of weeks back. I've got the group as 67.217.32.0/20.
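For anyone checking addresses against that /20, Python's standard `ipaddress` module does the membership test directly. A short sketch; the address 67.217.35.10 is a hypothetical stand-in, since the logs above mask the last octet:

```python
import ipaddress

# The block as given above; it also contains the 67.217.35.0/24 range from the first post.
BLOCK = ipaddress.ip_network("67.217.32.0/20")

def in_block(ip):
    """True if the dotted-quad address falls inside BLOCK."""
    return ipaddress.ip_address(ip) in BLOCK

print(BLOCK[0], BLOCK[-1], BLOCK.num_addresses)
print(ipaddress.ip_network("67.217.35.0/24").subnet_of(BLOCK))
print(in_block("67.217.35.10"), in_block("67.217.48.1"))
```

A /20 covers 4096 addresses, here 67.217.32.0 through 67.217.47.255, so the /24 reported in the first post is one sixteenth of it.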

They only asked for the top-level file of each directory, that is
www.example.com/
and then
www.example.com/directoryname/
But they must be talking to some other robot, because there was one interesting 404 (only meaningful if you know the exact site configuration and filenames). This detail always worries me.

I also turned up an apparent human from 67.217.32.nnn a couple of weeks before that one. By "apparent" I mean that they scooped up a single page with all associated files in a reasonable time period. In retrospect they were slower than you would expect for a human: ten tiny images totaling 66K shouldn't span two seconds. I don't think a lot of people run robots on satellite internet. (Caching from sites in northern Canada doesn't count.)

:: further detour here to check whether any legitimate UA contains no spaces at all (answer: yes) ::
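Since "intelium_bot" contains no spaces at all, it is tempting to flag such user-agents automatically. Given the answer above (some legitimate UAs also have no spaces), a sketch like this is only a weak signal to combine with others, not grounds for blocking on its own; the function name is hypothetical:

```python
def ua_has_no_spaces(ua):
    """Flag a non-empty User-Agent string that contains no whitespace at all.

    Weak signal only: some legitimate user-agents also lack spaces.
    """
    return bool(ua) and not any(ch.isspace() for ch in ua)

print(ua_has_no_spaces("intelium_bot"))
print(ua_has_no_spaces("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```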
1:28 am on Oct 31, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


I'd rather be lucky than good any day ;)

I have something that caught this bot; unfortunately, I'm unable to determine WHAT.

It's not the IP or the UA, so it must be one of the UA-browser rules a friend provided long ago.
9:50 pm on Oct 31, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3092
votes: 2


I have the range 67.217.32.0 - 67.217.47.255 blocked as Netsource, USA.
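Ranges written as "first - last" can be converted to CIDR notation with `ipaddress.summarize_address_range` from Python's standard library. A small sketch (the helper name `range_to_cidrs` is hypothetical) showing that this start-end range is exactly one /20:

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(n) for n in nets]

print(range_to_cidrs("67.217.32.0", "67.217.47.255"))
print(range_to_cidrs("67.217.35.0", "67.217.35.255"))
```

A range that does not fall on CIDR boundaries comes back as several blocks, which is what an IP-based deny list would then need.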