
Search Engine Spider and User Agent Identification Forum

    
64.37.103.2 (bundy.infomaxinc.com)
WarmGlow




msg:399619
 5:11 pm on Aug 24, 2003 (gmt 0)

The robot from bundy.infomaxinc.com accessed two of my domains. It did not request robots.txt, and it was denied further access after it requested my ban script, which is linked from "hidden" anchors.
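A minimal sketch of the kind of "hidden anchor" trap described above. It assumes a hypothetical trap path `/trap/ban.html` that is linked only from anchors a human visitor would not see, and it keeps bans in an in-memory set. The path, the function name and the storage are illustrative assumptions, not WarmGlow's actual ban script.

```python
# Hypothetical trap path: linked only from "hidden" anchors, so a person
# browsing normally should never request it.
TRAP_PATH = "/trap/ban.html"

# In-memory ban list (an assumption; a real setup would persist it,
# e.g. by writing deny rules to a file the web server reads).
banned_ips = set()


def handle_request(ip, path):
    """Return an HTTP status code for the request, banning trap visitors."""
    if ip in banned_ips:
        return 403  # already banned: deny every further request
    if path == TRAP_PATH:
        banned_ips.add(ip)  # requesting the trap page triggers the ban
        return 403
    return 200
```

After `handle_request("64.37.103.2", TRAP_PATH)`, every later request from that address gets a 403, which matches the "denied further access" behaviour described in the post.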

REMOTE_HOST: 64.37.103.2 (bundy.infomaxinc.com)

HTTP_USER_AGENT: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030722 Mozilla Firebird/0.6Connection: close

64.37.103.2 - - [22/Aug/2003:20:24:06 -0400] "GET [example.com...] HTTP/1.0" 200 8506 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030722 Mozilla Firebird/0.6Connection: close"
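The log entry above is in Apache's combined log format: host, ident, user, timestamp, request line, status, bytes, referrer and user agent. A small regex-based parser makes it easy to pull out the fields this thread relies on, namely the protocol version and the complete user-agent string. The regex is an illustrative sketch and does not handle every edge case (for example, escaped quotes inside a field).

```python
import re

# Combined log format:
# host ident user [time] "method target protocol" status bytes "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>.*?) (?P<protocol>HTTP/\d\.\d)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED.match(line.strip())
    return match.groupdict() if match else None
```

Applied to the entry above, `parse_line` reports the protocol as `HTTP/1.0` and returns the full agent string, including the trailing `Connection: close`.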

The REMOTE_HOST and HTTP_USER_AGENT are now permanently denied access by .htaccess directives.
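The thread does not show the actual .htaccess directives. As an illustration of the same rule, the sketch below denies a request when the remote host (address or resolved name) is on a ban list, or when the user-agent string is exactly the recorded one. Because the comparison is exact, ordinary Mozilla Firebird 0.6 visitors, whose user-agent string ends at the version number, are not affected. The constant and function names are hypothetical.

```python
# The exact user-agent string recorded in the log, including the
# "Connection: close" text that the client appended to it.
BANNED_AGENT = (
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) "
    "Gecko/20030722 Mozilla Firebird/0.6Connection: close"
)

# Address and resolved host name from the REMOTE_HOST entry above.
BANNED_HOSTS = {"64.37.103.2", "bundy.infomaxinc.com"}


def is_denied(remote_host, user_agent):
    """Deny on a banned host or on an exact (not partial) user-agent match."""
    return remote_host.lower() in BANNED_HOSTS or user_agent == BANNED_AGENT
```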

 

sidyadav




msg:399620
 4:51 am on Aug 25, 2003 (gmt 0)

Are you sure this is a robot? It could be a user at infomaxinc.com, and the USER-AGENT string is that of Mozilla Firebird on Linux. Or it could be that this robot is faking its user agent.

- Sid

WarmGlow




msg:399621
 7:04 pm on Aug 25, 2003 (gmt 0)

sidyadav wrote:
Are you sure this is a Robot...

I am sure beyond a reasonable doubt.

  1. "Connection: close" is appended to the HTTP_USER_AGENT string. The string should end immediately following the Mozilla Firebird version number.
  2. The request for the domain root index is logged as "GET [example.com...] HTTP/1.0". Legitimate requests from Mozilla Firebird for the domain root index are logged as "GET / HTTP/1.1" or "GET /index.html HTTP/1.1".
  3. Inline image files were not requested.
  4. My external style sheet, which is linked by a REL attribute, and my external JavaScript file were not requested.
  5. Documents linked in "hidden" anchors were requested.
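The five points above can be combined into a simple per-visitor check. The sketch below assumes one visitor's requests have already been grouped (for example by IP address) into `Request` records, and that the trap paths are known. The names, the extension lists and the trap path in the usage are illustrative assumptions. It returns the list of matched signals rather than a verdict, leaving the judgement to the webmaster.

```python
from dataclasses import dataclass

# Illustrative extension lists; adjust to the assets your pages actually use.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")
LINKED_EXTENSIONS = (".css", ".js")


@dataclass
class Request:
    path: str
    protocol: str
    user_agent: str


def robot_signals(requests, trap_paths):
    """Return the suspicious signals found in one visitor's requests."""
    paths = [r.path.lower() for r in requests]
    signals = []
    # 1. Extra header text glued onto the end of the user-agent string.
    if any(r.user_agent.endswith("Connection: close") for r in requests):
        signals.append("user agent ends with 'Connection: close'")
    # 2. HTTP/1.0 from a client claiming to be a Gecko browser.
    if any(r.protocol == "HTTP/1.0" for r in requests):
        signals.append("HTTP/1.0 request")
    # 3. No inline images fetched.
    if not any(p.endswith(IMAGE_EXTENSIONS) for p in paths):
        signals.append("no images requested")
    # 4. No external style sheet or script fetched.
    if not any(p.endswith(LINKED_EXTENSIONS) for p in paths):
        signals.append("no CSS or JavaScript requested")
    # 5. Documents reachable only through hidden anchors were fetched.
    if any(r.path in trap_paths for r in requests):
        signals.append("hidden-anchor document requested")
    return signals
```

A visitor matching all five signals, as described in this post, gets a list of length five. A normal browser session that loads images and the style sheet over HTTP/1.1 gets an empty list.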

I accuse the remote user of mining content from my web site, and I believe the evidence presented above is sufficient for a guilty verdict.

Please note that I am not denying access to requests from Mozilla Firebird. I am denying access to requests whose HTTP_USER_AGENT is exactly the string recorded in my log file.

sidyadav




msg:399622
 5:06 am on Aug 26, 2003 (gmt 0)

mmm... Not sure...

jazzguy




msg:399623
 7:27 pm on Aug 26, 2003 (gmt 0)

I can confirm abusive robot behavior from bundy.infomaxinc.com. Their bot ripped through my site in May, disregarding robots.txt and grabbing disallowed files. It fetched only HTML files: no images, JavaScript or CSS. There is no way it was a human visitor; the page requests came too fast, and both the referrer and the UA were blank.
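The behaviour described here (requests arriving too fast, with blank referrer and user agent) can also be checked mechanically. A minimal sketch, assuming one visitor's requests are given as `(seconds, referer, user_agent)` tuples with timestamps already parsed, and assuming an arbitrary threshold of more than one request per second on average. The threshold is illustrative and is not a figure from this thread.

```python
def looks_automated(requests, max_rate=1.0):
    """Flag a visitor whose average request rate exceeds max_rate per second,
    or whose every request has a blank referrer and a blank user agent."""
    if not requests:
        return False
    times = sorted(t for t, _, _ in requests)
    if len(times) >= 2:
        span = times[-1] - times[0]
        # Average requests per second after the first; a zero span
        # (several requests in the same instant) counts as infinitely fast.
        rate = (len(times) - 1) / span if span > 0 else float("inf")
        if rate > max_rate:
            return True

    def blank(value):
        # Apache logs a missing header as "-".
        return value in ("", "-")

    return all(blank(ref) and blank(ua) for _, ref, ua in requests)
```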

Also, another bot came in June from berg.dbsmarketing.net, which resolved to 64.37.103.34 and is in the same IP block (listed in SPEWS and assigned to cybercon.com). It behaved the same way.
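Whether two addresses fall in the same block can be checked with Python's standard `ipaddress` module. The thread does not give the size of the block assigned to cybercon.com, so the /24 network below is an assumption for illustration only. Substitute the allocation shown in the actual registry record.

```python
import ipaddress

# Assumed network: the real allocation size is not stated in the thread.
SUSPECT_BLOCK = ipaddress.ip_network("64.37.103.0/24")


def in_block(address, block=SUSPECT_BLOCK):
    """Return True if the address lies inside the given network."""
    return ipaddress.ip_address(address) in block
```

Under that assumption, both 64.37.103.2 (bundy.infomaxinc.com) and 64.37.103.34 (berg.dbsmarketing.net) fall inside the block.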

I posted about both of them back in June in the thread "dbsmarketing.net & infomaxinc.com" [webmasterworld.com]. That post included some research showing that the IP addresses were listed in SPEWS. The post is still there, but for some reason it does not turn up in a site search for "infomaxinc.com."

The visits by these two bots were what finally motivated me to install an automated bot trap.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved