Unknown bot ...

Does anyone know who this is?

         

ideavirus

7:53 am on Oct 21, 2003 (gmt 0)

10+ Year Member



Hello,

Since the 9th of this month, one of my sites has been crawled by a bot that AWStats identifies like this:

Unknown robot (identified by 'spider') .. on 09 Oct 2003 - 17:35

Unknown robot (identified by 'robot') .. on 18 Oct 2003 - 16:26

Unknown robot (identified by 'crawl') .. on 20 Oct 2003 - 05:14

Thanks for any help.

Cheers

bull

10:07 am on Oct 21, 2003 (gmt 0)

10+ Year Member



Can you provide the raw log lines, including the IP address?

dcrombie

4:53 pm on Oct 28, 2003 (gmt 0)



203.219.86.6 - - "GET / HTTP/1.1" "-" "Unknown (compatible; Unknown; Unknown)"
203.219.86.6 - - "GET /some/image_on.gif HTTP/1.1" "http://www.somesite.com/" "Unknown (compatible; Unknown; Unknown)"
203.219.86.6 - - "GET /some/image_off.gif HTTP/1.1" "http://www.somesite.com/" "Unknown (compatible; Unknown; Unknown)"

A total of 87 hits in under 2 minutes, mostly images. It fetched the front page of the site with its images (including the rollover images), then a few other pages. Two pages were requested twice. There was no request for robots.txt.

Looks like a browser to me.
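
For anyone who wants to run the same check on their own logs, here is a rough sketch that counts hits per IP and notes whether each IP ever asked for robots.txt. It assumes the abbreviated line layout quoted above (IP, two dashes, then quoted request, referrer and user agent); a real combined log also carries a timestamp, status and size, so the pattern would need adjusting. The names `LINE_RE` and `summarize` are made up for this example.

```python
import re
from collections import Counter

# Assumed layout (as in the abbreviated lines quoted in this post):
#   IP - - "METHOD PATH PROTOCOL" "REFERRER" "USER AGENT"
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ "(\S+) (\S+) [^"]*" "([^"]*)" "([^"]*)"$'
)

def summarize(lines):
    """Return {ip: {"hits": count, "robots_txt": bool}} for parseable lines."""
    hits = Counter()
    asked_robots = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        ip, _method, path, _referrer, _user_agent = match.groups()
        hits[ip] += 1
        if path == "/robots.txt":
            asked_robots.add(ip)
    return {
        ip: {"hits": count, "robots_txt": ip in asked_robots}
        for ip, count in hits.items()
    }
```

Lots of hits in a short time with no robots.txt request does not prove anything by itself, but it is a quick first filter before looking at the requests by hand.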

BlueSky

10:53 pm on Oct 28, 2003 (gmt 0)

10+ Year Member



inetnum: 203.219.0.0 - 203.219.255.255
netname: TPGCOM-AU
descr: Australian ISP
descr: North Ryde, NSW, 2113
descr: Australia
country: AU

Report abuse: abuse@tpg.com.au

If it's a regular surfer, you will see a hit in your logs for each graphic on the page. Did it pull any graphics other than those on the few pages it accessed?

dcrombie

8:32 am on Oct 29, 2003 (gmt 0)



No, just those associated with the pages. Whatever it was had to process JavaScript to load the mouseover images, so if it's not a browser then it's some kind of intelligent agent.

balam

4:21 am on Nov 2, 2003 (gmt 0)

10+ Year Member



> Unknown robot (identified by 'crawl')

Since AWStats doesn't know about every bot out there, the developer has taken a pretty neat route: it checks the UA for a few keywords ("spider", "robot" and "crawl") and tries to identify unknown bots that way.

To use one (common) example, AWStats doesn't identify LookSmart's "grub" crawler by name, but because of the use of the word "crawl" in the UA, AWStats catches grub that way...

Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http*://grub.org)
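
The keyword fallback described above can be sketched roughly like this. This is not AWStats' actual code, only an illustration of the idea; the keyword list is just the three words mentioned in this thread, the match is assumed to be case-insensitive, and `classify_user_agent` is a made-up name.

```python
# Keywords taken from this thread; the real AWStats list and
# matching rules may differ.
GENERIC_BOT_KEYWORDS = ("spider", "robot", "crawl")

def classify_user_agent(user_agent):
    """Return an AWStats-style label if a generic keyword appears, else None."""
    lowered = user_agent.lower()  # assumed: matching ignores case
    for keyword in GENERIC_BOT_KEYWORDS:
        if keyword in lowered:
            return f"Unknown robot (identified by '{keyword}')"
    return None
```

Fed the grub UA above, this matches on "crawl". A UA like "Unknown (compatible; Unknown; Unknown)" contains none of the keywords, so it falls through and would not be flagged this way.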