
Forum Moderators: goodroi


CCGCrawl -- custom crawler/harvester software

New crawler software with potential for abuse

     

privacyman

11:21 am on Mar 1, 2004 (gmt 0)




I found this recent entry in my log file:

80.252.XXX.XX - - [26/Feb/2004:23:57:21 -0800] "GET /robots.txt HTTP/1.1" 403 - "-" "CCGCrawl/1.1-dev (CCGCrawl 1.1; example.com; robot@example.com)"

The request was blocked (403) because the IP falls within a range I had already blocked.
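For anyone who wants to pull entries like this out of their own logs, here is a minimal sketch. It assumes the log uses the common Apache "combined" format, which is what the entry above looks like. The names `LOG_PATTERN`, `parse_log_line` and `is_denied_robots_fetch` are made up for illustration and are not part of any tool.

```python
import re

# Assumed layout: Apache "combined" log format, matching the entry quoted above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_denied_robots_fetch(entry):
    """True if the entry is a request for /robots.txt that was answered with 403."""
    parts = entry["request"].split()
    path = parts[1] if len(parts) >= 2 else ""
    return path == "/robots.txt" and entry["status"] == "403"


# The entry from the post, with the IP masked exactly as it was posted.
line = (
    '80.252.XXX.XX - - [26/Feb/2004:23:57:21 -0800] '
    '"GET /robots.txt HTTP/1.1" 403 - "-" '
    '"CCGCrawl/1.1-dev (CCGCrawl 1.1; example.com; robot@example.com)"'
)

entry = parse_log_line(line)
print(entry["agent"], entry["status"], is_denied_robots_fetch(entry))
```

A 403 on /robots.txt means the bot never saw the file, so a log line like this says nothing either way about whether the bot would have obeyed it.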

Investigating myworkbase.com showed that CCGCrawl is custom crawler software. The site does not say whether it grabs images (often copyrighted) or whether it obeys robots.txt or robots meta tags. I left feedback suggesting that they publish that information on their site and make sure the crawler complies with both, and I also asked whether it crawls at a reasonable speed.
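To make "obeys robots.txt" concrete, here is a rough sketch of the check a compliant crawler is expected to make before each request, using Python's standard `urllib.robotparser`. The robots.txt rules and the www.example.com URLs are invented for illustration. Nothing here shows how CCGCrawl itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out CCGCrawl entirely, keep everyone else out of /images/.
ROBOTS_TXT = """\
User-agent: CCGCrawl
Disallow: /

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question before every fetch.
ccg_page = parser.can_fetch("CCGCrawl/1.1-dev", "http://www.example.com/index.html")
other_page = parser.can_fetch("OtherBot", "http://www.example.com/index.html")
other_image = parser.can_fetch("OtherBot", "http://www.example.com/images/logo.png")

print(ccg_page, other_page, other_image)
```

This only matters if the crawler chooses to run the check at all. robots.txt is a request, not an enforcement mechanism, which is why a server-side block is the only guarantee.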

This is just a heads-up. If any versions of CCGCrawl are already in public use, they may not be compliant, but because the crawler could not access my site due to the IP block in .htaccess, I have no way to confirm that. The software company may provide details, and may ensure compliance at least in later versions if this one falls short.

This crawler bot could be used by anyone for any purpose.
;-)

[edited by: engine at 12:41 pm (utc) on Mar. 1, 2004]
[edit reason] specifics removed [/edit]

webapache

1:38 pm on Mar 13, 2004 (gmt 0)




Well, what is the website or home URL of CCGCrawl?
 
