Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

CCGCrawl -- custom crawler/harvester software
New crawler software potential for abuse

10+ Year Member

Msg#: 306 posted 11:21 am on Mar 1, 2004 (gmt 0)

I recently found this entry in my log file:

80.252.XXX.XX - - [26/Feb/2004:23:57:21 -0800] "GET /robots.txt HTTP/1.1" 403 - "-" "CCGCrawl/1.1-dev (CCGCrawl 1.1; example.com; robot@example.com)"

The request was blocked (403) because the IP falls within a range I had already blocked.
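For anyone who wants to check their own logs for this kind of hit, here is a minimal Python sketch. It assumes the Apache combined log format shown above. The IP address and the blocked range are made-up documentation addresses (the real ones are masked in the entry above), and the function names `parse_log_line` and `is_blocked` are hypothetical.

```python
import ipaddress
import re

# Apache "combined" log format: ip, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one combined-format log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_blocked(ip, blocked_networks):
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(net) for net in blocked_networks)


# Assumed sample data: 203.0.113.0/24 is a documentation range, not the real block.
sample = (
    '203.0.113.7 - - [26/Feb/2004:23:57:21 -0800] "GET /robots.txt HTTP/1.1" '
    '403 - "-" "CCGCrawl/1.1-dev (CCGCrawl 1.1; example.com; robot@example.com)"'
)
entry = parse_log_line(sample)
print(entry["agent"].split("/")[0], entry["status"])   # CCGCrawl 403
print(is_blocked(entry["ip"], ["203.0.113.0/24"]))     # True
```

A 403 on /robots.txt here only tells you that the range block fired. It says nothing about how the crawler would have behaved had it been let in.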

Investigating myworkbase.com revealed that CCGCrawl is custom crawler software. The site does not say whether it grabs images (often copyrighted) or whether it obeys robots.txt or robots meta tags. I left feedback suggesting that they publish this information on their site and make sure the crawler complies with both, and I also asked whether it crawls at a reasonable rate.
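To illustrate what "obeys robots.txt" would mean in practice, here is a rough sketch using Python's standard `urllib.robotparser`. It is not how CCGCrawl works (the vendor has not said). The robots.txt rules, the URLs, and the `OtherBot` agent are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out CCGCrawl entirely, keep everyone else out of /images/.
RULES = """\
User-agent: CCGCrawl
Disallow: /

User-agent: *
Disallow: /images/
"""


def may_fetch(user_agent, url, rules=RULES):
    """Return True if a compliant crawler with this user agent may request url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("CCGCrawl/1.1-dev", "http://example.com/index.html"))  # False
print(may_fetch("OtherBot", "http://example.com/index.html"))          # True
print(may_fetch("OtherBot", "http://example.com/images/photo.jpg"))    # False
```

A compliant crawler runs a check like this before every request. A non-compliant one simply skips it, which is why a server-side block is the only thing you can rely on.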

This is just a heads-up. If any versions of CCGCrawl are already in public use, they may not be compliant, but because the crawler could not access my site due to the IP block in .htaccess, I have no way to confirm this. Possibly the software company will provide details and ensure compliance, at least in later versions, should this one not comply.

This crawler bot could be used by anyone for any purpose.

[edited by: engine at 12:41 pm (utc) on Mar. 1, 2004]
[edit reason] specifics removed [/edit]



10+ Year Member

Msg#: 306 posted 1:38 pm on Mar 13, 2004 (gmt 0)

Well, what is the website or home URL of CCGCrawl?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved