Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
CCGCrawl -- custom crawler/harvester software
New crawler software potential for abuse
privacyman




msg:1527587
 11:21 am on Mar 1, 2004 (gmt 0)

I found this recent entry in my log file:

80.252.XXX.XX - - [26/Feb/2004:23:57:21 -0800] "GET /robots.txt HTTP/1.1" 403 - "-" "CCGCrawl/1.1-dev (CCGCrawl 1.1; example.com; robot@example.com)"

The request was blocked (403) because the IP falls within a range I had already blocked.
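For anyone who wants to scan their own logs for similar hits, here is a minimal Python sketch. It assumes the log is in Apache's combined format, as the entry above appears to be. The names `LOG_PATTERN`, `parse_log_line` and `is_blocked_robots_fetch` are made up for illustration.

```python
import re

# Assumed layout (Apache combined log format):
# host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_blocked_robots_fetch(entry):
    """True if the entry is a robots.txt request that was answered with 403."""
    return (
        entry is not None
        and entry["request"].startswith("GET /robots.txt")
        and entry["status"] == "403"
    )


# The log entry quoted in the post, with the IP masked as in the original.
SAMPLE = (
    '80.252.XXX.XX - - [26/Feb/2004:23:57:21 -0800] '
    '"GET /robots.txt HTTP/1.1" 403 - "-" '
    '"CCGCrawl/1.1-dev (CCGCrawl 1.1; example.com; robot@example.com)"'
)

entry = parse_log_line(SAMPLE)
if is_blocked_robots_fetch(entry):
    print(entry["host"], entry["agent"])
```

Running `parse_log_line` over each line of an access log and filtering on the `agent` field would show whether the same crawler turns up from other addresses.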

Investigating myworkbase.com revealed that CCGCrawl is custom crawler software. The site does not say whether it grabs images (often copyrighted) or whether it obeys robots.txt or meta tags. I left them feedback suggesting that they document those points on their site and make sure the crawler complies with both, and I also asked whether it crawls at a reasonable rate.
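To make "obeys robots.txt" concrete, here is a rough sketch of the check a well-behaved crawler would make before each fetch, using Python's standard `urllib.robotparser`. This says nothing about how CCGCrawl actually works. The robots.txt rules, the `OtherBot` agent, the URLs and the `may_fetch` helper are all invented for the example.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied as a list of lines
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: shut CCGCrawl out entirely, keep everyone else out of /images/.
ROBOTS = [
    "User-agent: CCGCrawl",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /images/",
]

print(may_fetch(ROBOTS, "CCGCrawl/1.1-dev", "http://example.com/page.html"))
print(may_fetch(ROBOTS, "OtherBot/1.0", "http://example.com/images/photo.gif"))
print(may_fetch(ROBOTS, "OtherBot/1.0", "http://example.com/page.html"))
```

With these rules only the last request is allowed. A crawler that skips this check entirely is what the feedback to the vendor was meant to rule out.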

This is just a heads-up. If any versions of CCGCrawl are already in public use, they may not be compliant, but since the crawler could not access my site because of the IP block in .htaccess, I have no way to confirm this. Hopefully the software company will provide details and will ensure compliance, at least in later versions if this one falls short.

This crawler bot could be used by anyone for any purpose.
;-)

[edited by: engine at 12:41 pm (utc) on Mar. 1, 2004]
[edit reason] specifics removed [/edit]

 

webapache




msg:1527588
 1:38 pm on Mar 13, 2004 (gmt 0)

Well, what is the website or home URL of CCGCrawl?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved