Dreamquick

msg:1527614 | 3:54 pm on Apr 2, 2003 (gmt 0) |
You can't use an IP address in robots.txt; you need to know the name the robot will look for when it parses robots.txt. Also bear in mind that this file is honoured voluntarily by crawlers rather than enforced by the server, so nothing in it will stop a bad crawler from crawling what it wants, regardless of whether it reads the file or not. Apart from that it looks okay, but you might want to run it through the Robots.txt Validator [searchengineworld.com] just to make sure. - Tony
|
Alternative Future

msg:1527615 | 3:57 pm on Apr 2, 2003 (gmt 0) |
Thanks for the quick response, Dreamquick. I did have the name of the robot in there (no harm in saying its name: LinkChecker), but it just doesn't obey my rules! Last month it hogged nearly 1Gb of bandwidth! Any ideas on how to allow it access only to my links dir? [added]Also, the validator said all was fine, which is why I wanted a human to check it as well.[/added] Thx again, -gs
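One way to express "only the links dir" is an Allow rule followed by a blanket Disallow. The following is a minimal sketch rather than a confirmed fix: the `/links/` path and the other paths are hypothetical, Allow is an extension that not every crawler of this era honours, and a crawler that ignores robots.txt will ignore this too. It uses Python's standard `urllib.robotparser` to show how a compliant parser would read such rules, matching on the robot's name rather than its IP.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the robot named LinkChecker may fetch /links/ only.
# Python's parser applies the first rule that matches, so Allow comes first.
ROBOTS_TXT = """\
User-agent: LinkChecker
Allow: /links/
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical requests: (user-agent name, path).
checks = [
    ("LinkChecker", "/links/page.html"),
    ("LinkChecker", "/private/report.html"),
    ("SomeOtherBot", "/private/report.html"),
]

for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
    print(f"{agent} -> {path}: {verdict}")
```

Robots with no matching User-agent record are unaffected by these rules, which is why the name in the file has to be the one the crawler actually looks for.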
|
Dreamquick

msg:1527616 | 4:16 pm on Apr 2, 2003 (gmt 0) |
If you are on Apache, .htaccess will let you block the crawler by either UA or IP/IP range; blocking on IIS is less easy but still possible. Sorry if that sounds a little vague, but I'm an IIS + ASP person, so Apache isn't my forte! If you'd like to know how to block at the ASP level, that I do know :) Sticky me if you'd like to chat about that... In either case, a site search for "blocking" plus your server type will produce a list of possible answers. <added> LinkChecker seems to be a link checker program (duh @ me), and the source is available via SourceForge. It claims to be robots.txt compliant, so you might just have someone faking that UA, or perhaps an older version. </added> - Tony
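The idea of blocking by UA or IP range at the application level can be sketched as below. This is an illustration in Python (a WSGI wrapper), not the .htaccess or ASP approach mentioned above; the blocked substring, the IP range (a documentation-only range) and the names `is_blocked` and `block_crawlers` are all assumptions made up for the example.

```python
from ipaddress import ip_address, ip_network

# Hypothetical block lists: adjust to the crawler you actually see in your logs.
BLOCKED_AGENT_SUBSTRINGS = ("linkchecker",)
BLOCKED_NETWORKS = (ip_network("192.0.2.0/24"),)  # documentation range, not a real crawler


def is_blocked(user_agent, remote_addr):
    """Return True if the request matches a blocked UA substring or IP range."""
    ua = (user_agent or "").lower()
    if any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        addr = ip_address(remote_addr)
    except ValueError:
        # Missing or malformed address: fall back to the UA check only.
        return False
    return any(addr in network for network in BLOCKED_NETWORKS)


def block_crawlers(app):
    """Wrap a WSGI app so that blocked requests get a 403 response."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT")
        remote_addr = environ.get("REMOTE_ADDR", "")
        if is_blocked(user_agent, remote_addr):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Unlike robots.txt, this is enforced by the server, although a crawler that fakes its UA will still get past the UA check, which is where the IP range comes in.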
|
Alternative Future

msg:1527617 | 9:52 am on Apr 3, 2003 (gmt 0) |
Thanks Dreamquick, your pointers helped me out no end! My robots.txt now looks like:

User-agent: linksmanager
Disallow: /blah
Disallow: /blah
Disallow: /blah

Once again, thanks for your help :) KR, -gs
|