fast & robots.txt: did I get it wrong...
msampson

msg:219637 | 4:47 pm on Dec 1, 2002 (gmt 0)
Hi, FAST-WebCrawler has visited me twice, but only retrieved one file each time (which I assume was robots.txt). Checking the correct syntax, I realise I made a small error, and I wonder if this is the reason it hasn't done a full crawl (Google and AltaVista have crawled OK). I had:

User-agent: *
Disallow: /~blah

but I should have had:

User-agent: *
Disallow: /~blah/   #don't forget the final slash

Did FAST interpret that as Disallow: / (i.e. disallow the whole site)? Is my interpretation correct? Should I let FAST know that I made a mistake, or will they come back and look without me notifying them?
Thanks,
Miles
jdMorgan

msg:219638 | 4:58 pm on Dec 1, 2002 (gmt 0)
msampson,
The only result of your typo is that files in your top-level directory whose names start with "~blah" (e.g. "/~blah2.html") would be disallowed, as well as the subdirectory "/~blah/" that you intended to disallow. If no such files exist, the typo will have no practical effect. Check out the robots.txt validator page [searchengineworld.com] and the info linked on that page for more on why you might have a problem. If you don't find any problems, it may just be that FAST found a link to your site while crawling another site, and came over to check robots.txt to see whether it would be allowed to spider your site later.
HTH,
Jim
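To make the prefix-matching point concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which treats each Disallow value as a plain path prefix, as the original robots exclusion convention does. The helper name `allowed`, the constants `TYPO` and `INTENDED`, and the sample paths are hypothetical, chosen only to mirror this thread; FAST's own crawler may well have implemented matching differently, so treat this as an illustration of the rule rather than of FAST's behaviour.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants from the thread. Each directive goes on its
# own line, which is how robots.txt expects them.
TYPO = """\
User-agent: *
Disallow: /~blah
"""

INTENDED = """\
User-agent: *
Disallow: /~blah/   # don't forget the final slash (parser drops this comment)
"""


def allowed(robots_txt: str, path: str, agent: str = "FAST-WebCrawler") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    Hypothetical helper: parses the text with the stdlib parser, which
    applies each Disallow value as a prefix of the requested path.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, path)


if __name__ == "__main__":
    # Compare both variants on a normal page, a page inside the intended
    # directory, and a top-level file that merely starts with "~blah".
    for path in ["/index.html", "/~blah/page.html", "/~blah2.html"]:
        print(path, "typo:", allowed(TYPO, path),
              "intended:", allowed(INTENDED, path))
```

Run as a script, it should report `/index.html` as allowed under both variants, `/~blah/page.html` as blocked under both, and `/~blah2.html` as blocked only by the slash-less version. In other words, the missing slash widens the block slightly but is nowhere near equivalent to Disallow: /.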
CuriousWeb

msg:219639 | 7:29 pm on Dec 1, 2002 (gmt 0)
Hi msampson,
I also had some problems with FAST crawling my site. They visited regularly over a period of months but never requested anything other than robots.txt. I would recommend contacting them, as I did; they are very helpful, and my site is now being deep-crawled and indexed...
msampson

msg:219640 | 10:22 am on Dec 2, 2002 (gmt 0)
Thanks very much for both your answers. I'm always very impressed by the level of knowledge and helpfulness on these boards.