| How does robots.txt work?
|
Acternaweb

msg:1528277 | 4:01 pm on Feb 23, 2001 (gmt 0) | In our head content we have, among other things, the following for robots: <meta name="Robots" content="index"> <meta name="Robots" content="follow"> Looking at the files in Net Tracker, the SEs' robots come to robots.txt, but only for a short time. My question is, how do they index the site? What exactly are the spiders looking for in the robots.txt file? Let me know if you need more clarification. Thanks, Paul
|
Acternaweb

msg:1528278 | 4:07 pm on Feb 23, 2001 (gmt 0) | Sorry, forgot to add, this is what the robots.txt file currently has:
User-Agent: *
Disallow: *_private*
Disallow: *_vti*
Thanks, PG
|
Hope

msg:1528279 | 4:26 pm on Feb 23, 2001 (gmt 0) | I think you need to take a good hard look at this site. [info.webcrawler.com...] This is the best information you are going to find on robots.txt.
|
WebGuerrilla

msg:1528280 | 2:59 am on Feb 28, 2001 (gmt 0) | "In our head content we have among other content the following for robots: <meta name="Robots" content="index"> <meta name="Robots" content="follow">" These meta tags probably don't cause any harm, but they have absolutely no effect on how often your site gets spidered or how many pages get crawled. Some engines do honor the noindex meta tag, but others will ignore it. All credible engines honor robots.txt. That is why you will see so many requests for it in your logs. Before a spider begins to crawl, it checks that file to see if there are any pages it shouldn't crawl.
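To make that last point concrete, here is a rough sketch of the check a well-behaved spider performs before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the `www.example.com` URLs are all made up for illustration; the sample file uses plain path prefixes rather than the wildcard patterns posted above, since wildcard handling varies between engines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with plain path-prefix rules (an assumption for
# this sketch, not the file posted earlier in the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /_private/
Disallow: /_vti_bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True if the rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and the URLs below are placeholders.
for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/_private/notes.html",
):
    verdict = "fetch" if allowed(parser, "ExampleBot", url) else "skip"
    print(verdict, url)
```

A real crawler would download the file from the site root first (for example with `RobotFileParser.set_url()` followed by `read()`), which is the short robots.txt request that shows up in the logs before any pages are crawled.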
|