How does robots.txt work?

     
4:01 pm on Feb 23, 2001 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 16, 2001
posts:585
votes: 0


In our head content we have, among other things, the following for robots:
<meta name="Robots" content="index">
<meta name="Robots" content="follow">

Looking at the logs in Net Tracker, the search engines' robots request robots.txt, but only stay for a short time.
My question is, how do they index the site? What exactly are the spiders looking for in the robots.txt file?

Let me know if you need more clarification.

Thanks,

Paul

4:07 pm on Feb 23, 2001 (gmt 0)


Sorry, forgot to add: this is what the robots.txt file currently has:

User-Agent: *
Disallow: *_private*
Disallow: *_vti*

Thanks,
PG

4:26 pm on Feb 23, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 2, 2000
posts:113
votes: 0


I think you need to take a good hard look at this site.

[info.webcrawler.com...]

This is the best information you are going to find on robots.txt.

2:59 am on Feb 28, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2000
posts:2176
votes: 0


"In our head content we have among other content the following for robots:
<meta name="Robots" content="index">
<meta name="Robots" content="follow">"

These meta tags probably don't cause any harm, but they have no effect on how often your site gets spidered or how many pages get crawled; indexing the page and following its links is what spiders do by default anyway. Some engines honor the noindex meta tag, but others ignore it. All credible engines honor robots.txt, which is why you see so many requests for it in your logs: before a spider begins to crawl, it checks that file to see whether there are any pages it shouldn't fetch.
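
To make that check concrete, here is a rough sketch of what a well-behaved spider does with the file before requesting a page, using Python's standard urllib.robotparser module. The rules, the "ExampleBot" user agent, and the www.example.com URLs are all made up for illustration. Note that this parser, like the original robots exclusion standard, treats each Disallow value as a path prefix, which is why the sketch uses plain prefixes rather than the wildcard patterns posted earlier in the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules written as plain path prefixes (an assumption for this
# sketch, not the file posted in the thread).
RULES = """\
User-agent: *
Disallow: /_private/
Disallow: /_vti_bin/
"""


def build_parser(rules_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True if the rules allow this user agent to request the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(RULES)
    # "ExampleBot" and the URLs below are placeholders.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/_private/notes.html",
    ):
        verdict = "fetch" if may_fetch(parser, "ExampleBot", url) else "skip"
        print(verdict, url)
```

A real spider would download the file first (RobotFileParser also has set_url and read for that) and then run this check for every URL it is about to request, which would explain why the visit to robots.txt itself looks so short in the logs.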