Jazzvn
A robots.txt file is simply a text file created in a text editor that is stored on the server. Here is a simple robots.txt:
User-agent: *
Disallow: /cgi-bin/
Disallow: /members/
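To see how a well-behaved robot might read those three lines, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com, the robot name ExampleBot, and the sample paths are made up for illustration; only the rules themselves come from the example above.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /members/
"""

def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser

parser = build_parser(RULES)
base = "http://www.example.com"  # hypothetical host

# Hypothetical paths: one outside the disallowed folders, two inside them.
for path in ("/index.html", "/cgi-bin/search.pl", "/members/profile.html"):
    allowed = parser.can_fetch("ExampleBot", base + path)
    print(path, "allowed" if allowed else "disallowed")
```

This only models a robot that chooses to obey the file. Nothing in robots.txt enforces the rules.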
Here is a site that details what the file is and what it does: [robotstxt.org]
Hope this helps to get you started.
Welcome to WebmasterWorld [webmasterworld.com]!
Here's the Robots.txt standard [robotstxt.org].
Putting the robots.txt file in your root directory will not help to get your site indexed. The purpose of robots.txt is to tell "good" robots not to index certain pages or subdirectories of your site. Bad robots, such as e-mail address harvesters, will ignore robots.txt.
Typical uses are to keep robots from requesting your scripts and/or shopping cart, to stop them from indexing or copying your images, and to keep them from listing your "semi-private" pages - although this is no guarantee that those pages will be kept private!
Another use is to keep robots from requesting pages that you don't need indexed and consuming large amounts of bandwidth.
In order to get your pages indexed in search engines, what you need to do is to get incoming links -- get other sites which are already indexed to link to your pages. Best results will be had if the sites that link to your pages share the topic of your pages. Also, links from reputable directories such as the Open Directory Project (DMOZ) are good to have.
In addition to robots.txt, which is a text file placed in the root directory of your site, you can also use the on-page HTML robots meta-tag of the form:
<meta name="robots" content="noindex,nofollow">
Note that if a page is disallowed in robots.txt, then the page won't be read by robots, and the above tag will have no effect.
Also note that Ask Jeeves and Google will index your page (list it in search results) if they find any link to your page, regardless of whether that page is disallowed in robots.txt. The only way to stop them from listing your page is to allow them to fetch it (don't disallow the page in robots.txt) and use the on-page HTML robots meta-tag shown above.
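The interaction between the two mechanisms can be sketched in a few lines of Python. This is only a model of the behaviour described in this post, not how any particular search engine is implemented. The function name listing_outcome, the robot name ExampleBot, and the example.com URLs are all invented for the illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def listing_outcome(robots_txt: str, url: str, html: str,
                    agent: str = "ExampleBot") -> str:
    """Model what a compliant robot can learn about one page (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The page is never requested, so its meta tag is never read.
        return "not fetched: meta tag unseen, URL may still be listed if linked"
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "fetched: noindex seen, page not listed"
    return "fetched: page may be indexed"

PAGE = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
BLOCKING = "User-agent: *\nDisallow: /private/\n"
OPEN = "User-agent: *\nDisallow:\n"

print(listing_outcome(BLOCKING, "http://www.example.com/private/page.html", PAGE))
print(listing_outcome(OPEN, "http://www.example.com/private/page.html", PAGE))
```

The first call shows the trap: the page carries a noindex tag, but because robots.txt disallows it, the robot never gets to read the tag.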
Use a simple text editor such as Notepad to create your robots.txt. Once you have written your robots.txt file, validate it here [searchengineworld.com].
Jim
It says noindex and nofollow.
I really want to have these:
User-agent:*
Disallow:/album/
# Album is the folder that holds my whole family web album, and I don't want people to see it. Is that code right? This will let in any kind of spider, but keep it out of my www.domain.com/album, right?
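One way to check a rule like this before uploading it is to run it through Python's standard urllib.robotparser. This is a rough sketch: the robot name ExampleBot and the file names under www.domain.com are invented, and other robots may match paths differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted, with no space after the colon.
ALBUM_RULES = """\
User-agent:*
Disallow:/album/
"""

def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant robot may request the URL (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)

site = "http://www.domain.com"

# Anything under /album/ is off limits to robots that obey the file.
print(is_allowed(ALBUM_RULES, site + "/album/photo1.jpg"))

# The rest of the site stays open to them.
print(is_allowed(ALBUM_RULES, site + "/index.html"))

# Matching is by path prefix in this parser, so "/album" without the
# trailing slash is not covered by "Disallow:/album/".
print(is_allowed(ALBUM_RULES, site + "/album"))
```

As noted earlier in the thread, this only affects robots that choose to obey robots.txt. It does not stop people, or badly behaved robots, from opening the folder.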
Thanks
I don't want people see it.
If you use robots.txt, there's no need to use META tags. If you use META tags, there's no need to use robots.txt.
So I'd suggest going with the robots.txt code you posted in your last message.
Good luck!
Sid
PS: Welcome to WebmasterWorld!
Jim
A little help please
In your opinion would this also apply to Yahoo! Slurp?
However, Google does this. If you do a query for "Overture", in result #3, even though content.overture.com [overture.de] has banned all robots via robots.txt, Google still has its URL.
So there's a chance of this happening in Yahoo! results, as Yahoo! still seems to use some of Google's.
Sid
Thank you for the feedback and info
I asked the question because Yahoo! recently listed a whole site of ours that does contain a robots.txt forbidding indexing.
The strange thing is it lists all pages but with no descriptions.
Just the URL and the company name, which contains the link.
Been trying to find out why without success so far.
Any ideas?
Ray