A question on robots.txt files

/~ghealton/tmp/ or /tmp/ in /~ghealton/robots.txt

   
3:39 pm on Aug 30, 2002 (gmt 0)




I have been looking at many robots.txt documents, including [searchengineworld.com...]
However I still have an important question.

I am using a home directory on an Apache server, in my [exit109.com...] directory.
If I want to exclude /~ghealton/tmp/, do I use /~ghealton/tmp/ in the Disallow statement in ~ghealton/robots.txt, or /tmp/, as so many people seem to do? If it is /~ghealton/tmp/, then you have another common error to add to the list of common problems.

For now I am using BOTH to make sure all spiders avoid my files.

4:05 pm on Aug 30, 2002 (gmt 0)

pageoneresults



Hello ghealton, welcome to WebmasterWorld.

Robots Exclusion Protocol [robotstxt.org]

Pointless robots.txt URLs:
http://www.w3.org/admin/robots.txt
http://www.w3.org/~timbl/robots.txt
ftp://ftp.w3.com/robots.txt

So, you need to provide the "/robots.txt" in the top-level of your URL space. How to do this depends on your particular server software and configuration.

For most servers it means creating a file in your top-level server directory. On a UNIX machine this might be...

/usr/local/etc/httpd/htdocs/robots.txt
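
To illustrate why the URLs listed above are pointless, here is a minimal Python sketch of how a spider might work out which robots.txt to request for a given page. The function name `robots_url` is made up for this example, and it assumes the spider simply replaces the path with /robots.txt on the same scheme and host:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a spider would request for page_url.

    Hypothetical helper: keeps the scheme and host, discards the path,
    query and fragment, and always asks for /robots.txt at the top level.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Pages in a user directory still map to the top-level robots.txt
    for page in ("http://www.w3.org/~timbl/index.html",
                 "http://www.w3.org/admin/"):
        print(page, "->", robots_url(page))
```

Whatever directory the page lives in, the result is the same top-level file, so a robots.txt placed under /~timbl/ or /admin/ would never be requested by a spider that behaves this way.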

4:05 pm on Aug 30, 2002 (gmt 0)

jdmorgan



ghealton,

Welcome to WebmasterWorld!

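Putting a robots.txt inside a directory that starts with "~" almost never works, since that is not the root directory of the site. Spiders request /robots.txt from the root only, so most of them will never see a copy in your home directory.

So you will need to use Disallow: /~ghealton/tmp in the robots.txt file in the root directory of your site.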
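
A quick way to check which paths a Disallow line covers is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the two robots.txt bodies, the `is_allowed` helper, the "ExampleBot" agent name and the sample file paths are all made up, and it assumes the file is served from the root of the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical root robots.txt using the full path from the site root
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow: /~ghealton/tmp
"""

# Hypothetical variant using the path relative to the home directory
RELATIVE_ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/~ghealton/tmp/draft.html", "/~ghealton/index.html"):
        print("full path rule:    ", path, is_allowed(ROOT_ROBOTS_TXT, path))
        print("relative path rule:", path, is_allowed(RELATIVE_ROBOTS_TXT, path))
```

With this parser, Disallow values are matched as prefixes of the path starting at the site root, so only the full /~ghealton/tmp form blocks files under the home directory's tmp folder. The /tmp/ form would cover a top-level /tmp/ directory instead.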

Jim

 
