Forum Moderators: phranque

robots.txt tip

         

moltar

4:21 am on May 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I run a development Apache web server on my local machine. I have several websites, so I created a virtual host for each one. I use a wildcard DynDNS domain and serve the sites under different subdomains.

One day I noticed that the Yahoo spider had found the local (development) version and crawled some of the pages! The local website was listed in the index. That really bothered me. I was afraid Google would find it too and drop one of the sites for duplicate content. And I just generally don't want people poking around the development server.

I could have created a robots.txt on the local machine to ban all the bots, but that's risky! I could upload that robots.txt by mistake and effectively ban all the bots from the live website.

So I came up with this solution: I created an alias for the robots.txt file.

Alias /robots.txt /home/robots.txt

The robots.txt file contains:

User-agent: * 
Disallow: /
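
For context, here is a minimal sketch of how the two pieces might fit together in the Apache config (the paths and the 2.4-style access directives are assumptions; adjust for your own setup):

```apache
# Main server config (e.g. httpd.conf), OUTSIDE any <VirtualHost> block,
# so the alias applies to every vhost, subdomain, and bare-IP request:
Alias /robots.txt /home/robots.txt

# Apache must be allowed to serve the aliased file; this grants access
# to that one file only (Apache 2.4 syntax -- 2.2 uses Order/Allow instead):
<Directory "/home">
    <Files "robots.txt">
        Require all granted
    </Files>
</Directory>
```

The file /home/robots.txt itself holds only the two "ban-all" lines shown above.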

Now all requests for /robots.txt on any domain, subdomain, or IP on my local machine will be served the "ban-all" robots.txt.

And here is the best part: you can still keep the real robots.txt for the live site on the development machine, and the Alias directive will simply override it.