Forum Moderators: phranque

robots.txt tip

         

moltar

4:21 am on May 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I run a development Apache web server on my local machine. I have several websites, so I created a virtual host for each one. I use a wildcard DynDNS domain and serve the sites under different subdomains.

One day I noticed that the Yahoo spider had found the local (development) version and crawled some of the pages! The local website was listed in the index. That really bothered me. I was afraid Google would find it too and drop one of the sites for duplicate content. And I just generally don't want people poking around the development server.

I could have created a robots.txt on the local machine to ban all the bots, but that's risky! I could upload that robots.txt by mistake and effectively ban all the bots from the live website.

So I came up with this solution: I created an alias for the robots.txt file.

Alias /robots.txt /home/robots.txt

The robots.txt file contains:

User-agent: * 
Disallow: /
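
For context, here is a minimal sketch of how the two pieces might fit together in the Apache config (the paths and the 2.4-style access directives are assumptions; adjust for your own setup):

```apache
# Main server config (e.g. httpd.conf), OUTSIDE any <VirtualHost> block,
# so the alias applies to every vhost, subdomain, and bare-IP request:
Alias /robots.txt /home/robots.txt

# Apache must be allowed to serve the aliased file; this grants access
# to that one file only (Apache 2.4 syntax -- 2.2 uses Order/Allow instead):
<Directory "/home">
    <Files "robots.txt">
        Require all granted
    </Files>
</Directory>
```

The file /home/robots.txt itself holds only the two "ban-all" lines shown above.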

Now all requests for /robots.txt on any domain, subdomain, or IP on my local machine will be served the "ban-all" robots.txt.

And here is the best part: you can still keep the real robots.txt for the live site on the development machine, and the Alias directive will simply override it.