Forum Moderators: martinibuster
The reason is that I have a test version of our website in a subdirectory that is hidden only by obscurity. So far it has never been crawled by Google or any other search engine, and I would rather keep it that way, as I don't want users to ever get pointed to the test version of a page.
We are considering adding Google AdSense ads to our pages, but we want to try them out for a while first on our test version to see what kind of ads get served and to fit them into our pages in an acceptable manner. We don't want to expose our test pages to Google's search index, though...
I know I could add a META NOINDEX tag, but that is one more thing I would have to change every time I move a test page to the live version, so I would rather not if it is not necessary.
Thanks,
James
In your case, you want to make sure that Googlebot/2.1 (+http://www.googlebot.com/bot.html) can't get in but Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html) can.
However, I've been reluctant to use robots.txt to date, as I've heard that some spambots read robots.txt to find out about directories they would not otherwise visit. In reality, though, I probably shouldn't be bothered by that, as the test version of the site is intended to be public at some point. (It is usually more about experimenting with formatting and structure than actual content.)
I've done a little looking and I'm not clear on the robots.txt entry. It seems there is not an easy way to disallow all robots but one...
Also, do most validators obey robots.txt? If so, I would have to allow them...
I suppose that, as long as no search engine robots follow the paths listed in the robots.txt file, disallowing just Googlebot should leave me in at least the same position I am in now.
I know that access control would be better from a security perspective. However, as I mentioned, security isn't really my issue. My issue is simply that I don't want search engines to end up at the test version of the site. If I used access control, I would have a harder time testing with different software, different hardware, etc...
Thanks
James
I've done a little looking and I'm not clear on the robots.txt entry. It seems there is not an easy way to disallow all robots but one...
Sure there is. That's off-topic for this forum, but there's a whole forum here at WWF dedicated to robots.txt. [webmasterworld.com...]
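For anyone who wants to try the idea before asking in the other forum, here is a rough sketch, not a definitive answer. It assumes the test pages live under a hypothetical /test/ directory (the original post does not name the subdirectory) and uses a placeholder domain. The robots.txt text gives Mediapartners-Google an empty Disallow (meaning nothing is blocked) and blocks /test/ for every other robot. Python's standard urllib.robotparser is used only to sanity-check the rules locally; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/test/" stands in for the unnamed test subdirectory.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /test/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text in memory, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder URLs; substitute your own domain and paths.
TEST_URL = "http://www.example.com/test/index.html"
LIVE_URL = "http://www.example.com/index.html"

for agent in ("Mediapartners-Google/2.1", "Googlebot/2.1", "SomeOtherBot/1.0"):
    # Report whether this user agent may fetch the test page and the live page.
    print(agent, parser.can_fetch(agent, TEST_URL), parser.can_fetch(agent, LIVE_URL))
```

Under these rules the ad crawler may fetch the test page, while Googlebot and any other well-behaved robot may not, and the live page stays open to everyone. Note that robots.txt is only a request: robots that ignore it will not be stopped.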
It's certainly the most viable way for me, since I often make large site-wide code changes that could otherwise adversely affect the live site if implemented untested.
In theory, visitors who wanted an AdSense-free site could visit my development server, but since the link speed is much slower and the content is only updated when I remember to do an update from the live database .... :-)