Forum Moderators: martinibuster


Test versions of pages and Google AdSense

James in Vancouver

7:36 pm on Nov 21, 2003 (gmt 0)

10+ Year Member




I was wondering: will Google automatically try to spider any page that has an AdSense ad on it?

The reason is that I have a test version of our website in a subdirectory that is hidden only by obscurity. So far it has never been crawled by Google or any other search engine, and I would rather keep it that way, because I don't want users ever to be pointed to the test version of a page.

We are considering adding Google AdSense ads to our pages, but we want to try them out on the test version first for a while, to see what kinds of ads get served and to fit them into our pages in an acceptable way. We just don't want to expose our test pages to Google...

I know I could add a robots META tag with NOINDEX, but that is one more thing I would have to change every time I move a test page to the live version, so I would rather not if it isn't necessary.

Thanks,
James

linear

7:43 pm on Nov 21, 2003 (gmt 0)

10+ Year Member



The various Googlebots all respect robots.txt, and that's the proper way to keep a directory out of the index. (Arguably it's not perfect, because lots of bots don't respect it; the only truly effective way is real access control. Obscurity is never totally adequate.)

In your case, you want to make sure that Googlebot/2.1 (+http://www.googlebot.com/bot.html) can't get in, but that Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html), the AdSense crawler, can.
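As a sketch of that setup, the robots.txt below names only Googlebot and disallows a test directory; since no group names Mediapartners-Google and there is no `*` group, that crawler is left unrestricted. The `/test/` path and the `www.example.com` host are hypothetical placeholders for James's real subdirectory and domain. The rules are checked with Python's standard-library `urllib.robotparser`, which shows how a parser following the robots.txt convention reads them; Google's own crawlers may handle edge cases differently.

```python
# Sketch: keep Googlebot out of a test directory while leaving the AdSense
# crawler (Mediapartners-Google) unrestricted.
# Assumption: /test/ and www.example.com are hypothetical placeholders.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /test/
"""

# Full user-agent strings as quoted in the thread.
GOOGLEBOT = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
MEDIAPARTNERS = "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content given as a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for agent in (GOOGLEBOT, MEDIAPARTNERS):
        for url in ("http://www.example.com/index.html",
                    "http://www.example.com/test/index.html"):
            # Print the product token, the URL, and whether fetching is allowed.
            print(agent.split("/")[0], url, rp.can_fetch(agent, url))
```

Note that the `User-agent` value should be the full token `Googlebot`: Python's parser matches by substring, so a shorter value such as `Google` would also catch `Mediapartners-Google` and block the AdSense crawler too.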

James in Vancouver

11:07 pm on Nov 21, 2003 (gmt 0)

10+ Year Member



Thanks, that is probably the best way to do it.

However, I've been reluctant to use robots.txt so far, because I've heard that some spambots read robots.txt to find directories they would not otherwise visit. In reality, though, I probably shouldn't worry about that, since the test version of the site is meant to become public eventually. (It is usually more about experimenting with formatting and structure than with actual content.)

I've done a little looking and I'm not clear on the robots.txt entry. It seems there is not an easy way to disallow all robots but one...

Also, do most validators obey robots.txt? If so, I would have to allow them too...

I suppose that, as long as no search engine robot follows the paths listed in robots.txt, disallowing just Googlebot should leave me at least as well off as I am now.

I know that access control would be better from a security perspective. However, as I mentioned, security isn't really my concern. My concern is simply that I don't want search engines sending people to the test version of the site. With access control in place, I would have a harder time testing with different software, different hardware, etc.

Thanks
James

linear

1:11 pm on Nov 22, 2003 (gmt 0)

10+ Year Member



James_in_Vancouver wrote:
I've done a little looking and I'm not clear on the robots.txt entry. It seems there is not an easy way to disallow all robots but one...

Sure there is. That's off-topic for this forum, but there's a whole forum here at WWF dedicated to robots.txt. [webmasterworld.com...]
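For reference, the usual "all robots but one" pattern gives the chosen robot its own group with an empty `Disallow:` (meaning nothing is blocked) and sends every other robot to the `*` group. In this sketch the allowed robot is Mediapartners-Google and the hidden directory is a hypothetical `/test/`; `SomeOtherBot` and `www.example.com` are made-up names used only for illustration. Python's `urllib.robotparser` is used to show how the rules are interpreted.

```python
# Sketch of "disallow all robots but one": Mediapartners-Google gets an
# empty Disallow (allowed everywhere); every other robot falls through to
# the "*" group and is kept out of /test/.
# Assumption: /test/, www.example.com and SomeOtherBot are hypothetical.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /test/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

TEST_URL = "http://www.example.com/test/page.html"

if __name__ == "__main__":
    for agent in ("Mediapartners-Google/2.1", "Googlebot/2.1", "SomeOtherBot/1.0"):
        # Only the robot named in its own group may fetch the test page.
        print(agent, rp.can_fetch(agent, TEST_URL))
```

A robot that finds a group naming it follows that group and ignores the `*` group, which is why the single named crawler escapes the restriction.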

mmarlor

12:47 am on Nov 30, 2003 (gmt 0)

10+ Year Member



In my case, I've got quite a bit of control over it. My development server and the live server are two completely separate machines. Yes, the mediabot used to visit it, but I decided that on the whole it would be better if no robots visited the development server. So: disallow all. :-)
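A "disallow all" robots.txt on a separate development server looks like the sketch below. It shuts out every compliant robot, including the AdSense crawler, which is the trade-off described above. The `dev.example.com` host is a hypothetical placeholder; Python's `urllib.robotparser` is used only to confirm how the rule is read.

```python
# Sketch: a development server's robots.txt that blocks every compliant
# robot, the AdSense crawler included.
# Assumption: dev.example.com is a hypothetical placeholder host.
from urllib.robotparser import RobotFileParser

DEV_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

dev_rp = RobotFileParser()
dev_rp.parse(DEV_ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for agent in ("Googlebot/2.1", "Mediapartners-Google/2.1"):
        # Both crawlers fall into the "*" group, so both are refused.
        print(agent, dev_rp.can_fetch(agent, "http://dev.example.com/index.html"))
```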

It's certainly the most workable approach for me, since I often make large site-wide code changes that could otherwise harm the live site if deployed untested.

In theory, visitors who wanted an AdSense-free site could use my development server, but the link is much slower and the content is only refreshed when I remember to update it from the live database... :-)