
Need help making robots.txt block all spiders

   
5:46 pm on Feb 16, 2004 (gmt 0)

10+ Year Member



I created a copy of my site so that I can make some major changes without causing any downtime. As soon as I'm done with it, I will replace the current site with this one. But that might not be for a few weeks. Is there a chance that this copy of my site will be crawled? If so, I don't want that to happen... not until it is ready to replace the current site.

If this is a possible scenario, what should I put in robots.txt so this doesn't end up happening?

5:48 pm on Feb 16, 2004 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'd put the new site into a sub-directory called /dev/ and then block that.

User-agent: *
Disallow: /dev/

5:51 pm on Feb 16, 2004 (gmt 0)

10+ Year Member



User-agent: *
Disallow: /

Should stop spiders that pay attention to the robots.txt file.

[added] Must learn to type quicker here...

5:54 pm on Feb 16, 2004 (gmt 0)

10+ Year Member



OK, thanks guys... BTW, one other thing:

Will this leave a bad mark with the crawler and cause it to take longer to come back and crawl the site again once it's live and the robots.txt has been changed?

5:59 pm on Feb 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As long as you use that subdirectory, whatever you name it, only to house the new version of your site, it doesn't matter.

Yes, spiders can definitely get into "invisible" subdirectories. I once accidentally had a subdirectory called "new", containing a redesign of a large site, completely spidered by Google because the Google Toolbar "told on me".

6:07 pm on Feb 16, 2004 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



User-agent: *
Disallow: /
Should stop spiders that pay attention to the robots.txt file.

I would not recommend the above. Once Googlebot sees a Disallow: / for an entire site, I think it may be a while before you can get a regular crawl again. I would definitely create a new sub-directory and Disallow that from the spiders. That way it has no effect on the existing site, or on the spidering of the new site once it goes live.
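For example, a rough sketch assuming the new version sits in a sub-directory called /dev/ on the same host:

# www.mysite.com/robots.txt (hypothetical layout with the new site in /dev/)
User-agent: *
Disallow: /dev/
# Everything outside /dev/ stays crawlable; once the new site replaces the
# old one and moves out of /dev/, this rule simply stops matching anything.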

P.S. mcavill's answer was correct based on your question, which was how to prevent all spiders from indexing your site.

In this case though, we only want to prevent them from indexing the new site.

6:18 pm on Feb 16, 2004 (gmt 0)

10+ Year Member



I created a sub-domain for the site, e.g. xyz.mysite.com. So if I put the disallow robots.txt on that, will it affect www.mysite.com?

6:24 pm on Feb 16, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, it will not affect your main site.
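robots.txt is read per host, so each hostname serves its own file. A rough sketch, assuming the staging copy lives at xyz.mysite.com:

# http://xyz.mysite.com/robots.txt - block the staging sub-domain entirely
User-agent: *
Disallow: /

# http://www.mysite.com/robots.txt - leave the live site fully crawlable
# (an empty Disallow means nothing is blocked)
User-agent: *
Disallow:
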
7:47 pm on Feb 16, 2004 (gmt 0)

10+ Year Member



OK, thanks!