
Forum Moderators: goodroi


Need help making robots.txt block all spiders

     
5:46 pm on Feb 16, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Dec 1, 2003
posts:284
votes: 0


I created a copy of my site so that I can make some major changes without causing any downtime. As soon as I am done with it, I will replace the current site with this one, but that might not be for a few weeks. Is there a chance that this 'copy' of my site will be crawled? If so, I don't want that to happen, not until it is ready to replace the current site.

If this is a possible scenario, what should I write in robots.txt to make sure it doesn't happen?

5:48 pm on Feb 16, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts:12166
votes: 51


I'd put the new site into a sub-directory called /dev/ and then block that.

User-agent: *
Disallow: /dev/
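
For anyone who wants to check a rule set like the one above before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the page paths are made up for illustration, and the parser only models a crawler that actually obeys robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, as they would appear in robots.txt.
DEV_RULES = """\
User-agent: *
Disallow: /dev/
"""

def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser

dev_parser = build_parser(DEV_RULES)

# "ExampleBot" and both paths are hypothetical.
dev_blocked = not dev_parser.can_fetch("ExampleBot", "/dev/index.html")
live_allowed = dev_parser.can_fetch("ExampleBot", "/index.html")

print("dev copy blocked:", dev_blocked)
print("live page allowed:", live_allowed)
```

If both values come back true, the rule hides only the /dev/ copy and leaves the rest of the site crawlable.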

5:51 pm on Feb 16, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:July 9, 2003
posts:405
votes: 0


User-agent: *
Disallow: /

Should stop spiders that pay attention to the robots.txt file.

<added>must learn to type quicker here...
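
To see what "Disallow: /" means for a spider that does pay attention to robots.txt, here is a rough sketch with Python's standard `urllib.robotparser`. The helper `polite_fetch_list`, the bot name, and the paths are all hypothetical. A crawler that ignores robots.txt is not affected by any of this.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the post above.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

def polite_fetch_list(rules, paths, agent="ExampleBot"):
    """Return the paths a robots.txt-compliant crawler would still fetch."""
    parser = RobotFileParser()
    parser.parse(rules)
    return [p for p in paths if parser.can_fetch(agent, p)]

# Hypothetical pages on the site.
site_paths = ["/", "/index.html", "/products/widget.html"]

remaining = polite_fetch_list(BLOCK_ALL, site_paths)
print(remaining)  # every path starts with "/", so nothing is left to fetch
```

With an empty rule list the same helper returns every path, which is the situation once the block is removed.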

5:54 pm on Feb 16, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Dec 1, 2003
posts:284
votes: 0


ok, thanks guys...btw one other thing:

Will this leave a bad impression on the crawler and cause it to take longer to return and crawl the site again once it is live and the robots.txt has been changed?

5:59 pm on Feb 16, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 6, 2003
posts:2523
votes: 0


As long as you use that subdirectory, whatever you name it, only to house the new version of your site, it doesn't matter.

Yes, spiders can definitely get into "invisible" subdirectories. I once had a subdirectory called "new", holding a redesign of a large site, that was accidentally left unblocked and got completely spidered by Google because the Google Toolbar "told on me".

6:07 pm on Feb 16, 2004 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts:12166
votes: 51


User-agent: *
Disallow: /
Should stop spiders that pay attention to the robots.txt file.

I would not recommend the above. Once Googlebot gets a Disallow: / on an entire site, I think it may be a while before you can get a regular crawl. I would definitely create a new sub-directory and Disallow that from the spiders. This way it has no effect on the existing site or spidering of the new site once it goes live.

P.S. mcavill's answer was correct based on your question, which was how to prevent all spiders from indexing your site.

In this case though, we only want to prevent them from indexing the new site.

6:18 pm on Feb 16, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Dec 1, 2003
posts:284
votes: 0


I created a sub-domain of the site, e.g. xyz.mysite.com. So if I put the disallow rule in the robots.txt on that, will it affect www.mysite.com?

6:24 pm on Feb 16, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2003
posts:1925
votes: 0


No, it will not affect your main site.
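
The reason is that crawlers look for robots.txt separately on each hostname, so xyz.mysite.com and www.mysite.com each have their own file. A small sketch of that lookup, using only Python's standard `urllib.parse`; the function name `robots_url` and the page paths are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs the given page.

    Only the scheme and host are kept; the file always sits at the root
    of that host.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical pages on the sub-domain and on the main site.
dev_robots = robots_url("http://xyz.mysite.com/some/page.html")
live_robots = robots_url("http://www.mysite.com/some/page.html")

print(dev_robots)
print(live_robots)
```

The two results differ, so a "Disallow: /" served from the sub-domain is never consulted for pages on www.mysite.com.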

7:47 pm on Feb 16, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Dec 1, 2003
posts:284
votes: 0


OK, thanks!