

Sitemaps, Meta Data, and robots.txt Forum

    
Need help making robots to block all spiders
budbiss




msg:1528831
 5:46 pm on Feb 16, 2004 (gmt 0)

I created a copy of my site so that I can make some major changes without causing any downtime. As soon as I'm done with it, I will replace the current site with this one, but that might not be for a few weeks. Is there a chance that this copy of my site will be crawled? If so, I don't want that to happen until it is ready to replace the current site.

If this is a possible scenario, what should I put in robots.txt to prevent it?

 

pageoneresults




msg:1528832
 5:48 pm on Feb 16, 2004 (gmt 0)

I'd put the new site into a sub-directory called /dev/ and then block that.

User-agent: *
Disallow: /dev/
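One way to sanity-check a rule like this before relying on it is Python's standard-library `urllib.robotparser`, which applies the basic robots exclusion rules the way a compliant crawler would (individual search engines add their own extensions, so treat it as an approximation). The sketch below assumes the placeholder host www.example.com and the user-agent string "Googlebot"; it parses the /dev/ rule locally, without fetching anything, and checks which URLs are refused:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested above: block only the /dev/ copy of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dev/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # www.example.com is a placeholder for the live host.
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/dev/index.html",
                "http://www.example.com/development.html"):
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

Disallow matches by path prefix, so /dev/ blocks /dev/index.html while the live pages, and even a file such as /development.html, stay crawlable.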

mcavill




msg:1528833
 5:51 pm on Feb 16, 2004 (gmt 0)

User-agent: *
Disallow: /

Should stop spiders that pay attention to the robots.txt file.

<added>must learn to type quicker here...
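The same parser can show what this rule means to a compliant crawler: every path on the host is refused, whatever the user-agent. This is only a local model (the agent names and the host www.example.com below are illustrative); robots.txt is advisory, so a spider that ignores it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# mcavill's robots.txt: one record for every user-agent, disallowing the root.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(BLOCK_ALL.splitlines())  # parse the text directly; nothing is fetched

# "/" is a prefix of every path, so every URL is refused for every agent.
for agent in ("Googlebot", "Slurp", "AnyOtherBot"):
    for path in ("/", "/index.html", "/products/widget.html"):
        allowed = rp.can_fetch(agent, "http://www.example.com" + path)
        print(f"{agent:12} {path:25} {'allowed' if allowed else 'blocked'}")
```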

budbiss




msg:1528834
 5:54 pm on Feb 16, 2004 (gmt 0)

ok, thanks guys...btw one other thing:

Will this leave a bad impression with the crawler and make it take longer to come back and crawl the site once it is live and robots.txt has been changed?

PatrickDeese




msg:1528835
 5:59 pm on Feb 16, 2004 (gmt 0)

As long as you use that subdirectory, whatever you name it, only to house the new version of your site, it doesn't matter.

Yes, spiders can definitely get into "invisible" subdirectories. I once had a subdirectory called "new", holding a redesign of a large site, and Google spidered it completely because the Google Toolbar "told on me".

pageoneresults




msg:1528836
 6:07 pm on Feb 16, 2004 (gmt 0)

User-agent: *
Disallow: /
Should stop spiders that pay attention to the robots.txt file.

I would not recommend the above. Once Googlebot gets a Disallow: / on an entire site, I think it may be a while before you can get a regular crawl. I would definitely create a new sub-directory and Disallow that from the spiders. This way it has no effect on the existing site or spidering of the new site once it goes live.

P.S. mcavill's answer was correct based on your question which was how to prevent all spiders from indexing your site.

In this case though, we only want to prevent them from indexing the new site.

budbiss




msg:1528837
 6:18 pm on Feb 16, 2004 (gmt 0)

I created a subdomain of the site, e.g. xyz.mysite.com. So if I put the Disallow in that subdomain's robots.txt, will it affect www.mysite.com?

moltar




msg:1528838
 6:24 pm on Feb 16, 2004 (gmt 0)

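No, it will not affect your main site. robots.txt applies per host, so a file at xyz.mysite.com/robots.txt has no effect on www.mysite.com, which is governed by its own robots.txt.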
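This follows from how crawlers locate robots.txt: they request it from the root of the exact host (scheme, host name and port) the page lives on, so xyz.mysite.com and www.mysite.com each get their own file. The sketch below models that with a hypothetical in-memory table of robots.txt contents per host; the helper names `robots_url_for` and `can_crawl` are illustrative, not part of any library. It blocks the development subdomain while leaving the live site open:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, keyed by host name.
ROBOTS_BY_HOST = {
    "xyz.mysite.com": "User-agent: *\nDisallow: /\n",  # development copy: block all
    "www.mysite.com": "User-agent: *\nDisallow:\n",    # live site: empty Disallow allows all
}


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (root of its own host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def can_crawl(page_url: str, agent: str = "Googlebot") -> bool:
    """Check page_url only against the robots.txt of its own host."""
    host = urlsplit(page_url).hostname
    rp = RobotFileParser()
    rp.parse(ROBOTS_BY_HOST[host].splitlines())
    return rp.can_fetch(agent, page_url)


if __name__ == "__main__":
    for url in ("http://xyz.mysite.com/new-page.html",
                "http://www.mysite.com/new-page.html"):
        print(url, "->", robots_url_for(url), "crawlable:", can_crawl(url))
```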

budbiss




msg:1528839
 7:47 pm on Feb 16, 2004 (gmt 0)

OK, thanks!



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved