I have a website, website.com, with a folder called /sites. In this /sites folder I have about 5 other websites.
I would like to block access so that when www.website.com gets spidered, the crawler does not go into the /sites folder as part of this website. However, I do want www.website2.com to be spidered, even though its content is located at www.website.com/sites/website2.
Is there a way to keep search engines out of the /sites folder on the main site, while still allowing the individual website folders to be indexed under their own domains?
In the normal way of doing things, the content in the separate folder would have its own domain name, and the answer below assumes that is true.
Usually you would use .htaccess for this, redirecting requests for (www.)example1.com/sites/example2 over to www.example2.com/ with a site-wide 301 redirect that preserves the rest of the requested file path in the redirect.
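A minimal sketch of such a rule, assuming Apache with mod_rewrite enabled and using the hypothetical example1.com / example2.com names from above:

```apache
# In example1.com's .htaccess (assumes mod_rewrite is available)
RewriteEngine On

# 301-redirect anything under /sites/example2/ to the same path
# on www.example2.com, preserving the rest of the URL
RewriteRule ^sites/example2/(.*)$ https://www.example2.com/$1 [R=301,L]
```

You would repeat a rule like this for each of the sites in the folder. Crawlers following the 301 will index the content under the www.example2.com URLs instead of the example1.com/sites/... ones.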
You could put a robots.txt file in the root of example1.com something along the lines of:
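For instance, a sketch using the same example1.com naming:

```
User-agent: *
Disallow: /sites/
```

Note that robots.txt is per-hostname: this file only applies to URLs requested via example1.com, so a crawler fetching the same content through www.example2.com would read www.example2.com/robots.txt instead and would not see this Disallow rule.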