You can have as many domains on one server as you like, as far as I know. Because of this you can also have many robots.txt files.
You should only have one robots.txt file per domain, though.
A lot of websites are hosted at hosting companies (sharing server space and even IP numbers with other - completely unrelated - sites) and are still doing quite well in the search engines. This is not a problem.
There might be a slight misunderstanding though. Search engines generally work "per site", meaning that if you shut them out from one site this has no effect on your other sites. MSN will not get spidered more because Yahoo excludes spiders in the Yahoo robots.txt.
>> Do they recognize the separate domains even though they are all the same IP address?
The answer is yes. They will not index site (a) just because they cannot index site (b), though. On the other hand they will still index site (c) even if they cannot index site (b).
/claus
- you'll have to make a "301 redirect" in your .htaccess files to do that, as the robots.txt can only be used to disallow one specific site, but that way you do risk that the bots think that the six redirecting domains are no longer in use.
The easiest would be to simply link from the six well-indexed domains to domain seven. That way the spiders will just follow the link and you don't have to close any domains or redirect.
BTW: I'm starting to doubt whether I understood you properly. The seven different domains - they are seven different sites, right?
If it's seven different domain names pointing to the same site (same content), then that's another issue. If you have only one site in terms of content, but seven sites in terms of domain names, then you should really just use one domain name for that site and close the other six.
If, on the other hand, you have seven different sites with different content on them, then closing one or more of the six well-indexed ones will not do any good for the last one. Linking is your best option, just put a standard text-link on (some of the best indexed of) the other domains, and the spiders will follow it.
/claus
If you have multiple domain names pointing to the same directory of content, thus making different (identical) sites, you have a hard time having multiple robots.txt files. The way I recommend doing it is to have your robots.txt file served by a CGI script (or mod_perl module). Have it check the SERVER_NAME environment variable and serve different content based on it. If the client asks for the robots.txt of what you have determined to be the URL you want them to crawl, then serve them up a file that permits them to crawl. Otherwise, serve them up one that shuts them out ("User-agent: *" followed by "Disallow: /").
The clients don't know the difference. They just see it as a text file.
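The idea above can be sketched roughly as follows. This is only a minimal sketch in Python rather than Perl; the host name www.abc.com and the function names are placeholders made up for illustration, and it assumes the server has already been configured (not shown) to run the script for requests to /robots.txt.

```python
import os
import sys

# Hypothetical: the one host name that spiders should be allowed to crawl.
CANONICAL_HOST = "www.abc.com"

# An empty Disallow line permits everything; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_for_host(server_name):
    """Return the robots.txt body for the host name used in the request."""
    host = server_name.strip().lower().rstrip(".")
    return ALLOW_ALL if host == CANONICAL_HOST else BLOCK_ALL


def cgi_response(environ):
    """Build the full CGI response: header, blank line, then the body."""
    body = robots_for_host(environ.get("SERVER_NAME", ""))
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # When run as a CGI script, the server supplies SERVER_NAME in the environment.
    sys.stdout.write(cgi_response(os.environ))
```

A request that arrives under any host name other than the chosen one gets the blocking version, and a missing SERVER_NAME is treated the same way, which seems the safer default.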
www.abc.com/widgets.html is the URL that should be showing up in the SE results, but instead www.def.com/widgets.html is being indexed.
AlltheWeb for instance has completely indexed a couple of the six "extra" domains and not the main site. Since all of my domains have different names it sounds very odd that the "extra" six domain names should return results about ALL the content on the site. Think www.ladiesbloomers.com returning results about sports cars. That's what's going on. So I want to exclude the robots from crawling the six extra domain names and have them crawl just the one. I hope this clarifies, and thanks again for all the help.
RewriteCond %{HTTP_HOST} "!^www\.the-right-domain\.com"
RewriteRule (.*) http://www.the-right-domain.com/$1 [R=301,L]
This will make sure that all requests for all files on all domains will be shown the same file on "the-right-domain".
The rule instructs the server to do this: if someone asks for a file using a domain other than "the-right-domain.com" (e.g. www.wrong-domain.com/flowers.html), then the visitor is sent to the same file, only with "the-right-domain.com" first (e.g. www.the-right-domain.com/flowers.html).
At the same time it issues a "301" message to browsers and spiders, meaning: "flowers.html" might have been on "wrong-domain.com" once, but now it's moved permanently to "the-right-domain.com" so don't come looking for it on "wrong-domain.com" any more. Instead head directly to "the-right-domain.com" next time.
/claus
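For readers who find the mod_rewrite syntax hard to follow, here is a rough Python model of the decision the two-line rule makes. It is only an illustration of the logic, not something to deploy; the function name redirect_for is made up, and the model assumes the path arrives with a leading slash that gets stripped, as happens in a per-directory .htaccess context.

```python
import re

# Same pattern as in the RewriteCond; the rule fires when the host does NOT match.
RIGHT_HOST = re.compile(r"^www\.the-right-domain\.com")


def redirect_for(host, path):
    """Return (status, location) if the request gets redirected, else None."""
    if RIGHT_HOST.search(host):
        # Already on the right domain: the condition fails, nothing happens.
        return None
    # (.*) captures the requested path, which is re-used as $1 in the target URL.
    return 301, "http://www.the-right-domain.com/" + path.lstrip("/")
```

Note that under this pattern a request for the bare name the-right-domain.com (without "www") is redirected as well, since the pattern insists on the "www." prefix.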
Post #8:
You have one site that has different domain names. This means that the root of the site is the same no matter which name you use. The robots.txt must be in the document root.
It's just like a person holding a paper in his hand. It will be the same paper no matter if you use his first name or his last name.
You can of course serve different versions of robots.txt just as suggested by amoore in post #5, but that's a bit more advanced than the above (it requires a few extra lines, and the effect will be the same as with the two lines above)
Actually the effect of the method above is better, as you do have duplicate or wrong content in the SE's already, it seems. The above tells them to take action and correct this (the 301-thing), while the robots.txt just says "you can't go here".
To take advantage of your other names for parts of your site (and not the whole site) is a bit more complicated.
I would advise you to get the duplicate-thing sorted out first, as some search engines can be really harsh on duplicate content. Of course, in this case it's not an attempt to spam them, but they sometimes will think so and penalize the site for it with bad search results, so it is a serious matter.
Then, when the site is indexed as you want it (using just one name for the whole thing to start with) we can return to discussing use of the other names :)
You should put the code above into the .htaccess file in the web root directory of your site - the directory where your "home page" is located by default.
Just as an exercise, here is a robots.txt redirector for multiple domains pointed to the same root directory:
RewriteCond %{HTTP_HOST} ^(www\.)?([^\.]+)\.com [NC]
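RewriteRule ^robots\.txt$ /robots_%2.txt [L]
Let's say you have three domains, www.quaffle.com, www.bludger.com, and www.snitch.com. This code will transparently rewrite requests for robots.txt in each of these domains to robots_quaffle.txt, robots_bludger.txt, and robots_snitch.txt, respectively. The rewrite is internal, so the client still sees it as robots.txt. The name captured in the RewriteCond is referenced as %2; $2 would refer to a group in the RewriteRule pattern, which has none. All robots.txt files are assumed to be in the same root directory.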
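To check which file a given host name ends up with, the mapping can be modelled in a few lines of Python. This is a sketch of the matching logic only; robots_file_for is a made-up helper name, and the regular expression is copied from the RewriteCond with the [NC] flag expressed as re.IGNORECASE.

```python
import re

# Pattern from the RewriteCond; [NC] becomes re.IGNORECASE.
HOST_PATTERN = re.compile(r"^(www\.)?([^.]+)\.com", re.IGNORECASE)


def robots_file_for(host, path):
    """Return the file that robots.txt is rewritten to, or None if no rewrite applies."""
    if path != "robots.txt":
        # The RewriteRule pattern only matches a request for robots.txt itself.
        return None
    match = HOST_PATTERN.search(host)
    if match is None:
        # The condition fails, e.g. for a host that does not end in .com.
        return None
    # Group 2 is the domain name without "www." and without ".com".
    return "/robots_" + match.group(2) + ".txt"
```

For example, robots_file_for("www.quaffle.com", "robots.txt") gives "/robots_quaffle.txt", while a host outside .com is left alone and falls through to the plain robots.txt.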
I'll vote for the method posted by claus above, though, as long as you intend to stop using the alternate domain names for anything other than domain-name-branding retention. However, you can/should leave out the double quotes in the RewriteCond:
RewriteCond %{HTTP_HOST} !^www\.the-right-domain\.com
<VirtualHost 206.246.241.117>
ServerName sub.domainname.com
DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>
<VirtualHost 206.246.241.117>
ServerName www.sub.domainname.com
DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>
<VirtualHost 206.246.241.117>
ServerName [sub.domainname.com...]
DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>
Does anyone know what the proper code should be? thanks again!
I guess that by "server" you actually mean an "IP-number" like 127.0.0.0
A server is just a hardware box. It works just like a phone, you can have as many phone numbers on it as you like. You can have fixed phone numbers and you can forward one phone to the number of another.
So, a server (box) can have a lot of IP-numbers. Just like the phone, when you make a call to a number you will reach a person at the other end. This is your content/site.
The DNS has "autodial", so you just have to say the name (domain-name) then the system automatically finds out which number (IP) should be used. Just like in a family or a business, several persons can share one number (several sites on one IP). And you can refer to this telephone number by each person's name or just by one name (the family/business name).
The relations between names and IPs are what you set up in your host's control panel.
(1) By "pointing to" you set up an "equal sign" - john doe is always at 127.0.0.0.
(2) By "forwarding" you assign one value to another (forward your phone) - jane is at john doe's place.
You can point more than one domain name to one IP (and there can be more than one IP on one server as well). You can also forward more than one domain name to one IP.
Here's the tricky part, but I'll explain using the phone example:
When you "point to" you say: "jane is living at the same place as joe." So, if you ask "jane" over the phone to "go to the living room" you are also asking her to "go to joe's living room". It's the same thing. So, these files are exactly the same when you "point to":
(a) joe-doe.com/the-livingroom/do-something.html
(b) jane-doe.com/the-livingroom/do-something.html
When you "forward" and ask jane to "go to the living room"... well, then she can't! That's because she's not at home, she is over at joe's, and joe's livingroom is not hers. In fact, she can go nowhere inside joe's house - she's simply not allowed to. Now there are only these two cases (a2) and (b2) - case (c2) does not exist:
(a2) joe-doe.com/joes-livingroom/do-something.html
(b2) jane-doe.com/ -> joe-doe.com/
(c2) jane-doe.com/joes-livingroom/do-something.html <-- invalid option, 404 file not found
You can make joe-doe go to his livingroom, but jane-doe can only go to joe-doe's doorstep, not navigate inside the house.
Wrap-up:
When you point six domain names to the same IP they will share the same .htaccess file and whole file structure. Everything Joe owns, Jane also owns (unless you regulate that in .htaccess or by other means).
When you forward one domain to another, the domain you forward will not have an .htaccess file, as it's "not at home". It's like telling your phone that when you say "Jane" it should call "Joe".
So, the .htaccess code above is needed if all seven domains point to the same IP. If one domain points to one IP, and the other six domains forward to this first domain, then the .htaccess is not needed.
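The difference between the two setups, as described in the joe/jane example, can be put into a small toy model. This is only an illustration of the explanation above, not how any particular host implements forwarding; the function name, the mode strings and the file list are all made up for the example.

```python
# Hypothetical file list for the one real site (joe-doe.com).
SITE_FILES = {"/", "/the-livingroom/do-something.html"}


def request(host, path, mode):
    """Model a request when jane-doe.com is either 'pointed' or 'forwarded'.

    Returns a (result, detail) tuple.
    """
    if host == "joe-doe.com" or mode == "pointed":
        # Both names share one document root, so they see exactly the same files.
        return ("serve", path) if path in SITE_FILES else ("404", path)
    if mode == "forwarded":
        # The forwarded name only has a doorstep; nothing deeper exists for it.
        return ("forward", "joe-doe.com/") if path == "/" else ("404", path)
    raise ValueError("mode must be 'pointed' or 'forwarded'")
```

In "pointed" mode the deep URL works under both names, which is exactly why duplicate content shows up and why the .htaccess rule is needed there. In "forwarded" mode only the front door of jane-doe.com responds, so there is nothing for the .htaccess to act on.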
No, you just need one .htaccess file, one IP and six domains pointing to this IP. Then, in the .htaccess file, you could programmatically decide which content should be displayed depending on which domain was used to access the one IP.
This could also be done without the .htaccess, by using server-side scripting (PHP, Perl, Python, ASP, whatever) instead, but the .htaccess is the "natural place" to make the overall distinctions and it is a quite powerful tool. And of course there's always a whole lot of other possible things you could do.
And then, yes - if you do set up six different sites you might need six different htaccess files anyway (or perhaps even more than six), but that would be for other reasons than to keep the document spaces of the six domains separate.
/claus
It turns out that your setup is more complicated than I first assumed, so the above advice might have to be modified.
What you want is essentially to have the six domains point to individual folders located on the seventh domain, is that right?
This can be done as well, it was just not what I thought you would like to do. The Flash site, where is that one located in the whole setup and which domains should point to it?
Some kind of example would be useful, could you provide a little more detail? - please use example domains, not the real ones.
I'll have to leave the thread now - I'm in Europe and it's 5:30 AM here. I have to get some sleep, but I saw Jim just arrived, so you're in safe hands if he's still around ;)
The reason that you cannot reach your subdomains is probably because they are not registered in your DNS setup. This is not a server-thing. In example (b) below, "www" is not a prefix, it's a subdomain itself, to another subdomain:
(a) subdomain.example.com
(b) www.subdomain.example.com
In your DNS setup you will need to create this subdomain and point it to your IP (using an A record) or to your domain (using a CNAME record). Then your server conf will take over from there (as the requests for the subdomain hit your IP).
I think you must set this up in your host's control panel. These panels are different, but your host must have documentation on this. I can see from post #13 that you only need to create them and point them to the IP used in your conf file (using an A record, that is).
/claus
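Once the DNS records are created, a quick way to see whether each name actually reaches the server is to resolve it and compare the result with the IP from the conf file. Below is a minimal Python sketch of such a check; the function names are made up, the lookup uses the standard socket.gethostbyname call (IPv4 only), and the resolver can be swapped out so the logic can be tried without touching real DNS.

```python
import socket


def resolves_to(name, expected_ip, resolver=socket.gethostbyname):
    """Return True if `name` resolves to `expected_ip`, False otherwise."""
    try:
        return resolver(name) == expected_ip
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the name
        # has no record, which is the symptom described above.
        return False


def check_names(names, expected_ip, resolver=socket.gethostbyname):
    """Check several host names at once; returns {name: True/False}."""
    return {name: resolves_to(name, expected_ip, resolver) for name in names}
```

Called with both subdomain.example.com and www.subdomain.example.com and the server's IP, a False for the "www" variant would suggest that its record is still missing in the DNS setup, rather than anything being wrong in the server conf.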