Sitemaps, Meta Data, and robots.txt Forum

More than one Robots.txt per server?
apphoto
10:23 pm on Aug 29, 2003 (gmt 0)

Hi there, I was wondering if anyone has any advice about my situation. I have seven URLs all pointing to one server. I just moved my site to a new hosting account that allows me to add as many virtual domains as I want (I assume they are virtual since it's all the same server). I just had to set up directories for each separate URL on the server and change the DNS settings to point to the new servers (before, the domains were just parked and forwarded).

My question is this: the search engines have all visited each domain separately, and basically I have very skewed search engine results because of it. I want to exclude the robots from visiting six of the domains and have them index just one. If I put robots.txt files excluding all robots in those six directories, will that stop them from indexing my entire server? Do they recognize the separate domains even though they all share the same IP address? I don't want to risk not having my site indexed at all, but the search engine results are really inaccurate now. Thanks for any advice!

 

claus
10:36 pm on Aug 29, 2003 (gmt 0)

Welcome to WebmasterWorld, apphoto :)

As far as I know, you can have as many domains on one server as you like, and because of this you can also have many robots.txt files.

You should only have one robots.txt file per domain, though.

A lot of websites are hosted at hosting companies (sharing server space and even IP numbers with other - completely unrelated - sites) and are still doing quite well in the search engines. This is not a problem.

There might be a slight misunderstanding, though. SEs generally work "per site" - meaning that if you shut them out of one site, this has no effect on your other sites. MSN.com will not get spidered more just because Yahoo.com excludes spiders in its robots.txt.

>> Do they recognize the separate domains even though they are all the same IP address?

The answer is yes. They will not index site (a) just because they cannot index site (b), though. On the other hand, they will still index site (c) even if they cannot index site (b).
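For reference, the standard exclusion syntax is short - a robots.txt like this at a domain's document root keeps all compliant robots out of that domain:

User-agent: *
Disallow: /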

/claus

apphoto
11:47 pm on Aug 29, 2003 (gmt 0)

Thanks for the info...I am thinking of just forwarding the robots in the robots.txt file...that way I don't risk losing anything

claus
12:01 am on Aug 30, 2003 (gmt 0)

>> forwarding

- you'll have to make a "301 redirect" in your .htaccess files to do that, as robots.txt can only disallow crawling of the site it is served from - it can't forward anyone. That way, though, you do risk the bots concluding that the six redirecting domains are no longer in use.

The easiest would be to simply link from the six well-indexed domains to domain seven. That way the spiders will just follow the link and you don't have to close any domains or redirect.

BTW: I'm getting a bit in doubt - I hope I understood you properly, but I'm not sure... The seven different domains - they are seven different sites, right?

If it's seven different domain names pointing to the same site (same content), then that's another issue. If you have only one site in terms of content, but seven sites in terms of domain names, then you should really just use one domain name for that site and close the other six.

If, on the other hand, you have seven different sites with different content on them, then closing one or more of the six well-indexed ones will not do any good for the last one. Linking is your best option, just put a standard text-link on (some of the best indexed of) the other domains, and the spiders will follow it.
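Such a link is plain HTML, e.g. (www.domain-seven.com being a placeholder for the domain you want indexed):

<a href="http://www.domain-seven.com/">Our main site</a>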

/claus

amoore
12:17 am on Aug 30, 2003 (gmt 0)

If you actually have different sites, with different content that just happen to be on the same server, then naturally you can have different robots.txt files, just like you have different index.html files.

If you have multiple domain names pointing to the same directory of content, thus making different (but identical) sites, it's harder to serve multiple robots.txt files. The way I recommend doing it is to have your robots.txt file served by a CGI script (or mod_perl module). Have it check the SERVER_NAME environment variable and serve different content based on it. If the client asks for the robots.txt of what you have determined to be the URL you want crawled, serve up a file that permits crawling. Otherwise, serve up one that has a "Disallow: /" line or whatever.

The clients don't know the difference. They just see it as a text file.
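A minimal sketch of that idea as a CGI script (shown in Python purely for illustration - amoore suggested CGI or mod_perl, and any language works; the canonical host name is a placeholder):

#!/usr/bin/env python3
# Hypothetical CGI script that answers requests for /robots.txt.
# Assumes the server is set up to run it for that URL (e.g. via a
# RewriteRule or ScriptAlias) on every domain pointing at this box.
import os

CANONICAL = "www.main-domain.com"  # placeholder: the one domain to be crawled

host = os.environ.get("SERVER_NAME", "").lower()

print("Content-Type: text/plain")
print()  # blank line ends the CGI headers
if host == CANONICAL:
    # The domain we want indexed: allow everything.
    print("User-agent: *")
    print("Disallow:")
else:
    # Any other name for this server: shut compliant robots out.
    print("User-agent: *")
    print("Disallow: /")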

apphoto
12:42 am on Aug 30, 2003 (gmt 0)

Let me clarify... I bought all of these domains intending to point them to specific content on the main site. I realized that if I broke up the content without duplicating it, I would probably dilute my PageRank. So now I just forward those six names to the same site. The problem is that the search engines produce strange results because of this, and some of them have favored the main site and indexed the other ones, e.g.:

www.abc.com/widgets.html is the URL that should be showing up in the SE results, but instead www.def.com/widgets.html is being indexed.

AlltheWeb, for instance, has completely indexed a couple of the six "extra" domains and not the main site. Since all of my domains have different names, it seems very odd that the "extra" six domain names should return results about ALL the content on the site - think www.ladiesbloomers.com returning results about sports cars. That's what's going on. So I want to exclude the robots from crawling the six domain names and have them crawl just the one. I hope this clarifies things, and thanks again for all the help.

apphoto
12:43 am on Aug 30, 2003 (gmt 0)

Sorry, that should read "have not favored the main site".

apphoto
12:53 am on Aug 30, 2003 (gmt 0)

I just reread your posts... yes, basically there is one website in terms of content, but it has so much content that I was thinking of breaking it up into smaller sites or just targeting content on the site with different URLs. So what you are saying is that if it is really just one site, then you can't exclude robots from searching via the other six domain names?

claus
12:57 am on Aug 30, 2003 (gmt 0)

Okay, I'm glad you said that. It seems you're about to get into duplicate-content trouble. Add these lines to your .htaccess file (a simple ASCII text file named ".htaccess") and store it in the domain root:

RewriteCond %{HTTP_HOST} "!^www\.the-right-domain\.com"
RewriteRule (.*) http://www.the-right-domain.com/$1 [R=301,L]
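(One assumption worth noting: if mod_rewrite isn't already enabled for the directory, a "RewriteEngine on" line is needed above these two rules.)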

This will make sure that requests for any file on any of the domains are answered from the same file on "the-right-domain".

The rule instructs the server to do this: if someone asks for a file using a domain other than "the-right-domain.com" (e.g. www.wrong-domain.com/flowers.html), the same file should be served, only with "the-right-domain.com" first (e.g. www.the-right-domain.com/flowers.html).

At the same time it issues a "301" status to browsers and spiders, meaning: "flowers.html" might have been on "wrong-domain.com" once, but it has now moved permanently to "the-right-domain.com", so don't come looking for it on "wrong-domain.com" any more. Instead, head directly to "the-right-domain.com" next time.
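The exchange looks roughly like this (hypothetical request, response trimmed to the relevant headers):

GET /flowers.html HTTP/1.1
Host: www.wrong-domain.com

HTTP/1.1 301 Moved Permanently
Location: http://www.the-right-domain.com/flowers.html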

/claus


added: The above referred to posts #6 & #7

Post #8:
You have one site that has different domain names. This means that the root of the site is the same no matter which name you use. The robots.txt must be in the document root.

It's just like a person holding a paper in his hand. It will be the same paper whether you use his first name or his last name.

You can of course serve different versions of robots.txt just as suggested by amoore in post #5, but that's a bit more advanced than the above (it requires a few extra lines, and the effect will be the same as with the two lines above)


added:

Actually, the effect of the method above is better, as it seems you already have duplicate or wrong content in the SEs. The above tells them to take action and correct this (the 301 thing), while robots.txt just says "you can't go here".

Taking advantage of your other names for parts of your site (and not the whole site) is a bit more complicated.

I would advise you to get the duplicate thing sorted out first, as some SEs can be really harsh on duplicate content. Of course, in this case it's not an attempt to spam them, but they sometimes think so and penalize the site with bad search results, so it is a serious matter.

Then, when the site is indexed as you want it (using just one name for the whole thing to start with) we can return to discussing use of the other names :)

apphoto
1:50 am on Aug 30, 2003 (gmt 0)

OK, great, that was very helpful... but that leads me to ask: is there only one .htaccess file per server? I just changed the DNS settings to point all of those six "extra" domain names to the same server (before, they were forwarded), and I'm not clear whether setting up all the domains on the Apache server via the control panel my host provides is actually like having several small servers, each with its own access files. Let's say I wanted to set up six completely different sites; theoretically I might need different .htaccess files for all of them, no? I'm not a programmer, so Apache config is unclear to me at times. Would it be best to just return the domain names to being forwarded and then add that code you cited to the one .htaccess file? Thanks

jdMorgan
2:15 am on Aug 30, 2003 (gmt 0)

There can be one .htaccess for each directory and sub-directory in your server account. Some of the complications of .htaccess arise precisely because .htaccess "lives" just between the URI world and the file-system world, and has a foot in each.

You should put the code above into the .htaccess file in the web root directory of your site - the directory where your "home page" is located by default.

Just as an exercise, here is a robots.txt redirector for multiple domains pointed to the same root directory:

# %2 refers to the second parenthesized group in the RewriteCond below
# ($2 would refer to groups in the RewriteRule's own pattern, which has none)
RewriteCond %{HTTP_HOST} ^(www\.)?([^\.]+)\.com [NC]
RewriteRule ^robots\.txt$ /robots_%2.txt [L]

This assumes that all domains start with an optional "www." and end with ".com", but it can easily be adapted. It is also written for use in .htaccess; for use in httpd.conf, add a leading slash to the RewriteRule pattern.

Let's say you have three domains, www.quaffle.com, www.bludger.com, and www.snitch.com. This code will transparently redirect requests for robots.txt in each of these domains to robots_quaffle.txt, robots_bludger.txt, and robots_snitch.txt, respectively. All robots.txt files are assumed to be in the same root directory.
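The per-domain files might then contain something like this (hypothetical contents - the point is that only the preferred domain's file permits crawling):

# robots_quaffle.txt - the domain that should be crawled
User-agent: *
Disallow:

# robots_bludger.txt and robots_snitch.txt - keep robots out
User-agent: *
Disallow: /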

I'll vote for the method posted by claus above, though, as long as you intend to stop using the alternate domain names for anything other than domain-name-branding retention. However, you can/should leave out the double quotes in the RewriteCond:

RewriteCond %{HTTP_HOST} !^www\.the-right-domain\.com

Jim

apphoto
2:32 am on Aug 30, 2003 (gmt 0)

I added the code to the .htaccess file at the root level of the "main" domain name. I'm still not clear on what to do with the directories I set up for the other six domains - do they each need the same .htaccess file? I did forget to mention, however, that I have a cloaked Flash site, so every SE is served from a different directory than human visitors are. I wonder if this will complicate the functioning of the .htaccess code? It works via a CGI script. Thanks for all the help - this has been very informative... if only the Macromedia Flash ActionScript forum were so fast!

apphoto
2:51 am on Aug 30, 2003 (gmt 0)

Since I have the attention of people much more skilled in conf files than I am... I am having a problem: my subdomains can only be accessed by typing in sub.domainname.com, while www.sub.domainname.com or [sub.domainname.com...] don't work. I added the following code to my httpd.conf file, but it doesn't work:

<VirtualHost 206.246.241.117>
ServerName sub.domainname.com
DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>
<VirtualHost 206.246.241.117>
ServerName www.sub.domainname.com
DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>
<VirtualHost 206.246.241.117>
ServerName [sub.domainname.com...]
DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>

Does anyone know what the proper code should be? Thanks again!
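A likely fix, as a sketch (same IP and paths as in the post above; ServerAlias is the standard Apache way to give one vhost several names) - a single VirtualHost answering to both names, with DNS records for both names as discussed at the end of the thread:

<VirtualHost 206.246.241.117>
    # One vhost for both host names; ServerAlias adds the www variant
    ServerName sub.domainname.com
    ServerAlias www.sub.domainname.com
    DocumentRoot /usr/local/apache/htdocs/STOCK
</VirtualHost>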

claus
3:14 am on Aug 30, 2003 (gmt 0)

Okay, I'll need to clarify a bit, as it's sometimes unclear exactly what "server" and "pointing to" mean at different hosts.

I guess that by "server" you actually mean an "IP-number" like 127.0.0.0

A server is just a hardware box. It works just like a phone: you can have as many phone numbers on it as you like. You can have fixed phone numbers, and you can forward one phone to the number of another.

So, a server (box) can have a lot of IP numbers. Just like the phone, when you call a number you reach a person at the other end. This is your content/site.

The DNS has "autodial", so you just have to say the name (domain-name) then the system automatically finds out which number (IP) should be used. Just in like in a family or a business, more persons can share one number (more sites on one IP). And you can refer to this telephone number by each persons name or just by one name (the family/business name)

The relation between names and IPs is what you set up in your host's control panel.

(1) By "pointing to" you set up an "equal sign" - john doe is always at 127.0.0.0.

(2) By "forwarding" you assign one value to another (forward your phone) - jane is at john doe's place.

You can point more than one domain name to one IP (and there can be more than one IP on one server as well). You can also forward more than one domain name to one IP.

Here's the tricky part, but i'll explain using the phone example:

When you "point to" you say: "jane is living at the same place as joe." So, if you ask "jane" over the phone to "go to the living room" you are also asking her to "go to joe's living room". It's the same thing. So, these files are exactly the same when you "point to":

(a) joe-doe.com/the-livingroom/do-something.html
(b) jane-doe.com/the-livingroom/do-something.html

When you "forward" and ask jane to "go to the living room"... well, then she can't! That's because she's not at home, she is over at joe's, and joe's livingroom is not her's. In fact, she can go nowhere inside joe's house - she's simply not allowed to. Now there's only these two cases (a2) and (b2) - case (c2) does not exist:

(a2) joe-doe.com/joes-livingroom/do-something.html
(b2) jane-doe.com/ -> joe-doe.com/

(c2) jane-doe.com/joes-livingroom/do-something.html <-- invalid option, 404 file not found

You can make joe-doe go to his living room, but jane-doe can only reach joe-doe's doorstep, not navigate inside the house.

Wrap-up:

When you point six domain names to the same IP, they share the same .htaccess file and the whole file structure. Everything Joe owns, Jane also owns (unless you regulate that in .htaccess or by other means).

When you forward one domain to another, the domain you forward will not have an htaccess file, as it's "not at home". It's like telling your phone that when you say "Jane" it should call "Joe".

So, the .htaccess code above is needed if all seven domains point to the same IP. If one domain points to one IP, and the other six domains forward to this first domain, then the .htaccess is not needed.



>> lets say that I wanted to set up 6 completely different sites, theoretically I might need different .htaccess files for all of them, no?

No, you just need one .htaccess file, one IP and six domains pointing to this IP. Then, in the .htaccess file, you could programmatically decide which content should be displayed depending on which domain was used to access the one IP.

This could also be done without .htaccess, using server-side scripting (PHP, Perl, Python, ASP, whatever) instead, but .htaccess is the "natural place" to make the overall distinctions, and it is a quite powerful tool. And of course there's always a whole lot of other things you could do.
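As a sketch, that per-domain routing could look like this in the one shared .htaccess (domain and folder names are hypothetical):

RewriteEngine on
# Serve requests that arrive under site-one.com from the /site-one folder
RewriteCond %{HTTP_HOST} ^(www\.)?site-one\.com [NC]
RewriteCond %{REQUEST_URI} !^/site-one/
RewriteRule ^(.*)$ /site-one/$1 [L]
# ...repeat a block like this for each of the other domains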

And then, yes - if you do set up six different sites, you might need six different .htaccess files anyway (or perhaps even more than six), but that would be for reasons other than keeping the document spaces of the six domains separate.

/claus


The above relates to post #10 only - I was writing while the rest got posted.

It turns out your setup is more complicated than first assumed, so the above advice might have to be modified.

What you want is essentially to have the six domains point to individual folders located on the seventh domain, is that right?

This can be done as well; it was just not what I thought you wanted to do. The Flash site - where is it located in the whole setup, and which domains should point to it?

Some kind of example would be useful - could you provide a little more detail? Please use example domains, not the real ones.

I'll have to leave the thread now - I'm in Europe and it's 5:30 AM here. I have to get some sleep, but I saw Jim just arrived, so you're in safe hands if he's still around ;)


double quotes:
This forum is not very friendly to posting certain types of code. It ate the space before the exclamation mark if I didn't put quotes around the expression; that's the only reason for them ;)

apphoto
4:02 am on Aug 30, 2003 (gmt 0)

Well, I actually think the code you gave me should suffice to route the spiders to the right directory. I will monitor my logs to see whether the spiders continue to request pages using the six "bad" URLs. I don't think the cloaking system I have installed will interfere with the .htaccess directive... Using Flash requires all kinds of painful workarounds if you expect to get any ranking, and the fact that I have three different sites being served from the same directory makes it worse. I have wanted to separate the three sites into their own directories for a while, but was afraid I would lose PageRank. Thanks again for the help... go get some sleep!

carfac
4:27 pm on Aug 30, 2003 (gmt 0)

Hi!

>>> Does anyone know what the proper code should be? thanks again!

For subdomains, I register mine in my DNS server - but I use separate IPs for each subdomain. I think for several subs on the same IP you would need to change your rc.conf and hosts.conf...

dave

apphoto
4:42 pm on Aug 30, 2003 (gmt 0)

I'm having the problem that I can't reach my subs if I type in www before the name. Without www they work fine. Does anyone happen to know why this is?

claus
6:05 pm on Aug 30, 2003 (gmt 0)

>>subs

The reason you cannot reach your subdomains is probably that they are not registered in your DNS setup. This is not a server thing. In example (b) below, "www" is not a prefix - it's a subdomain itself, of another subdomain:

(a) subdomain.example.com
(b) www.subdomain.example.com

In your DNS setup you will need to create this domain and point it to your IP (using an A record) or to your domain (using a CNAME record). Then your server conf will take over from there (as requests for the subdomain hit your IP).

I think you must set this up in your host's control panel. These panels differ, but your host should have documentation on this. I can see from post #13 that you only need to create the names and point them to the IP used in your conf file (using an A record, that is).
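In zone-file terms it amounts to records like these (a hypothetical BIND-style sketch; control panels hide the syntax, but this is what gets created):

; in the zone for domainname.com, using the IP from the conf file above
sub      IN  A      206.246.241.117
www.sub  IN  CNAME  sub.domainname.com.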

/claus
