We are thinking of setting up several subdomains like this:
http://keyword.mysite.com
But we would also like to maintain the same file under the root directory of my site:
http://www.mysite.com/keyword/index.html
Some internal links would point to the subdomain and others to the folder.
Would this be considered spam, or is it acceptable?
If it is spam, is there any way to avoid it?
Sorry, me again with another question.
Please clarify:
"Therefore, each subdirectory which "represents" a subdomain must have its own robots.txt." Are you saying you create *separate* robots.txt files for each sub-directory, or is there just one in the root that disallows each of the sub-directories like this:
User-agent: *
Disallow: /word1/
Disallow: /word2/
If there are multiple robots.txt files, where do they go? I understood that robots.txt goes in the root, and so presumably you can only have one.
Cheers
Yes, that is what he is saying. Excluding a sub directory in the robots.txt file found at [yourdomain.com...] will only prevent indexing if you provide links to that sub directory on the main hostname; it has no effect on the same files when they are requested through the subdomain.
So let's say you had a subdomain set up like [apples.yourdomain.com...], and you had a page in that subdomain called washington.html that you didn't want indexed.
To exclude it, you would place a robots.txt in the sub directory that serves the subdomain, [yourdomain.com...].
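A minimal Python sketch of this per-host behavior, using the standard library's `urllib.robotparser` and the `mysite.com` / `word1` names from the question above. The `rules_by_host` dictionary and the `allowed` helper are hypothetical, only there to model a crawler that keeps one rule set per hostname:

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served at http://www.mysite.com/robots.txt,
# matching the example rules posted in the question.
MAIN_ROBOTS = """\
User-agent: *
Disallow: /word1/
Disallow: /word2/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


# Hypothetical crawler state: one rule set per hostname.
# Only www.mysite.com has a robots.txt in this example.
rules_by_host = {"www.mysite.com": build_parser(MAIN_ROBOTS)}


def allowed(url: str, agent: str = "*") -> bool:
    """Check a URL only against the rules of its own host."""
    host = urlsplit(url).hostname
    parser = rules_by_host.get(host)
    if parser is None:
        # No robots.txt known for this host, so nothing is disallowed.
        return True
    return parser.can_fetch(agent, url)


print(allowed("http://www.mysite.com/word1/index.html"))  # False
print(allowed("http://word1.mysite.com/index.html"))      # True
```

The main-domain file blocks `/word1/` on `www.mysite.com`, but a crawler that reaches the same content as `word1.mysite.com` never consults that file.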
That robots.txt (RBT) file would contain:
User-agent: *
Disallow: /washington.html
When a bot follows a link containing a new hostname, it will request a robots.txt from that hostname. It is no different than following a link to a completely different domain; there is no way the spider can tell the difference.
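As a rough illustration, assuming standard robots.txt handling, the file a crawler checks is derived only from the scheme and host of each link. The link paths below, including the `/apples/` sub directory, are hypothetical examples:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(link: str) -> str:
    """Return the robots.txt URL a crawler would request before fetching link.

    The rules always come from the root of the link's own scheme and host
    (plus port, if any), so each subdomain is looked up separately,
    exactly like a different domain.
    """
    parts = urlsplit(link)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for link in (
    "http://www.yourdomain.com/apples/washington.html",  # hypothetical path
    "http://apples.yourdomain.com/washington.html",
):
    print(robots_url_for(link))
```

The same page reached through the two hostnames is governed by two different robots.txt files.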
Yes, I am saying to treat those subdirectories (which each contain a subdomain) as if they are really different domains. Put a separate robots.txt in each one to control robots visiting that subdomain.
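One way to picture this setup is as a host-to-directory mapping done on the server (normally by a rewrite rule, not by application code). This Python sketch assumes a hypothetical document root `/var/www/html` and base domain `yourdomain.com`, and shows why a robots.txt placed in a sub directory ends up answering requests for that subdomain's `/robots.txt`:

```python
from pathlib import PurePosixPath

# Hypothetical layout: the main site lives in DOC_ROOT, and each subdomain
# "<name>.yourdomain.com" is served from the sub directory DOC_ROOT/<name>/.
DOC_ROOT = PurePosixPath("/var/www/html")
BASE_DOMAIN = "yourdomain.com"
MAIN_HOSTS = {BASE_DOMAIN, "www." + BASE_DOMAIN}


def file_for_request(host: str, path: str) -> PurePosixPath:
    """Map a request's Host header and path to a file on disk."""
    host = host.lower().split(":")[0]  # ignore case and any port
    relative = path.lstrip("/")
    if ".." in PurePosixPath(relative).parts:
        raise ValueError("path traversal not allowed")
    if host in MAIN_HOSTS:
        return DOC_ROOT / relative
    if host.endswith("." + BASE_DOMAIN):
        sub = host[: -len(BASE_DOMAIN) - 1]  # "apples.yourdomain.com" -> "apples"
        return DOC_ROOT / sub / relative
    raise ValueError(f"unexpected host: {host}")


print(file_for_request("apples.yourdomain.com", "/robots.txt"))
print(file_for_request("www.yourdomain.com", "/robots.txt"))
```

Each subdomain's robots.txt request lands in its own sub directory, while the root robots.txt only answers for the main hostname.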
WebGuerrilla,
Thanks for a "second opinion" here. It's possible that egomaniac has some other rules that bypass the per-subdomain rewrites under certain conditions, but based only on what was posted, I was wondering if I was missing something!
Jim
WebGuerrilla, I think you are right on this one. When I made my posts, I forgot that I had not yet linked to the subdomains I thought I had been blocking.
Unique robots files for each subdomain are probably the right way to go. You guys seem to understand this better than I do. I could only report what I have done; that doesn't mean I did things properly.
It looks like I need to do some robots file reconfiguring :)
The thing that's a bummer is that Google has indexed at least one page from my main domain that IS correctly disallowed in my robots file. I haven't figured that one out yet. But that's another thread.
-egomaniac