Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt - Site in a Directory
Sarao




msg:3796885
 9:35 am on Nov 30, 2008 (gmt 0)

My site is in a directory

www . abc . com / folder

My root redirects to the folder. I want to restrict some folders from being indexed by Google. How can I do that?

 

Quadrille




msg:3796909
 11:24 am on Nov 30, 2008 (gmt 0)

I'm sorry, I don't understand the question.

www.example.com/ is probably the most powerful URL you could have, so I'd hesitate to redirect from there to another URL.

Sarao




msg:3796963
 1:30 pm on Nov 30, 2008 (gmt 0)

Actually, I was just avoiding the hyperlinks. I have a website, www.example.com, which is located in a folder, /index. The root is redirected to www.example.com/index through a .htaccess 301. I am not able to put a working robots.txt in place.

Quadrille




msg:3796968
 2:15 pm on Nov 30, 2008 (gmt 0)

But what are you asking?

Why are you redirecting from www.example.com ?

Many people redirect from www.example.com/index.html to www.example.com/ and it is good practice to have all internal links point to '/' or to www.example.com/

But I don't think that was your question, was it?

If you want to prevent pages from being indexed, you can use the noindex robots meta tag directly on each page.
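
For comparison, the same noindex signal can also be sent as an HTTP response header rather than as markup on each page, which may be easier when a whole directory is involved. The following is only a minimal sketch, assuming mod_headers is enabled, that the host allows this directive in .htaccess, and that the file is placed inside the directory to be kept out of the index:

```apache
# .htaccess inside the directory that should stay out of the index.
# Assumption: mod_headers is loaded; the IfModule wrapper simply skips
# the directive if it is not.
<IfModule mod_headers.c>
    # Sent with every response served from this directory and below.
    Header set X-Robots-Tag "noindex"
</IfModule>
```

As with the meta tag, a crawler only sees this header if it is allowed to fetch the pages, so the same URLs should not also be blocked in robots.txt.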

Sarao




msg:3797013
 5:17 pm on Nov 30, 2008 (gmt 0)

Thanks for the meta tag info. I want some pages not to get indexed. I will do that! Thanks.

leadegroot




msg:3797140
 9:02 pm on Nov 30, 2008 (gmt 0)

You can also adjust your 301 so it doesn't redirect for the robots file.
How are you implementing the 301?

Sarao




msg:3801707
 8:12 pm on Dec 6, 2008 (gmt 0)

Hi, my site is in a directory because I have three domains hosted on one account.

I think I need to use .htaccess to stop redirecting robots.txt, or some other method to keep some directories from being indexed. There are 100+ pages in that directory, and it is really a pain for me to edit all of them to add noindex meta tags.

Here's my .htaccess:


RewriteEngine On
RewriteBase /
#
# pointing for the domain domain1.com to folder1
ReWriteCond %{HTTP_HOST} ^(www\.)?site1\.com [NC]
ReWriteCond %{REQUEST_URI} !index/
ReWriteRule ^(.*)$ http://www.site1.com/index/$1 [R=301,L]
#
# pointing for the domain domain2.com to folder2
ReWriteCond %{HTTP_HOST} ^(www\.)?site2\.com [NC]
ReWriteCond %{REQUEST_URI} !home/
ReWriteRule ^(.*)$ http://www.site2.com/home/$1 [R=301,L]
#
# pointing for the domain domain3.com to folder3
ReWriteCond %{HTTP_HOST} ^(www\.)?site3\.com [NC]
ReWriteCond %{REQUEST_URI} !page/
ReWriteRule ^(.*)$ http://www.site3.com/page/$1 [R=301,L]

jdMorgan




msg:3801726
 9:15 pm on Dec 6, 2008 (gmt 0)

This approach is unnecessary and technically wrong. There is no need to externally redirect and require the client to make two HTTP URL requests to get the content it wants just because your sites are not located in the root directory of your server's filesystem.

The proper approach is to internally rewrite these requests, so that the /index, /home, and /page subdirectories remain invisible to the client (and the user), and to link to the pages within your three sites as if they were all in the root directory:

RewriteEngine On
#
# Redirect to canonical www subdomains
#
# If requested hostname contains "site1.com"
RewriteCond %{HTTP_HOST} site1\.com [NC]
# but is not exactly "www.site1.com"
RewriteCond %{HTTP_HOST} !^www\.site1\.com$
# then externally redirect to canonical hostname
RewriteRule (.*) http://www.site1.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} site2\.com [NC]
RewriteCond %{HTTP_HOST} !^www\.site2\.com$
RewriteRule (.*) http://www.site2.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} site3\.com [NC]
RewriteCond %{HTTP_HOST} !^www\.site3\.com$
RewriteRule (.*) http://www.site3.com/$1 [R=301,L]
#
# Internally rewrite requests for site1 to subdirectory /index/
RewriteCond %{HTTP_HOST} ^www\.site1\.com$
RewriteCond $1 !^index/
RewriteRule (.*) /index/$1 [L]
#
# Internally rewrite requests for site2 to subdirectory /home
RewriteCond %{HTTP_HOST} ^www\.site2\.com$
RewriteCond $1 !^home/
RewriteRule (.*) /home/$1 [L]
#
# Internally rewrite requests for site3 to subdirectory /page
RewriteCond %{HTTP_HOST} ^www\.site3\.com$
RewriteCond $1 !^page/
RewriteRule (.*) /page/$1 [L]

Making these changes may cause other side-effects on your site, but only because of other changes that you made that were also not necessary.

But you should then be able to put a robots.txt file into each site's subdirectory, and then Google and the other search engines will be able to fetch robots.txt for each domain normally.

The corrections above also fix a bad error: The URL-path "seen" by RewriteRule in .htaccess never starts with a slash, but the URL-path seen by RewriteCond %{REQUEST_URI} always starts with a slash. Therefore, the code you posted may have caused redirection looping, until the browser or server reached its maximum redirection limit.
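
To make the leading-slash difference concrete, here is a minimal sketch of the same guard written both ways. It assumes the .htaccess file sits in the server's document root and reuses the /index/ subdirectory from this thread. The two options are alternatives, so only one should be active:

```apache
RewriteEngine On

# Option A: test the path captured by RewriteRule.
# In .htaccess this path has no leading slash, e.g. "index/page.html".
RewriteCond $1 !^index/
RewriteRule (.*) /index/$1 [L]

# Option B (instead of A, not in addition): test %{REQUEST_URI}.
# This variable keeps the leading slash, e.g. "/index/page.html".
# RewriteCond %{REQUEST_URI} !^/index/
# RewriteRule (.*) /index/$1 [L]
```

A pattern written as ^/index/ against the RewriteRule path, or as ^index/ against %{REQUEST_URI}, would never match in this context, so the guard would have no effect.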

Jim

leadegroot




msg:3801729
 9:19 pm on Dec 6, 2008 (gmt 0)

You have a very unusual hosting setup.

I use what is known as 'reseller' hosting which allows me to host multiple domains on one account. Each domain is stored in folders like 'index' and 'home' and 'page', but the live site doesn't reference those names; the actual domain name is pointed to the folder, not to the area above the folder.
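
On a host that works this way, the mapping is typically done in the server configuration rather than in .htaccess. The following is only a rough sketch of what that might look like. The /home/account/... paths are hypothetical placeholders, the hostnames are the stand-ins used earlier in this thread, and it assumes access to the main Apache configuration, which shared-hosting customers usually do not have:

```apache
# Name-based virtual hosts: each domain gets its own DocumentRoot,
# so the folder name never appears in any URL.
# On Apache 2.2 and earlier, a "NameVirtualHost *:80" line is also needed.

<VirtualHost *:80>
    ServerName www.site1.com
    ServerAlias site1.com
    # Hypothetical path; robots.txt for this domain would live here.
    DocumentRoot /home/account/index
</VirtualHost>

<VirtualHost *:80>
    ServerName www.site2.com
    ServerAlias site2.com
    DocumentRoot /home/account/home
</VirtualHost>

<VirtualHost *:80>
    ServerName www.site3.com
    ServerAlias site3.com
    DocumentRoot /home/account/page
</VirtualHost>
```

With a layout like this, a request for /robots.txt on each domain is served straight from that domain's own folder, with no redirect or rewrite involved.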

It is very odd to need to use a folder within a domain name.
It's so odd that I suspect that you have misunderstood how your hosting works - do you have these sites live, or are you still working on them?
If you are live and this is indeed how it works, I would suggest you consult your host or perhaps their help files, as I am sure this is a standard question they have to deal with as it is a flaw in their hosting.

Here is the syntax I have on a site I am redirecting to another domain, where I still want the bots to be able to fetch robots.txt:

# Skip the redirect only for the robots.txt file itself
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^(.*)$ http://www.site.com/folder/ [R=301,L]

Hope it helps
Lea

jdMorgan




msg:3801756
 10:14 pm on Dec 6, 2008 (gmt 0)

This set-up is not all that odd. In fact it's very common for shared servers on which each hosting client has a dedicated IP address (IP-based virtual hosting -- See Apache documentation). For their convenience, the host points any and all hostname requests to the hosting client's filespace, and leaves it to the hosting client to "map" the hostnames to his/her filespace as desired. This can easily be done with mod_rewrite or with a PHP script.

In some ways, it's superior to the standard Control-Panel method of putting each "add-on" domain into a fixed subdirectory, because it allows the hosting client to easily share files (e.g. scripts) between the domains, unlike the control panel method, which makes this difficult or impossible.

The main problem here is confusion between URLs and files (and their relationship, which is "associative" and not fixed), and between external redirects and internal rewrites.

We've covered these subjects thoroughly in the Apache forum and the Apache section of the WebmasterWorld Library, so I won't repeat all that here.

Jim
