Redirect from dev subdirectory

ttamniwdoog

6:44 pm on Jul 19, 2010 (gmt 0)

10+ Year Member



Google has indexed news items in my dev site which resides in a subfolder.
How can I redirect links like
http://www.mysite.com/dev/news/webmaster-world-rocks

to
http://www.mysite.com/news/webmaster-world-rocks


I am currently working with this
Redirect permanent /dev/ http://www.mysite.com/


But it's not perfect.
Is there a way to remove the /dev/ from any request that contains it?

jdMorgan

7:39 pm on Jul 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Is there a way to remove the /dev/ from any request that contains it?

That's what your code does.

If there are complications when using that code, please describe them; otherwise, there's no way to diagnose the problem.
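To illustrate why the existing directive already strips /dev/, here is a minimal Python sketch of the prefix mapping that a Redirect directive performs. It assumes mod_alias's documented behaviour of appending whatever follows the matched URL-path prefix to the target URL. The function name redirect_target is hypothetical, and the sketch models only the mapping, not Apache itself.

```python
from typing import Optional


def redirect_target(url_path: str, prefix: str, target: str) -> Optional[str]:
    """Model a prefix redirect.

    Returns the URL the client would be sent to, or None when the
    requested URL-path does not start with the configured prefix.
    """
    if not url_path.startswith(prefix):
        return None
    # Everything after the matched prefix is appended to the target URL.
    return target + url_path[len(prefix):]


# Values taken from: Redirect permanent /dev/ http://www.mysite.com/
new_url = redirect_target(
    "/dev/news/webmaster-world-rocks", "/dev/", "http://www.mysite.com/"
)
print(new_url)  # http://www.mysite.com/news/webmaster-world-rocks
```

Requests whose path does not begin with /dev/ return None in this model, which corresponds to the directive not applying to them.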

Jim

ttamniwdoog

3:26 pm on Oct 12, 2010 (gmt 0)

10+ Year Member



I am currently working with this
Redirect permanent /dev/ http://www.mysite.com/ 


I'm not sure how, but Google has once again indexed news items in my dev site, which resides in a subfolder.

I have a subdomain that points to the subfolder, so the dev site can be accessed via
http://dev.mysite.com/news/webmaster-world-rocks


Is there a way to block Google from indexing my dev environment?

jdMorgan

5:14 pm on Oct 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not clear on how "news" relates to "dev" here, but the simple answer is to put a robots.txt file in the "dev" directory that excludes all robots from all pages. DO NOT forget to remove or update this file when the content is taken live, though!

I may not understand your problem. If needed, please describe what you are trying to accomplish, both from an overall perspective and from a specific URL and specific filepath perspective (providing several different examples is good). Basically, we really don't know how your site is set up, and what you want to accomplish.

Jim

g1smd

6:31 pm on Oct 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Put a robots.txt file in the root of your site disallowing access to the /dev/ folder.

Only the root robots.txt file is actioned.
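As a rough check of what such a root file would do, the sketch below feeds an assumed two-line robots.txt to Python's standard urllib.robotparser and asks whether a crawler may fetch a /dev/ URL and a live URL. The file contents and the "Googlebot" user-agent string are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of http://www.mysite.com/robots.txt (illustrative only).
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow: /dev/
"""

parser = RobotFileParser()
parser.parse(ROOT_ROBOTS_TXT.splitlines())

# URLs under /dev/ are excluded; the live URLs are unaffected.
dev_allowed = parser.can_fetch(
    "Googlebot", "http://www.mysite.com/dev/news/webmaster-world-rocks"
)
live_allowed = parser.can_fetch(
    "Googlebot", "http://www.mysite.com/news/webmaster-world-rocks"
)

print(dev_allowed)   # False
print(live_allowed)  # True
```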

ttamniwdoog

2:32 pm on Oct 20, 2010 (gmt 0)

10+ Year Member



My production site is live here:
http://www.mysite.com/

And physically here:
/home/mysite/htdocs/


I have my dev site in a subfolder of the production site
http://www.mysite.com/dev

And physically here:
/home/mysite/htdocs/dev

And there is a subdomain for dev.mysite.com which points to
http://www.mysite.com/dev


The issue I've been having is Google has indexed stuff in my dev site.

How can I ensure that Google does not do this?

I am currently working with this in my .htaccess file:
Redirect permanent /dev/ http://www.mysite.com/

And this in my robots.txt file:
# Directories
Disallow: /dev/
# Paths (clean URLs)
Disallow: /dev/

Thanks for your help

jdMorgan

3:34 pm on Oct 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you disallow search engines from fetching the /dev URL-paths (as you have done) then they will never see your redirect, and so will continue to list the /dev URLs in search results.

Remove the robots.txt Disallow so that they can fetch the URLs, receive the redirect, and take action on it.

In order to make /dev accessible through your dev.mysite.com subdomain, you will need a more-sophisticated approach. The usual solution would be to include

Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect only direct client requests for /dev<anything> URL-paths to dev.mysite.com<anything>
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /dev(/[^\ ]*)?\ HTTP/
RewriteRule ^(.*)$ http://dev.mysite.com/$1 [R=301,L]

in /dev/.htaccess, for example.

Such an approach corrects direct HTTP client requests for the 'hidden' dev directory-path, redirecting them to the dev subdomain URL without interfering with your internal rewrite or Alias which maps dev.mysite.com URLs to the /dev filespace.
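The RewriteCond works because THE_REQUEST holds the request line exactly as the client sent it, so it is not changed by internal rewrites. The following Python sketch applies the same pattern to a few sample request lines to show which ones would trigger the redirect. The sample request lines and the helper name is_direct_dev_request are assumptions made for illustration; Python's re module is close to, but not identical to, the regex engine Apache uses.

```python
import re

# The pattern from the RewriteCond above, unchanged.
THE_REQUEST_PATTERN = re.compile(r"^[A-Z]+\ /dev(/[^\ ]*)?\ HTTP/")


def is_direct_dev_request(request_line: str) -> bool:
    """Return True when the client itself asked for /dev or /dev/<anything>."""
    return THE_REQUEST_PATTERN.search(request_line) is not None


samples = [
    # Client asked www.mysite.com for the 'hidden' directory-path directly.
    "GET /dev/news/webmaster-world-rocks HTTP/1.1",
    # Client asked dev.mysite.com; /dev only appears after the internal mapping.
    "GET /news/webmaster-world-rocks HTTP/1.1",
    # A different path that merely starts with the same letters.
    "GET /development HTTP/1.1",
]

for line in samples:
    print(is_direct_dev_request(line), line)
```

Only the first sample matches, so only direct client requests for the /dev path would be redirected to the subdomain.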

However, if any additional rewriting is taking place in any .htaccess or config file above this one, it will be necessary to place this rule (with some modifications to the RewriteRule pattern) ahead of those URL-to-filepath rewrites in order to avoid exposing internal filepath info in a URL as the result of this redirect.

General rule: Taking into account all server config files and all .htaccess files in the directory-path to the requested resource, all applicable external redirects must be invoked first, followed by any internal rewrites. Otherwise the rewritten-to internal server filepath will be exposed as a URL to the HTTP client.

Jim

ttamniwdoog

4:34 pm on Oct 20, 2010 (gmt 0)

10+ Year Member



Google currently honors the robots.txt directive to not index the /dev folder.
The issue now is that Google has indexed the site at dev.mysite.com

g1smd

6:13 pm on Oct 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The robots.txt file MUST be in the folder that corresponds to the root of the domain or subdomain that it refers to.

In this case, you will need a robots.txt file at dev.example.com/robots.txt in order to stop the indexing of the dev subdomain.

That same file will also "appear" at www.example.com/dev/robots.txt but it will be ignored for www.example.com accesses as it is not in the root of www.example.com.


Any robots.txt file at www.example.com/robots.txt will apply for accesses to www.example.com only. In this case, if you need Google to "see" the redirects, do not mention the /dev/ folder here.

This has nothing to do with the internal folder structure on your server, and everything to do with the URLs of resources as they are seen out on the web.
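To make the per-hostname rule concrete, here is a small Python sketch that derives the robots.txt URL a crawler would consult for a given page URL: the scheme and hostname are kept, and the path is replaced with /robots.txt. The helper name robots_txt_url is hypothetical, and the sketch ignores details such as redirects.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme and host)."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# The same file on disk is governed by different robots.txt URLs,
# depending on which hostname was used to reach it.
print(robots_txt_url("http://dev.example.com/news/webmaster-world-rocks"))
print(robots_txt_url("http://www.example.com/dev/news/webmaster-world-rocks"))
```

The first call yields http://dev.example.com/robots.txt and the second yields http://www.example.com/robots.txt, which is why a file that is only reachable as www.example.com/dev/robots.txt is never consulted for www.example.com.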

Your original question did not mention the dev.example.com subdomain. It is important to fully specify the requirements, otherwise you'll get the right answer to the wrong question.