Apache Web Server Forum

This 38 message thread spans 2 pages (page 1 of 2).
redirecting directory index to the root for multiple subdomains
mtreasure




msg:4585303
 10:21 am on Jun 18, 2013 (gmt 0)

Hi,

I have seen previous topics re. directing index.html to / , but must admit that I do find it a bit confusing, as there are so many suggestions. Some say add code to the .htaccess file, others to the system config (not sure exactly what that is). Anyway, here's my problem. I mistakenly linked about 12,000 pages as index.html and want to get rid of the problem permanently (I have corrected my links now), redirecting anything that is index.html to the directory before. Here are some examples:

http://www.example.com/index.html to:
http://www.example.com/

http://www.example.com/europe/index.html to:
http://www.example.com/europe/

http://www.example.com/europe/england/index.html to:
http://www.example.com/europe/england/

http://www.example.com/europe/england/somerset/index.html to:
http://www.example.com/europe/england/somerset/

http://www.example.com/europe/england/somerset/bath/index.html to:
http://www.example.com/europe/england/somerset/bath/

http://www.blog.example.com/index.html to:
http://www.blog.example.com/

There are lots of different folders to redirect, so I was hoping to do something simple to tackle every url that is /index.html. Is there something simple that I can do to achieve this, rather than adding a huge chunk of code? Your suggestions (and patience!) are much appreciated. :)

Many thanks,

Martin

[edited by: phranque at 11:13 am (utc) on Jun 18, 2013]
[edit reason] Please Use Example.com [webmasterworld.com] [/edit]

 

phranque




msg:4585317
 11:18 am on Jun 18, 2013 (gmt 0)

if you don't know what the system config file is then use the .htaccess method.

try something similar to this [webmasterworld.com].

mtreasure




msg:4585343
 12:21 pm on Jun 18, 2013 (gmt 0)

Firstly, sorry for not using 'example.com'. I'd not noticed it being edited. My apologies there for causing problems.

So I've added this to .htaccess...
RewriteEngine on

RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www\.example\.com/$1 [R=301,L]

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]

...and it seems to be working on the whole. Have I done this correctly? I'd like a little reassurance here! :)

This has created a few problems though, e.g.

http://www.blog.example.com/
which when entered with /index.html (http://www.blog.example.com/index.html)
redirects to the actual folder, i.e. http://www.example.com/blog/blogpages/ (wrong)

Also, I have a load of redirects for old subdomains, which when entered with /index.html add the subdomain folders onto the url and cause an error page, e.g.
http://www.bath.example.com/ redirects correctly to:
http://www.example.com/europe/england/somerset/bath/

but http://www.bath.example.com/index.html adds '/pages/bath' to the end (the subdomain folders), which doesn't exist.
http://www.example.com/europe/england/somerset/bath/pages/bath/

I'm so nearly there. Please can you let me know how to tweak this and get it working correctly. It would be amazing.

Cheers.

[edited by: phranque at 11:23 pm (utc) on Jun 18, 2013]
[edit reason] unlinked urls [/edit]

Dideved




msg:4585370
 2:43 pm on Jun 18, 2013 (gmt 0)

> # Redirect index in any directory to root of that directory
> RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
> RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]
>
> ...and it seems to be working on the whole. Have I done this correctly? I'd like a little reassurance here! :)


The first thing you need to know is that you're probably going to get a range of opinions on this topic.

My opinion is that this rewrite rule above is far more complicated than it needs to be. Something as simple as this below seems to work fine.

RewriteRule ^(.*/)?index\.html?$ /$1 [R=301,L]

You'll also get a range of opinions on whether the replacement URL should include the full host name. The reason why you might want to include the host name is for performance. If someone visits an index.html page through a non-canonical host name, then they will likely go through two redirect hops, one to fix the host name and one to strip off index.html. But if you include the full host name in the index.html replacement, then you can accomplish both at once.

However, the reason why you might *not* want to include the host name in the replacement is for ease of maintenance. The canonical host name need only be defined once, plus this rule above can work correctly across multiple host names. And since it sounds like you have multiple host names in play, this may be a significant factor for you.
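
To make the trade-off concrete, here is a minimal sketch of the two variants side by side. It assumes the rules sit in an .htaccess file at the document root of the host being requested and that www.example.com is the canonical host name; only one of the two rules would be active at a time.

```apache
RewriteEngine On

# Variant A: host-relative target.
# The redirect stays on the host that served the request (subject to the
# server's UseCanonicalName setting), so the rule can be reused across host names.
RewriteRule ^(.*/)?index\.html?$ /$1 [R=301,L]

# Variant B: absolute target (commented out here).
# Fixes a non-canonical host name and strips index.html in a single hop,
# at the cost of hard-coding the canonical host name in the rule.
# RewriteRule ^(.*/)?index\.html?$ http://www.example.com/$1 [R=301,L]
```

With Variant A, a request for /europe/index.html should be answered with a 301 to /europe/ on the same host; with Variant B the Location header always names www.example.com.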

[edited by: Dideved at 3:48 pm (utc) on Jun 18, 2013]

mtreasure




msg:4585373
 2:56 pm on Jun 18, 2013 (gmt 0)

Thanks so much for replying and helping here. I added this to the htaccess file. Do I have to include 'RewriteEngine on', as it didn't work without this? Do I need to add anything else afterwards, to switch it off?!

RewriteEngine on
RewriteRule ^(.*/)?index\.html?$ /$1 [R=301,L]

However, it doesn't work for the subdomains+index.html (including redirected subdomains as well) in exactly the same way as before, i.e. it adds their folders to the end of the url, and therefore then returns an error.

http://www.example.com/europe/england/somerset/bath/index.html becomes
http://www.example.com/europe/england/somerset/bath/pages/bath/
(/pages/bath/ is the location of the old redirected subdomain in this example)

http://www.blog.example.com/index.html becomes:
http://www.example.com/blog/blogpages/

What should I do to prevent this?

Thanks again,

MARTIN

[edited by: phranque at 11:24 pm (utc) on Jun 18, 2013]
[edit reason] unlinked url [/edit]

Dideved




msg:4585398
 3:51 pm on Jun 18, 2013 (gmt 0)

> Do I have to include 'RewriteEngine on'

Yup.

> Do I need to add anything else afterwards, to switch it off?!

Nope.

> However, it doesn't work for the subdomains+index.html (including redirected subdomains as well) in exactly the same way as before, i.e. it adds their folders to the end of the url, and therefore then returns an error.


I suspect we're going to need to see your whole htaccess file to understand what's happening.

mtreasure




msg:4585399
 4:07 pm on Jun 18, 2013 (gmt 0)

OK, thanks so much.


This is the main .htaccess file for the whole site:
-----------------
DirectoryIndex index.html
ErrorDocument 400 http://www.example.com/errors/error.html
ErrorDocument 401 http://www.example.com/errors/error.html
ErrorDocument 403 http://www.example.com/errors/error.html
ErrorDocument 404 http://www.example.com/errors/error.html
ErrorDocument 410 http://www.example.com/errors/error.html
ErrorDocument 500 http://www.example.com/errors/error.html

RewriteEngine on

RewriteCond %{HTTP_HOST} ^website\.com$ [NC]
RewriteRule ^(.*)$ http://www\.example\.com/$1 [R=301,L]

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]
-------------------


This is an example of the .htaccess file for the old subdomains, e.g. http://www.bath.example.com/ :

-------------------
# Permanent URL redirect
Redirect 301 / http://www.example.com/europe/england/somerset/bath/

------------------


There is no .htaccess file for the existing subdomains, e.g.
http://www.blog.example.com/

I guess I could list all of the index.html urls, but it will be over 1,000 and it doesn't seem sensible to stick all that in the .htaccess file to me. I was hoping for a more elegant solution!

I hope that this helps a little. :) Any thoughts here would be hugely appreciated.

MARTIN

[edited by: phranque at 11:25 pm (utc) on Jun 18, 2013]

Dideved




msg:4585431
 6:17 pm on Jun 18, 2013 (gmt 0)

I think it finally clicked for me what the problem is that you're describing. First let's make sure I understand correctly. If, for example, your main site lives at /www/public_html, then your subdomain lives at /www/public_html/pages/bath. Is that correct? And the "pages/bath" part is being added to the path when your subdomains redirect?

mtreasure




msg:4585453
 8:03 pm on Jun 18, 2013 (gmt 0)

Yes, that's exactly it! Sorry if I didn't make it clearer. I hope it makes a bit more sense now.

It is difficult to explain like this. Shame you're not sat next to me! Thanks for giving this your time, I am so grateful. I would be so happy to get this resolved.

> the "pages/bath" part is being added to the path when your subdomains redirect?

Yes, with the new .htaccess code that I've added, this is now happening when index.html is added to the end of the subdomain before it redirects. Only then.

Cheers.

phranque




msg:4585506
 11:30 pm on Jun 18, 2013 (gmt 0)

> ErrorDocument 400 http://www.example.com/errors/error.html
> ErrorDocument 401 http://www.example.com/errors/error.html
> ErrorDocument 403 http://www.example.com/errors/error.html
> ErrorDocument 404 http://www.example.com/errors/error.html
> ErrorDocument 410 http://www.example.com/errors/error.html
> ErrorDocument 500 http://www.example.com/errors/error.html


get rid of the protocol and hostname in those ErrorDocument directives or you will get a 302 response instead of the 4XX/500 status code:
ErrorDocument 400 /errors/error.html
...
etc


http://httpd.apache.org/docs/current/mod/core.html#errordocument
Note that when you specify an ErrorDocument that points to a remote URL (ie. anything with a method such as http in front of it), Apache HTTP Server will send a redirect to the client to tell it where to find the document, even if the document ends up being on the same server. This has several implications, the most important being that the client will not receive the original error status code, but instead will receive a redirect status code. This in turn can confuse web robots and other clients which try to determine if a URL is valid using the status code.
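
Spelled out for the six status codes in the original file, the local-path form might look like the sketch below. It assumes /errors/error.html is a URL-path on the same site, as in the original directives.

```apache
# Local URL-paths: Apache serves the error page internally,
# so the client still receives the original 4XX/500 status code.
ErrorDocument 400 /errors/error.html
ErrorDocument 401 /errors/error.html
ErrorDocument 403 /errors/error.html
ErrorDocument 404 /errors/error.html
ErrorDocument 410 /errors/error.html
ErrorDocument 500 /errors/error.html
```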

phranque




msg:4585510
 11:42 pm on Jun 18, 2013 (gmt 0)

> You'll also get a range of opinions on whether the replacement URL should include the full host name. The reason why you might want to include the host name is for performance. If someone visits an index.html page through a non-canonical host name, then they will likely go through two redirect hops, one to fix the host name and one to strip off index.html. But if you include the full host name in the index.html replacement, then you can accomplish both at once.
> However, the reason why you might *not* want to include the host name in the replacement is for ease of maintenance. The canonical host name need only be defined once, plus this rule above can work correctly across multiple host names. And since it sounds like you have multiple host names in play, this may be a significant factor for you.

here's another opinion (from the head of google's web spam team) related to reaching the canonical url in one redirect vs going with "easy maintenance" and multiple hops.
Matt Cutts Answers How Much PageRank is lost through a 301 redirect:
http://www.webmasterworld.com/google/4548792.htm [webmasterworld.com]

basically you will lose about 15% of PR through a single redirect and closer to 28% if you use 2 redirects to get there.

phranque




msg:4585513
 11:45 pm on Jun 18, 2013 (gmt 0)

> RewriteRule ^(.*)$ http://www\.example\.com/$1 [R=301,L]


don't escape the periods in the RewriteRule Substitution string - it's not a regular expression:
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

you don't need the anchors in the Pattern since .* is greedy.

lucy24




msg:4585519
 11:57 pm on Jun 18, 2013 (gmt 0)

> this is now happening when index.html is added to the end of the subdomain before it redirects. Only then.

Can you backtrack and explain that in different words? On the face of it, you're describing something that shouldn't be happening in the first place: "index.html" has nothing to do with the URL and hence nothing to do with redirection.

Dideved




msg:4585532
 12:43 am on Jun 19, 2013 (gmt 0)

> basically you will lose about 15% of PR through a single redirect and closer to 28% if you use 2 redirects to get there.


This would be a valid concern, but I think we may have oversimplified what he said in the video. Matt starts off saying that 10-15% of PR is lost just by following a normal link. He then mentions how 301s used to lose less PR compared to normal links, then later they used to lose more PR. Today, he says, the amount of PR that dissipates through a 301 is identical to the amount of PR that dissipates through a link. He then explicitly says that it neither hurts nor helps to use a 301, and that we should use whatever is best for our purposes.

phranque




msg:4585539
 1:20 am on Jun 19, 2013 (gmt 0)

> This is an example of the .htaccess file for the old subdomains, e.g. [bath.example.com...] :
> -------------------
> # Permanent URL redirect
> Redirect 301 / http://www.example.com/europe/england/somerset/bath/

are you still using these Redirect directives?
if so you should not mix mod_alias directives with mod_rewrite directives.

http://httpd.apache.org/docs/2.2/rewrite/avoid.html [httpd.apache.org]:
when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.

phranque




msg:4585540
 1:25 am on Jun 19, 2013 (gmt 0)

> This would be a valid concern, but I think we may have oversimplified what he said in the video. Matt starts off saying that 10-15% of PR is lost just by following a normal link. He then mentions how 301s used to lose less PR compared to normal links, then later they used to lose more PR. Today, he says, the amount of PR that dissipates through a 301 is identical to the amount of PR that dissipates through a link. He then explicitly says that it neither hurts nor helps to use a 301, and that we should use whatever is best for our purposes.


you are correct in your understanding and you support my point.
two 301s lose as much PR as following a link through two pages.
the lesson is to link internally to the canonical url and fix externally referred non-canonical requests in a single redirect.

phranque




msg:4585541
 1:29 am on Jun 19, 2013 (gmt 0)

> However, it doesn't work for the subdomains+index.html (including redirected subdomains as well) in exactly the same way as before, i.e. it adds their folders to the end of the url, and therefore then returns an error.
>
> http://www.example.com/europe/england/somerset/bath/index.html becomes
> http://www.example.com/europe/england/somerset/bath/pages/bath/
> (/pages/bath/ is the location of the old redirected subdomain in this example)
>
> http://www.blog.example.com/index.html becomes:
> http://www.example.com/blog/blogpages/
>
> What should I do to prevent this?


what is the response status chain for those requests?
it looks like you are either getting multiple redirects or an internal rewrite is being exposed by a subsequent external redirect.

mtreasure




msg:4585587
 6:10 am on Jun 19, 2013 (gmt 0)

Thanks for your replies.

The restructure of the site happened about a month ago and was necessary. Interestingly, our pageranks have shot up, quite dramatically in some places. I think that this is down to having a better URL. Whatever happens, happens. It needed to be done and time will sort things out, although it is all very positive at the moment and we are getting around 60,000 unique visits a day.

What I want to do is make sure that anyone who is linking to any of our pages with index.html at the end, old subdomains or new URLs, gets redirected to a page without index.html . All of the code so far only partly works and doesn't take into account the existing and old, redirected subdomains, as the urls then add on the folders for those and return errors.

All I want to do is direct /index.html to / , taking into account that there are subdomains in place, pointing to different folders. I thought that would be simple! :)


----------
>This is an example of the .htaccess file for the old subdomains, e.g. [bath.example.com...] :
>-------------------
># Permanent URL redirect
>Redirect 301 / http://www.example.com/europe/england/somerset/bath/

>are you still using these Redirect directives?
>if so you should not mix mod_alias directives with mod_rewrite directives.

Please be patient with me here, as I'm not sure what you are saying! Yes, these are in place and seem to work fine. Are you saying that I've done something wrong here? If so, please tell me what I should have done.

Thanks.

lucy24




msg:4585596
 6:45 am on Jun 19, 2013 (gmt 0)

There are two basic ways to redirect a request in Apache. (Note lower-case "redirect": we're talking here about a generic function, not a specific command.)

First way uses mod_alias. It is expressed as
Redirect optional-number-here something something-else
or
RedirectMatch optional-number-here something something-else

Second way uses mod_rewrite. It is expressed as
RewriteRule something something-else [letters-and-numbers-here]

In Apache, each module is an island. On every request, mod_rewrite goes through config and htaccess for anything requiring its services; mod_setenvif goes through config and htaccess for anything et cetera; mod_alias goes through et cetera; mod_authz et cetera. And there are a lot of mods, including many that you never need to meet face to face.

If it is not your own server-- and sometimes even when it is-- you have no control over what order the modules execute in. You can only arrange things within a module. A series of RewriteRules will always happen in the order you put them, regardless of whether they come before or after Redirect(Match) in your htaccess file. If all the mods are mixed up together, the RewriteRules will still happen in one batch, the Redirect(Match) in another, and so on.

The only way to be absolutely sure all your redirects (lower case) will happen in the intended order is to put them all in the same module. mod_alias (Redirect, capitalized) is easier to work with but not nearly as flexible and powerful as mod_rewrite (RewriteRule, which can create a redirect). So the moment you have anything requiring mod_rewrite, you need to take any existing mod_alias rules and translate them to RewriteRule form.

If there are lots of them, you can feed your htaccess into a text editor that does Regular Expressions, and convert everything in a couple of global replaces.
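
As a rough illustration of that translation, the old subdomain's Redirect could be rewritten as below. This is only a sketch: it assumes the file is the .htaccess in the folder the old subdomain points to, that mod_rewrite is available there, and that the target URL is the one from the example earlier in the thread. The first rule is an addition that drops a trailing index.html or index.htm instead of carrying it over to the new URL.

```apache
RewriteEngine On

# Requests ending in index.html / index.htm: redirect to the matching directory URL
RewriteRule ^(.*/)?index\.html?$ http://www.example.com/europe/england/somerset/bath/$1 [R=301,L]

# Everything else: same behaviour as
#   Redirect 301 / http://www.example.com/europe/england/somerset/bath/
# which appends the rest of the requested path to the target
RewriteRule (.*) http://www.example.com/europe/england/somerset/bath/$1 [R=301,L]
```

The index rule is repeated here rather than relied on from the root file because, by default, mod_rewrite does not inherit the parent directory's rules once a subdirectory's .htaccess has rewrite directives of its own (RewriteOptions inherit changes that).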

mtreasure




msg:4585599
 6:56 am on Jun 19, 2013 (gmt 0)

Thanks Lucy, beautifully explained. So what do you suggest that I do in this instance, so that I can always drop the index.html , regardless of folder structure and subdomains?

Martin

lucy24




msg:4585622
 8:40 am on Jun 19, 2013 (gmt 0)

I think you are well on your way. We just got sidetracked because phranque or someone like him noticed that you've still got mod_alias (Redirect by that name) rules sitting around.

Any rule that begins with the word "Redirect" can be simply translated into an identical rule that begins "RewriteRule". (The opposite is not true.)

:: shuffling papers ::

Here are the RegExes I used.

# change . to \.
^(Redirect \d\d\d \S+?[^\\])\.
TO
$1\\.

# now change Redirect to Rewrite
^Redirect(?:Match)? 301 /(.+)
TO
RewriteRule $1 [R=301,L]
# and
^Redirect(?:Match)? 410 /(.+)
TO
RewriteRule $1 - [G,L]

phranque




msg:4585625
 9:01 am on Jun 19, 2013 (gmt 0)

there are some other cases where you must use mod_rewrite instead of mod_alias:
- if you need to do any internal rewrites
- if you need to rewrite or redirect based on the hostname
- if you need to rewrite or redirect based on the query string

then reminding you of this...
http://httpd.apache.org/docs/2.2/rewrite/avoid.html [httpd.apache.org]:
when there are Redirect and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file.


so that means if you have any internal rewrites that fire and also a (mod_alias) Redirect that fires, the redirect will likely expose something in the internally rewritten url.

in your case, the tl;dr version is if you have any mod_alias Redirect(Match) directives you must replace them with mod_rewrite RewriteRule directives or risk unexpected results.

it's not a "sidetrack".
it's probably the focus of your problem.
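
For the hostname and query string cases in the list above, a hedged sketch of what mod_rewrite can test that mod_alias cannot. The host names follow the thread's example.com convention; the script name article.php, the id parameter and the /articles/ target are purely hypothetical.

```apache
RewriteEngine On

# Redirect based on the host name (Redirect/RedirectMatch cannot test the Host header)
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Redirect based on the query string (hypothetical URL: /article.php?id=42)
# %1 is the group captured by the RewriteCond; the trailing "?" drops the old query string
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^article\.php$ http://www.example.com/articles/%1/? [R=301,L]
```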

phranque




msg:4585627
 9:07 am on Jun 19, 2013 (gmt 0)

> So what do you suggest that I do in this instance, so that I can always drop the index.html , regardless of folder structure and subdomains?


you need to fix all the problems and put things in the right order.
=8)

it would be helpful if you answered the questions that were asked:
> what is the response status chain for those requests?
> it looks like you are either getting multiple redirects or an internal rewrite is being exposed by a subsequent external redirect.


try installing the Live HTTP Headers add-on in a firefox browser and report all the GET requests, the status code for each response, and the Location: header for any 301/302 responses.
all properly exemplified, of course...

mtreasure




msg:4585630
 9:16 am on Jun 19, 2013 (gmt 0)

>it would be helpful if you answered the questions that were asked:

Sorry, I'm doing my best to answer your questions. I'm finding this all a bit confusing. People are saying different things and asking me to read pages and pages of conflicting past threads.

If there is not a simple way to correct this, I'll live with it. Millions of websites must have this problem. However, that's not to say that I don't want to fix it, but I was hoping that someone would just be able to say - 'Ah, what you need to do is this...'! If you have any suggestions about what I should try in my .htaccess file, I'd love to know. I'll happily try anything that you suggest.

phranque




msg:4585632
 9:19 am on Jun 19, 2013 (gmt 0)

eventually, possibly among other changes, you will probably have to change Redirect directives to RewriteRule directives.


but before you do that - answer this question:
what is the response status chain for those requests?

mtreasure




msg:4585643
 9:57 am on Jun 19, 2013 (gmt 0)

OK - I hope that this helps to clarify things (split over several posts as too large). So if I add this to my root .htaccess file:

RewriteEngine on

RewriteCond %{HTTP_HOST} ^website\.com$ [NC]
RewriteRule ^(.*)$ http://www\.example\.com/$1 [R=301,L]

# Redirect index in any directory to root of that directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index(\.[a-z0-9]+)?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)index(\.[a-z0-9]+)?$ http://www.example.com/$1? [R=301,L]

and enter this url:

http://www.example.com/europe/index.html

it works fine, drops the /index.html and I get:


[mod's note: removed requests/responses for irrelevant resources]

[edited by: phranque at 10:13 am (utc) on Jun 19, 2013]
[edit reason] noise [/edit]

mtreasure




msg:4585644
 9:59 am on Jun 19, 2013 (gmt 0)

However, if I enter: http://www.europe.example.com/index.html (a subdomain with a redirect: Redirect 301 / http://www.example.com/europe/ )
it adds on the subdomain folders to the url (in this case 'countries/europe/') and returns an error:


GET /index.html HTTP/1.1
Host: www.europe.example.com

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/countries/europe/
----------------------------------------------------------
http://www.example.com/countries/europe/

GET /countries/europe/ HTTP/1.1
Host: www.example.com

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/europe/countries/europe/
----------------------------------------------------------
http://www.example.com/europe/countries/europe/

GET /europe/countries/europe/ HTTP/1.1
Host: www.example.com

HTTP/1.1 404 Not Found
----------------------------------------------------------


[mod's note: removed requests/responses for irrelevant resources]

[edited by: phranque at 10:58 am (utc) on Jun 19, 2013]
[edit reason] noise [/edit]

mtreasure




msg:4585645
 10:00 am on Jun 19, 2013 (gmt 0)

The same for an existing subdomain with no redirect, e.g.
http://www.blog.example.com/index.html
which again adds on the location of the subdomain folders (in this case '/blogpages/') to the url and obviously then returns an error:


GET /index.html HTTP/1.1
Host: www.blog.example.com

HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/blog/blogpages/
----------------------------------------------------------
http://www.example.com/blog/blogpages/

GET /blog/blogpages/ HTTP/1.1
Host: www.example.com

HTTP/1.1 200 OK
----------------------------------------------------------


[mod's note: removed requests/responses for irrelevant resources]

[edited by: phranque at 10:59 am (utc) on Jun 19, 2013]
[edit reason] noise [/edit]

mtreasure




msg:4585647
 10:02 am on Jun 19, 2013 (gmt 0)

Thanks for sticking with me on this and trying to get it working. I do appreciate it so much. As I'm sure you have gathered, I'm a little out of my comfort zone when it comes to Apache and tweaking the .htaccess file in this way.

BTW, I corrected the path for the errors/error.html on the .htaccess, so thanks for that.

phranque




msg:4585655
 10:20 am on Jun 19, 2013 (gmt 0)

> RewriteRule ^(.*)$ http://www\.example\.com/$1 [R=301,L]


this IS a sidetrack, but you still need to fix this.
you should not escape the periods in the RewriteRule Substitution string since it's not a regular expression.
also you don't need the anchors in the Pattern since .* is greedy.

RewriteRule (.*) http://www.example.com/$1 [R=301,L]
All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved