redirect causes subdomain pages to also redirect

301 redirect with folders and subdomains

         

Kallym

7:26 pm on Aug 23, 2009 (gmt 0)

10+ Year Member



Hello,

I recently redesigned my site and moved it to a new server. The new site does not have the same page names as the old site, so I am using 301 redirects.

Here is the line in my .htaccess file:


redirect 301 /contact.php http://example.com/index.php?option=com_content&view=article&id=1&Itemid=5

Note: I know the new urls are not SEF, but that's another matter.

The problem comes in here:
The site is a gallery of art sites, so there are many subdomains and add-on domains under the main site. Many of these also have contact pages named contact.php. But even when their pages are accessed using their own domain names (e.g. www.example2.com/contact.php), the page is redirected to the main site's contact page. This is NOT what I want.

What is the proper way to set this up in the .htaccess file so viewers are directed to the correct contact page (or about.php page)? Can this be done in the main .htaccess file? Do I need an .htaccess file in each individual artist's folder? Either way, what would the code be?

Additionally, each artist already has two lines in the .htaccess file, as the subfolder structure is different on the new site.
old structure: example.com/anyartist/
new structure: example2.com/artists/anyartist/

So, to simplify their URLs, I created a subdomain for each artist, as many do not have their own domain name: anyartist.example.com

Here is the code I'm using to redirect from the old site urls to their new urls:


redirect 301 /anyartist/ http://anyartist.example.com/
## this did not catch urls without trailing slash so added:
redirect 301 /anyartist http://anyartist.example.com/
## some artists now have domain names so their line(s) would be:
redirect 301 /anyartist/ http://example2.com/
redirect 301 /anyartist http://example2.com/

So my main question is: how do I write the redirects so that a request goes to the correct contact page (for example), instead of being redirected to the main contact page? Any advice on making the other 301 redirects better would also be appreciated.

Thank you!
KM

[edited by: jdMorgan at 7:48 pm (utc) on Aug. 23, 2009]
[edit reason] example.com [/edit]

jdMorgan

8:09 pm on Aug 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use mod_rewrite (RewriteRule) instead of mod_alias (Redirect 301) to rewrite and redirect.

Use a RewriteCond to test %{HTTP_HOST} and inhibit the rule and/or capture the subdomain(s) for re-use in the external-redirect rules.

For "artist" subdomains, use the internal rewrite syntax instead of the external redirect syntax. This preserves the subdomain-format URL instead of changing the address bar to show the main domain and the /artists/anyartist subdirectories. It strengthens the 'branding' of the artist subdomain, and avoids constantly confusing the search engines by linking to a subdomain but then redirecting to a subfolder of the main domain whenever that subdomain is requested (potentially very bad for your pages' rankings).

e.g. Requested URL anyartist.example.com/<anything> --rewrite-to-internal-server-filepath--> /artists/anyartist/<anything>
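
A minimal, untested sketch of that subdomain-to-subdirectory internal rewrite might look like the following. It assumes that all artist subdomains are served from the main site's document root (for example via a wildcard subdomain), and that the folder name under /artists/ is identical to the lowercase subdomain label; neither assumption is confirmed by anything posted so far.

```apache
RewriteEngine On

# Internal rewrite: anyartist.example.com/<anything> -> /artists/anyartist/<anything>
# Skip requests that have already been rewritten into /artists/ (prevents a loop)
RewriteCond %{REQUEST_URI} !^/artists/
# Leave the main hostname (with or without www) alone
RewriteCond %{HTTP_HOST} !^(www\.)?example\.com [NC]
# Capture the subdomain label as %1 (kept as the last condition so %1 refers to it)
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.example\.com [NC]
# No R flag and no hostname in the substitution, so the address bar does not change
RewriteRule ^(.*)$ /artists/%1/$1 [L]
```

Because of the [NC] flag, a request for AnyArtist.example.com would capture "AnyArtist" as typed, so on a case-sensitive filesystem the folder lookup could fail unless the hostname is canonicalized first.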

If search engines have already picked up on the /artists/anyartist subdirectories, add additional rules to externally redirect only direct client requests for example.com/artists/anyartist/<anything> back to anyartist.example.com/<anything>

e.g. Requested URL example.com/artists/anyartist/<anything> --301-redirect-to-URL--> anyartist.example.com/<anything>

Use a RewriteCond to examine %{THE_REQUEST} to be sure that the /artists/anyartist/ URL-path is being requested directly by the client, rather than as a result of your internal subdomain-to-subdirectory rewrite.
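
As a rough, untested illustration of that THE_REQUEST check, something along these lines could sit with the other external redirects. The hostname and the /artists/ prefix come from this thread; the exact patterns are only one possible way to write it.

```apache
# External redirect: direct client requests for
#   example.com/artists/anyartist/<anything> -> http://anyartist.example.com/<anything>
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
# THE_REQUEST holds the original request line, e.g. "GET /artists/anyartist/page.php HTTP/1.1".
# It is not altered by internal rewrites, so this only matches what the client actually asked for.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /artists/
# $1 = artist folder name, $2 = the rest of the path (may be empty)
RewriteRule ^artists/([^/]+)/?(.*)$ http://$1.example.com/$2 [R=301,L]
```

A request that reaches /artists/anyartist/ only because of the internal subdomain rewrite still has the subdomain URL-path in THE_REQUEST, so it is not redirected again.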

Read carefully; There are lots of details in the above. Note especially the distinction between an internal rewrite and an external redirect -- not at all the same thing, and you need some of each. :)

Jim

Kallym

9:31 pm on Aug 23, 2009 (gmt 0)

10+ Year Member



Thank you for your quick response. I understand what you are saying in general terms, but will need to do some research to figure out exactly how to write each line. I really appreciate the information on retaining rankings. I need to digest what you wrote, then try to write it. I will post my attempt here, though more likely I will post additional questions first.

Thank You,
KM

jdMorgan

10:26 pm on Aug 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can likely find examples of all of the steps here in recent threads in this forum.

Try the WebmasterWorld search facility, searching on "rewrite" and "redirect" and include the "RewriteCond" and variable names mentioned above to focus the searches. That plus the words "subdomain" and "subdirectory" should turn up several very useful examples -- and even complete solutions, perhaps.

Specific questions are always welcome -- I answered generally because that seemed most appropriate to your original questions.

Jim

Kallym

11:28 pm on Aug 23, 2009 (gmt 0)

10+ Year Member




I answered generally...

No problem. I do want to understand mod_rewrite better. I seem to have a mental block when it comes to regular expressions, though.

Okay, for a start (baby steps), I've come up with the following for the contact page to replace the 301 redirect, but it seems to have no effect. I get a 404 error, so must have something wrong.


RewriteCond %{HTTP_HOST} !^www\.example\.com/contact\.php
RewriteRule (.*) http://example.com/index.php?option=com_content&view=article&id=5&Itemid=5 [R=301,L]

i.e. if a request for http://example.com/contact.php or http://www.example.com/contact.php comes in, it will go to the new page, but this will not affect any other contact pages on the site (such as http://anyartist.com/contact.php)

Yes, I've been looking through the forum like crazy and getting lots of tips. I want to reduce the redundancy in my .htaccess file and optimize for SE, but those will be separate posts. :-)

Thank You!
KM

[edited by: jdMorgan at 12:08 am (utc) on Aug. 24, 2009]
[edit reason] example.com [/edit]

jdMorgan

12:24 am on Aug 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Used at the beginning of a mod_rewrite regular-expressions pattern, "!" means NOT. I don't think that was the intended use here.

If the "www." in the pattern is optional, then it must be so designated. See the regular-expressions "?" quantifier (there's a regex tutorial cited in our Apache forum charter).

HTTP_HOST contains only the client-requested hostname, and no part of the URL-path (/contact.php in this example). It may, however, have a period and/or port number appended, so those must be either handled explicitly, or matched by omitting the end-anchor on the pattern.

The variable that a RewriteRule matches to its pattern is always the local URL-path (localized to the current .htaccess file's directory), so make use of that here.

Taking all of that into account, you get:


RewriteCond %{HTTP_HOST} ^(www\.)?example\.com
RewriteRule ^contact\.php$ http://example.com/index.php?option=com_content&view=article&id=5&Itemid=5 [R=301,L]

If it's possible that you may get requests for "www.example.com" and you've decided to use example.com as your canonical domain, then the last redirect in your list should be a catch-all to fix all remaining non-canonical example.com variations:

RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

This takes requests for any variation of www.example.com or example.com that is not *exactly* example.com, and redirects it to that latter canonical domain. It won't affect any other subdomains, but fixes the following non-canonical hostname examples as well as many more variations on uppercase/lowercase not shown:

example.com.
example.com:80
example.com.:80
www.example.com
www.example.com.
www.example.com:80
www.example.com.:80
Example.Com
Example.Com.
WWW.EXAMPLE.COM
WWW.EXAMPLE.COM.:80
etc.

Note that the second RewriteCond in this catch-all rule uses an exact (negated) string match, and so does not require character-escaping, unlike the first RewriteCond, which uses regular-expressions matching.

As noted in other threads, put all of your external redirects first, in order from most-specific patterns/conditions to least-specific, followed by all of your internal rewrites, again in order from most-specific to least specific. This avoids chained (multiple) redirects and avoids exposing your internally-rewritten filepaths to clients as URLs. This is a rule of thumb, not a law. Occasional exceptions may arise.
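
Put together, the ordering described above might be laid out roughly as follows. This is only a skeleton that reuses the two redirects from this post; the internal-rewrite section is left as a placeholder.

```apache
RewriteEngine On

# 1. External redirects, most specific first: one page on the main domain only
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteRule ^contact\.php$ http://example.com/index.php?option=com_content&view=article&id=5&Itemid=5 [R=301,L]

# 2. External redirects, least specific last: canonical-hostname catch-all
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

# 3. Internal rewrites follow all redirects, again most specific first
#    (for example, a subdomain-to-subdirectory rewrite would go here)
```

With this order, a request for www.example.com/contact.php is sent straight to the final URL in a single redirect, rather than first to example.com/contact.php and then on to the new page.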

Jim

Kallym

3:58 am on Aug 24, 2009 (gmt 0)

10+ Year Member



Thank you for your patience. I am still struggling :-)

I understand about the ! that I used (I know some PHP), but am confused about why it is used in your example:
RewriteCond %{HTTP_HOST} !=example.com

You said the catch-all rule uses an exact (negated) string match, is that referring to this line? what does this line do?

I also understand about the optional www. (I read some posts about being consistent for SEO so will eventually rewrite all to www., if it is not a subdomain, after I solve the current stuff).

I'm also still confused about the order, as I am not sure what is external and what is internal. Is the RewriteCond external and the RewriteRule internal? What about 301 redirects? You can see my current order below.

I'm not new to web development, but have put off learning more about Apache (not to mention regular expressions) for far too long. I feel like a complete newbie.

I used your example and decided to go with the second example.


RewriteCond %{HTTP_HOST} ^(www.)?\example\.com [NC]
RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
RewriteRule ^contact\.php$ http://www.example.com/index.php?option=com_content&view=article&id=5&Itemid=5 [R=301,L]

But, it doesn't seem to be doing anything. I still get a 404 error when going to (any combination of) http://example.com/contact.php. Did I put the last line in the wrong place?

One positive is that the sub-domain and addon domain contact pages are no longer affected.

Here is some of the rest of my .htaccess file if that is helpful:


Options All -Indexes
Options +FollowSymLinks
#
<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName example.com
ErrorDocument 404 /errors/404.php
#
########## Begin - Rewrite rules to block out some common exploits for Joomla
## I didn't include these here to conserve space
#
RewriteCond %{HTTP_HOST} ^(www.)?\example\.com [NC]
RewriteCond %{HTTP_HOST} !=example.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
RewriteRule ^contact\.php$ http://www.example.com/index.php?option=com_content&view=article&id=1&Itemid=5 [R=301,L]
## Do I need to escape anything above? I guess I might as well start adding the www. now.
## Here is an abbreviated list of the redirects I have been using
#
redirect 301 /anyartist1/ http://www.addondomain1.com/
redirect 301 /anyartist2/ http://anyartist2.example.com/
redirect 301 /anyartist1 http://www.addondomain1.com/
redirect 301 /anyartist2 http://anyartist2.example.com/
##redirect 301 /about.php http://www.example.com/index.php?option=com_content&view=article&id=4&Itemid=6
## the above line, as well as the one for contact.php are currently disabled so they don't affect the same pages in subdomains and addon domains.
redirect 301 /support.php http://www.example.com/index.php?option=com_content&view=article&id=5&Itemid=12

Thank You!
KM

[edited by: jdMorgan at 3:06 pm (utc) on Aug. 24, 2009]
[edit reason] example.com, de-linked [/edit]

jdMorgan

4:43 am on Aug 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you are going to use the "www" subdomain instead of example.com as in my example, then you must change the domain redirect rule in a consistent manner. The code above, had it executed, would have created an infinite redirection loop.

Correcting it and adding some comments to clarify:


# If the requested hostname is example.com or www.example.com, or any upper/lowercase
# variation of either, or is in FQDN format, or has a port number appended
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
# and is NOT *exactly* *precisely* "www.example.com"
RewriteCond %{HTTP_HOST} !=www.example.com
# then redirect the request to www.example.com, retaining the URL-path
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Now the likely reason that this code didn't execute is that there is no "RewriteEngine on" directive visible in the posted code. You need that directive after the Options directive and before the first RewriteRule.
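
For illustration only, the top of the file might then be arranged like this, with the engine switched on before the first rule. The Options line is taken from the posted file; everything else in that file is omitted here.

```apache
Options +FollowSymLinks
# Must not be commented out, and must appear before the first RewriteRule
RewriteEngine On

# Canonical-hostname redirect to www.example.com
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```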

I will have to go change all of those linked domains names to example.com to comply with our Terms of Service, so that's all for now. More later.

Jim

jdMorgan

3:25 pm on Aug 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The other thing I wanted to mention is that you've still got "Redirect 301" directives in your code. For best results, I recommend not mixing mod_alias Redirect directives with mod_rewrite RewriteRule redirects. Due to the way that Apache processes directives on a per-module basis, mixing these two types of redirects can cause unexpected operation -- either the mod_alias or the mod_rewrite directives may be processed first, depending on your current server configuration, and if you change servers or your host 'upgrades' your current server, then that execution order might suddenly change, breaking your site. So, it's best to use mod_rewrite for all rewrites and redirects if you use it for any rewrites or redirects.
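
As an untested sketch of what that conversion could look like, here are two of the posted "Redirect 301" lines rewritten as RewriteRules. The folder name, hostnames and target URL are the ones from the posted file; the patterns themselves are just one way to express them.

```apache
# Replaces: redirect 301 /anyartist2/ ...  and  redirect 301 /anyartist2 ...
# One rule covers /anyartist2, /anyartist2/ and /anyartist2/<anything>
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteRule ^anyartist2(/(.*))?$ http://anyartist2.example.com/$2 [R=301,L]

# Replaces: redirect 301 /support.php ...
# The hostname condition keeps support.php on subdomains and add-on domains untouched
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com [NC]
RewriteRule ^support\.php$ http://www.example.com/index.php?option=com_content&view=article&id=5&Itemid=12 [R=301,L]
```

In the first rule, $2 holds only the part of the path after the slash, so the substitution supplies exactly one slash after the hostname.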

An external redirect accepts a client-requested URL and 'maps' it to another URL. When the client (e.g. a browser or search engine robot) requests the 'old' URL, your server code detects it and immediately sends a response back to the requesting client that says, "The resource you have requested has moved. Please ask for it again at this new URL." This terminates the current HTTP transaction, and it is up to the client to begin a new HTTP transaction by requesting the new URL provided in the server's redirect response. Technically, this is optional for the client, but most browsers and SE 'bots will do this immediately (or fairly soon). Note that the result is that in order to get the desired content, the client has to make two HTTP requests, and the server has to handle both of them. So, this slows down the user experience, and your server will log both requests. And whenever the client receives a redirect response, it will update its address bar before issuing the new request, making the new URL visible to the user.

On the other hand, an internal rewrite accepts a client-requested URL and 'maps' it to a different filepath inside the server than would normally be used (in the absence of any rewriting code). So, an internal rewrite might say, "If the client requests the URL-path 'foo.html', serve the file 'bar.php' instead of the 'foo.html' file." So this is a URL-to-filepath translation that occurs only inside the server, and the client is not informed that the request is being served from a non-default filepath. This takes place "behind the scenes," and entirely within the context of the client's original HTTP request.
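
To make the contrast concrete, here is the foo.html/bar.php case written both ways. The file names are the hypothetical ones used above, and only one of the two rules would be used in practice.

```apache
# Internal rewrite: the client asks for /foo.html and is served /bar.php.
# No hostname and no R flag, so no redirect response is sent and the address bar still shows /foo.html.
RewriteRule ^foo\.html$ /bar.php [L]

# External redirect, for comparison (commented out):
# the client receives a 301 response and must make a second request for /bar.php.
# RewriteRule ^foo\.html$ http://example.com/bar.php [R=301,L]
```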

Jim

Kallym

11:34 pm on Aug 24, 2009 (gmt 0)

10+ Year Member



Your comments were just what I needed! Thank You!

You're correct about the no "RewriteEngine on", actually it was there, at the top of some other code I'm not using, but I didn't see that it had a # in front of it. So, I've fixed that, corrected the code and it's working.

Oops, sorry about using mysite.com instead of example.com.

The Redirect 301 statements can now be rewritten. I've started doing that and they are working except for a double slash before file names - but I will deal with that in a separate post. I'm trying it on my own first.

Thanks again for your detailed explanations and information. I am incredibly grateful! I reread several other related posts in the forum today and understand them much better.

KM

jdMorgan

12:25 am on Aug 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This section

Options All -Indexes
Options +FollowSymLinks
#
<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

can be rewritten as

Options All -Indexes
#
Order deny,allow
#
<Limit PUT DELETE>
Deny from all
</Limit>

simplifying the Options and eliminating the self-overriding first <Limit> section.

Jim

Kallym

4:39 am on Aug 25, 2009 (gmt 0)

10+ Year Member



Thank you! My goal is to get my .htaccess file as streamlined as possible, so appreciate this additional nugget.
KM