
Apache Web Server Forum

    
special htaccess redirect to subdomain
Das Capitolin




msg:4584284
 7:05 pm on Jun 14, 2013 (gmt 0)

Hello:

I am retiring our old Joomla 1.0 CMS located at document root, and replacing it with the WordPress CMS. Since we have over 100K URLs and plenty of traffic on the old CMS, I have cloned it to a subdomain so the transition would be seamless. What I need to do now is create an .htaccess file at the root that will redirect URLs containing "index.php?option=com_content&task=view&id=123" and forward them to an identical page in the subdomain. I can't just redirect the entire site, because the new WP CMS will be there (using SEF URLs).

Redirect
example.com/index.php?option=com_content&task=view&id=123

To
sub.example.com/index.php?option=com_content&task=view&id=123

This seems like a basic task, but my searches have found nothing relevant. Your expert suggestions would be very helpful. Thank you.

 

lucy24




msg:4584324
 8:51 pm on Jun 14, 2013 (gmt 0)

You just weren't searching in the right places :)

The words you need to look for are

mod_rewrite
%{QUERY_STRING}
%{HTTP_HOST}

constrained to the present subforum. (If you use site search, manually edit the "site:" part after you get your first batch of results.)

I suppose it's no use suggesting that as long as you're cleaning house, you might as well get rid of that superfluous "index.php" in the middle of the URL. Maybe even change the whole thing to something more user-friendly like
sub.example.com/com_content/123

Nope. Didn't think so.

Das Capitolin




msg:4584356
 10:47 pm on Jun 14, 2013 (gmt 0)

Hello:

Thank you for your response, lucy24. It sounds like you know what I'm trying to accomplish, but would like for me to continue searching for the answer. Very well, I can do that.

Perhaps I'm just too green to all of this, but wouldn't changing the 'superfluous' link structure as you suggested break all of my cross-links and back links?

Das Capitolin




msg:4584363
 11:12 pm on Jun 14, 2013 (gmt 0)

Using your suggested keywords, I found lots of information. The code examples were all very different, leaving me a bit confused. With some assistance, I was able to get this working in an htaccess tester:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
RewriteCond %{REQUEST_URI} ^/index.php*$ [NC]
RewriteRule ^(.*)$ http://subdomain.example.com/$1 [R=301,L]]


Unfortunately, adding this to my htaccess, even if isolated, caused Apache 500 errors.

Das Capitolin




msg:4584382
 11:47 pm on Jun 14, 2013 (gmt 0)

Problem solved. [R=301,L] should be [L,R=301]. Working code is:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
RewriteCond %{REQUEST_URI} ^/index.php*$ [NC]
RewriteRule ^(.*)$ http://subdomain.example.com/$1 [L,R=301]]

lucy24




msg:4584393
 12:56 am on Jun 15, 2013 (gmt 0)

[R=301,L] should be [L,R=301]

!

That can't possibly have had any effect; each flag is independent and all are evaluated. There's got to be something else.

Matter of fact, your quoted code does show something else: a double bracket after the flag, each time. Is it possible mod_rewrite reinterprets [L,R=301]] as "return a '301]' response"?

:: detour for experiments with test site ::

Well, that was extremely interesting so I'm glad it came up. If you have malformed flags ending in a double bracket, all requests-- not only for pages subject to the rule-- will elicit a 500 response. But if the last thing before the double brackets is R=301 (or presumably any other number), the request goes through smoothly and the extra bracket is just ignored.



Now then!

I thought you only wanted to redirect requests for one specific query. The form
^/index.php*$

captures
example.com/index.php

but it also captures
example.com/indexaphp
example.com/index.phppp

et cetera. The rule says nothing about the query string; it doesn't even check whether one is present at all. By default, any redirect will reappend the existing query. So you don't need to add anything to the body of the rule. If you want to constrain the rule to specific queries, that goes in another RewriteCond.

There are very few situations when you need a positive %{REQUEST_URI} condition. Normally you'd put this into the body of the rule, so mod_rewrite does not need to evaluate conditions at all if the request doesn't fit:

RewriteRule ^index\.php$ http://subdomain.example.com/index.php [R=301,L]

The only condition needed is the %{HTTP_HOST} one.
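Put together, that would look roughly like the sketch below. It assumes the rule lives in the .htaccess at document root (where the pattern is tested without a leading slash) and that subdomain.example.com is a stand-in for wherever the old CMS really lives.

```apache
# Sketch only; the host names are placeholders.
RewriteEngine On

# Act only on requests that arrived on the bare domain.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# The path goes in the body of the rule. The target contains no "?",
# so the original query string is reappended to the redirect as-is.
RewriteRule ^index\.php$ http://subdomain.example.com/index.php [R=301,L]
```

As written this still redirects every request for index.php, with or without a query; narrowing it to particular queries would take one more RewriteCond.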

wouldn't changing the 'superfluous' link structure as you suggested break all of my cross-links and back links

Depends how much you change it. If you simply pop in the ordinary "index.xtn" redirect, then your URLs change from
blahblah/index.php?more-stuff-here
to
blahblah/?more-stuff-here
Of course you'd have to change your internal links to point straight at the form without "index.php". There is probably already a WordPress option to do this. But if htaccess makes you anxious ;) there is no need to mess with it.

Das Capitolin




msg:4584406
 4:20 am on Jun 15, 2013 (gmt 0)

You are absolutely correct in identifying the 'actual' problem with that code. [R=301,L]] wouldn't work, but [L,R=301]] would, and for the reasons you explained. I've removed the extraneous closing bracket, and it works fine either way.

PS: I never did find the code via search. Thankfully, someone was willing to generate the code for me.

Das Capitolin




msg:4584510
 9:47 pm on Jun 15, 2013 (gmt 0)

Back to square one...

So we discovered that while the previous htaccess code will accomplish the designed task, WordPress adds in competing code that breaks it. Here's my htaccess file:
Options -Indexes
Options +FollowSymLinks
RewriteEngine on
RewriteBase /

# BEGIN WordPress <automatically inserted>
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress

#Redirect Joomla links to subdomain
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
RewriteCond %{REQUEST_URI} ^/index.php*$ [NC]
RewriteRule ^(.*)$ http://archive.example.com/$1 [R=301]


The WP code is required for that CMS' permalinks, and has a similar line "RewriteRule . /index.php". If I put the Joomla redirect ahead of the WP rewrite, it breaks WP links. If I place it below, the WP rewrite overrides the Joomla redirect.

At this point, I'm over my head and need professional help (on many levels).

lucy24




msg:4584522
 11:09 pm on Jun 15, 2013 (gmt 0)

Step one: delete the <IfModule... envelope. NOT ITS CONTENTS! Just the envelope itself. You're now left with a continuous set of RewriteRules to put in order. For starters, get rid of the second RewriteEngine On line and both RewriteBase lines. (RewriteBase won't do any active harm, but #1 / is the default and #2 RewriteBase is only used when the leading / is absent from the target of the RewriteRule itself-- which it never will be.)

Redirects need to go before internal rewrites. Ordinarily this would simply mean that you put the Joomla redirect part ahead of the WordPress rewrite part.

But now you've got a horrific problem, because you have TWO DIFFERENT INSTALLATIONS both using the name "index.php". Oops, you're already there:

If I put the Joomla redirect ahead of the WP rewrite, it breaks WP links. If I place it below, the WP rewrite overrides the Joomla redirect.


So the only remaining possibility is to distinguish the two different index.php requests. In your first post you said
index.php?option=com_content&task=view&id=123

and I thought you meant that only one specific query string was involved in joomla. This is apparently not the case-- but if joomla uses a different set of queries than WordPress, it may still be possible to save things. Or, better yet: If joomla uses visible queries in the URL, while WordPress doesn't, the two can easily be pointed in different directions. But now we are in the realm of Things Only You Know.

not2easy




msg:4584527
 11:25 pm on Jun 15, 2013 (gmt 0)

Your Joomla URL rewrites involve specific URLs and you need to confine the rewrites to those specific URLs. It is bad practice to try to find ready-made rewrites; they can't possibly address your specific requirements. Searching and reading should help you understand how to put your requirements in place. You have spelled out the exact needs so well that an understanding of how to encode them is all you need. You want to capture an incoming URL request and send it to a different URL. Your rewrite can coexist with the other rewrites your domain needs; just don't try to resolve a specific rewrite with a rule that is busy doing something else.
To help you out, a few tips:
It is a good idea to leave a blank line between different rules for readability.

Most specific rules first.

^ means "starts with" and it is an anchor.

$ means the end of the string of characters you are capturing to apply your rule to.

In between start and end you need to escape regex metacharacters, such as a literal period, with a backslash: \

Now, if you read more about mod_rewrite, you should be on your way.
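To make the anchor and escaping tips concrete, here is a small illustration built on the pattern already posted in this thread. It is not meant to be pasted in as a finished rule.

```apache
# Illustration only.
#
# Pattern as first posted in this thread:
#   ^/index.php*$
# The unescaped dot matches any single character and the * repeats the
# final "p", so "/indexaphp" and "/index.phppp" would match too.
#
# Anchored at both ends, with the literal period escaped: this matches
# "index.php" and nothing else. (In .htaccess the path is tested without
# its leading slash.)
RewriteRule ^index\.php$ - [L]
```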

Das Capitolin




msg:4584534
 12:00 am on Jun 16, 2013 (gmt 0)

First, sorry lucy24, I munged my previous copy/paste job and added a duplicate set of lines. Not sure if <IfModule mod_rewrite.c></IfModule> were important, but they're gone now.

Each and every single URL on the archived Joomla site consists of "index.php?option=com_content". So instead of calling everything with index.php, I need to find a way to call everything with "index.php?option=com_content" or only "option=com_content". Unfortunately I don't know how to properly express this.

[edited by: Das_Capitolin at 1:01 am (utc) on Jun 16, 2013]

Das Capitolin




msg:4584546
 1:00 am on Jun 16, 2013 (gmt 0)

I think we've figured it out:

Options -Indexes
Options +FollowSymLinks
RewriteEngine on
RewriteBase /

# Redirect Joomla links to subdomain
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
#RewriteCond %{REQUEST_URI} ^/index.php*$ [NC]
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$) [NC]
RewriteRule ^(.*)$ http://archive.example.com/$1 [R=301,L]

# BEGIN WordPress
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress

lucy24




msg:4584547
 1:07 am on Jun 16, 2013 (gmt 0)

Edit:
Oops! We were typing at the same time, so what follows is only in response to your second-to-last post.


Didn't I say at the very beginning of this thread that you need to look at %{QUERY_STRING}? :(

Each and every single URL on the archived Joomla site consists of "index.php?option=com_content"

Well, NOW you say so.

First: get rid of the %{REQUEST_URI} line. Never put something in a Condition that can go in the body of the rule.

So your basic rule is

RewriteCond %{QUERY_STRING} option=com_content
RewriteCond %{HTTP_HOST} !joomla-subdomain\.example\.com
RewriteRule ^index\.php http:// {et cetera with the subdomain} [R=301,L]

Now, after this rule has executed, it will still meet your WordPress rule, because the redirected request will pass through the same htaccess again. So the ordinary "specific-to-general" formula isn't enough; you need to put an exclusion in the WordPress rule. EITHER exclude the "option=com_content" query string, OR exclude the subdomain. Going by subdomain makes more sense:

RewriteCond %{HTTP_HOST} !joomla-subdomain\.example\.com
RewriteCond {wp-stuff-here}
RewriteRule . /index.php [L]

That's the default wp rewrite. It's pretty slapdash, because it doesn't distinguish in any way between requests for pages and requests for other stuff. If someone comes by asking for a picture that no longer exists, you don't want them to end up on your wordpress page do you? Let the server serve up a 404; that's its job.
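Spelled out with the stock WordPress conditions filled in, the excluded form might look like this sketch. The host name joomla-subdomain.example.com is the same placeholder used above, and it assumes both hosts really do pass through the same htaccess.

```apache
# Sketch only; substitute the real archive host name.
# Leave requests for index.php itself alone.
RewriteRule ^index\.php$ - [L]

# Skip the WordPress rewrite for anything arriving on the archive subdomain.
RewriteCond %{HTTP_HOST} !^joomla-subdomain\.example\.com$ [NC]
# Stock WordPress conditions: only rewrite when no real file or directory exists.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```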

Most RewriteRules can be constrained to particular types of requests. Here it's especially easy because you don't need to capture. So the package

RewriteRule ^index\.php$ - [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

can almost always be expressed as

RewriteCond %{REQUEST_URI} !index\.php
RewriteRule (^|/|\.html)$ /index.php [L]

Here I've collapsed two rules into one by making "is not 'index.php'" into a Condition rather than a separate rule. The server ends up doing a little less work, because if the request ends in an extension other than .php it doesn't even have to check whether it's specifically "index.php".



Round about now you're probably starting to get annoyed because nobody will simply tell you the ### rule so we can all get out of here. Here's the boilerplate:

Why We Make You Do It Yourself

There are plenty of forums where you can post a "how-to" question and get a fairly immediate answer. The answer may even be correct. But WebmasterWorld is about teaching you how to do it yourself. That way you can roll your own htaccess-- not just for today's problem but for tomorrow's almost identical one.

Here is the analogy:

Your child's room needs cleaning. You know that you can clean it yourself much faster and better than if you have to stand over your child and force him to do it right. But if you do the "make him get it right" part often enough, you will have raised a child who knows how to clean his room-- and who will some day stand glowering over his own children in the same circumstances.

Some day, someone else will post a question in the WebmasterWorld forums and you'll say "Hey, I know the answer to that one!"


There's a final paragraph involving the dangers of spending a lot of time answering a question from someone with a very low post count, but it doesn't apply to you.

Das Capitolin




msg:4584550
 1:22 am on Jun 16, 2013 (gmt 0)

I sincerely appreciate the guidance. I'm self-sufficient to a point, but need some direction from time to time. Of course, there were some points during this 'project' when I had no clue what you were referring to and became glassy-eyed with wonderment and amazement. Thankfully I have a programmer available to help me understand what some of this means, so I can progressively include myself in the conversation.

Next point:
Well, NOW you say so.
Well, actually lucy, I said so in my original post.
What I need to do now is create an .htaccess file at the root that will redirect URLs containing "index.php?option=com_content&task=view&id=123" and forward them to an identical page in the subdomain. Redirect
example.com/index.php?option=com_content&task=view&id=123
To
sub.example.com/index.php?option=com_content&task=view&id=123


At any rate, I thought I said so.

Finally, you gave a lot of code examples, and most of them are confusing to me. I suppose that in the interest of learning while fixing, I'll cut to the point and ask if you noticed any problems with the last code I posted (that works, mind you). I wasn't sure if you were referring to previous code, or that one, although it does still have %{REQUEST_URI} in it so I'm thinking the latter.

lucy24




msg:4584553
 4:15 am on Jun 16, 2013 (gmt 0)

Everything in my foregoing post was in response to stuff further up the thread.

I said so in my original post.
You sure did-- but then the next few posted rewrites didn't mention a query string so I thought it was all a red herring :)

# Redirect Joomla links to subdomain
RewriteCond %{HTTP_HOST} ^example.com$ [NC]
#RewriteCond %{REQUEST_URI} ^/index.php*$ [NC]
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$) [NC]
RewriteRule ^(.*)$ http://archive.example.com/$1 [R=301,L]

# BEGIN WordPress
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress

This now looks pretty clean. Most importantly it's all in the right order: external redirects first, internal rewrites second. And nicely commented, which is good.

What remains is mostly a matter of coding style.

RewriteCond %{HTTP_HOST} ^example.com$
Oops! Forgot to escape a literal period there. In the present case, not likely to lead to unintended consequences, but stick with the habit.

I kinda think it's safer at this point to express the host as "Is NOT the desired subdomain" rather than "IS domain alone". At this point in the code you haven't reached the domain-name-canonicalization redirects-- the rule that grabs any passing request for www.example.com and forces it to use example.com alone. Or vice versa. So if someone came in asking for a joomla page under "www.example.com"-- which would fail this condition-- things could get unattractive.
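A rough sketch of the negative form, assuming archive.example.com is the only host that should be left alone and that every other host name reaching this htaccess belongs to the main site:

```apache
# Sketch only: "anything that is NOT the archive host" instead of
# "exactly example.com", so www.example.com is caught as well.
RewriteCond %{HTTP_HOST} !^archive\.example\.com$ [NC]
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$) [NC]
RewriteRule ^index\.php$ http://archive.example.com/index.php [R=301,L]
```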

#RewriteCond %{REQUEST_URI} ^/index.php*$
I only now notice that this condition is commented-out. It should, in fact, go away entirely-- but only because you're replacing it with ^index\.php in the body of the rule. Oh, and the * at the end is either a typo or a mistake, since it would mean "zero or more p's at this point", i.e. index.ph or index.php or index.phppppppp or...

(^|&)option=com_content(&|$)
See? You HAVE been looking at examples :) That's the ironclad way of spelling out an individual query when you need to make sure it doesn't say "fizzoption=com_contentedly". (This can really occur.)

And then the WordPress part...

RewriteRule ^index\.php$ - [L]
Always leave an empty line after each RewriteRule. mod_rewrite doesn't care, but it helps you and any other passing humans keep organized. In particular it emphasizes that a RewriteCond belongs only with the immediately following rule.

Now, the generic index.php [L] is a good approach if you've got a whole slew of RewriteRules and you want to tell requests for index.php that it's over, you're done with mod_rewrite, get outta here. But if there's only one other rule, you can put index.php into a condition instead.

See previous post for this part; it's about halfway along, probably around the point where your eyes started to glaze over.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

See above. I tend to start foaming at the mouth when I meet the !-f construct-- especially when it's part of a rule that will execute on every single request, all the time everywhere. The intention is clear, but it's overkill.

Before you go, remember to add another RewriteCond to your final rule, excluding joomla.example.com. Otherwise it will get mixed up in the WordPress rule again. And you definitely don't want "index.php" rewriting to itself. That way lie 500 errors.

Das Capitolin




msg:4584634
 4:16 pm on Jun 16, 2013 (gmt 0)

Good morning Lucy24:

Thank you for your personalized feedback. I felt silly not catching the unescaped period, since it was already escaped as an example in the WordPress-generated code. Since WP auto-inserts that portion of code into the htaccess file (unless you've removed permissions like I have), I kept the lines neat and tidy, knowing I wouldn't be altering it. Sure, there could be an empty line after each RewriteRule, but if you know that the preceding RewriteConds apply to it then it's not a problem. For your Sunday-morning perusal, I offer my htaccess file:
Options -Indexes
Options +FollowSymLinks
RewriteEngine on
RewriteBase /

# Redirect Joomla links to subdomain
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$) [NC]
RewriteRule ^(.*)$ http://archive.example.com/$1 [R=301,L]

# Block libwww-perl (LWP) scripts
RewriteCond %{HTTP_USER_AGENT} libwww-perl.*
RewriteRule .* - [F,L]

# WordPress permalinks
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

# Compress file types
AddOutputFilterByType DEFLATE text/plain text/html text/xml text/css text/js text/javascript application/xml application/xhtml+xml application/javascript application/x-javascript application/x-httpd-php application/x-httpd-fastphp application/x-httpd-eruby application/rss+xml application/atom+xml image/svg+xml

# Deny access to .htaccess
<Files .htaccess>
Order Deny,Allow
Deny from all
</Files>

# Deny access to wp-config.php
<Files wp-config.php>
Order Deny,Allow
Deny from all
</Files>

I agree with you about expressing the host as not the desired subdomain, but quite frankly, I put hours into researching this yesterday that have cost me time I couldn't spare (and now I'll be working today and nights this week as a result, as if I believed that you cared). Once my deadline projects are complete, I can revisit with more interest.

The one area I think you might have comment on is the DEFLATE code. Perhaps I got a little creative, but none of it seems to be hurting anything.

Last but not least, my final thoughts are whether all of this should truly belong in an htaccess file (which adds to Apache load), or if I should add it to the Apache config. I will most likely tinker with the config after these projects are addressed, since that seems like a better practice.

I appreciate your assistance, and for breaking me in the hard way.

lucy24




msg:4584669
 7:57 pm on Jun 16, 2013 (gmt 0)

You have access to the config file? NOW you say so. (This time around, a "Now you say so!" is fully justified ;))

What really jumped up and hit me in the face is the

# Block libwww-perl (LWP) scripts

piece. RewriteRules that result in unequivocal F go before all other RewriteRules. No point in redirecting someone who is about to get the door slammed in his face.
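In other words, something along these lines. This is a sketch reusing the rules already posted, reordered. Note the lone "-" in the blocking rule: a RewriteRule needs something in the target slot ahead of the flags, and "-" means "no substitution".

```apache
# Sketch only: the same rules as posted, with the [F] rule moved first.

# Block libwww-perl (LWP) scripts
RewriteCond %{HTTP_USER_AGENT} libwww-perl
RewriteRule ^ - [F]

# Redirect Joomla links to subdomain
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$) [NC]
RewriteRule ^(.*)$ http://archive.example.com/$1 [R=301,L]
```

The [F] flag stops rule processing on its own, so adding L to it is harmless but not needed.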

Denying access to htaccess is the right idea-- but surely that's in the config file already? Everything in apache is inherited, except mod_rewrite which is a world unto itself.

If you do have access to the config file, my generic suggestion is:

Whenever you're setting up something new, enable overrides for the affected directory and then develop htaccess files where appropriate. After you're absolutely 100% certain that everything is working as intended, shift the htaccess material back to <Directory> sections in the config file, and disable overrides.

The idea of course is that you can't restart the server every time you tweak something; htaccess will let you test on the fly. But in the long run everything runs more speedily and efficiently if there are no htaccess files.
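As a sketch of where things might end up once the rules are proven: the directory path below is a made-up placeholder, and the block would sit in the main config or the relevant virtual host.

```apache
# Sketch only; /var/www/example.com/htdocs is a hypothetical path.
<Directory "/var/www/example.com/htdocs">
    # While developing, this would be "AllowOverride All" (or a narrower
    # list) so that .htaccess is read. Once the rules live here, turn it off.
    AllowOverride None

    Options -Indexes +FollowSymLinks
    RewriteEngine On

    # Inside <Directory>, patterns are tested the same way as in .htaccess:
    # against the path without its leading slash.
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$) [NC]
    RewriteRule ^(.*)$ http://archive.example.com/$1 [R=301,L]
</Directory>
```

Unlike htaccess, changes here only take effect after the server is reloaded or restarted.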
