Before the sub-domain, my set up looks like this:
h*tp://www.example.com/subdirectory/ where
all the files exist in subdirectory /public_html/abc/
The sub-domain will look like this:
h*tp://subdomain.example.com/
and all the files exist in the sub-directory
/public_html/def/
What I think will work is
RedirectMatch 301 /abc/(.*)$ htt*://subdomain.example.com/$1
I would place this in the .htaccess file in my
directory /public_html/abc/? Or in the root directory?
Am I on the right track here, or is this train gonna wreck?
What concerns me a bit is all the talk I've read about
redirects and index.html. It's all confusing, but I'm using
index.html?somecode to track PPC's. Should I be concerned
about that with this type of redirect?
Thanks,
grandpa
<edited for example.com>
Very wise to ask first... :)
Let's talk about domains and subdomains first, and worry about subdirectories later. The main question is, do you want to preserve both the (sub)domains represented by the abc and def subdirectories? Or do you want to redirect all traffic for the www/abc domain to the subdomain/def domain?
One thing that may help is to think of redirects as applying to domains and subdomains, and forget about directory structure for now. You can map your URLs to arbitrary filepaths with internal rewrites, and those need not enter into the discussion of what to do with the (sub)domains (yet). You want to get the URLs (which "show" on the Web) in order first, and then map them to the appropriate resources in your file structure. Breaking the problem between those two functions will help keep it simple.
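A minimal sketch of that split, for illustration only. It assumes a root .htaccess, that both host names are served from the same document root, and that no .htaccess in a subdirectory overrides these rules. The host and directory names are the placeholders from this thread.

```apache
# Root .htaccess sketch. Assumptions: www.example.com and
# subdomain.example.com share one document root, and /def/ holds the
# subdomain's files.
Options +FollowSymLinks
RewriteEngine on

# URL layer: an external redirect, visible to browsers and search engines.
# Old address www.example.com/abc/page -> new address subdomain.example.com/page
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^abc/(.*)$ http://subdomain.example.com/$1 [R=301,L]

# File layer: an internal rewrite, invisible to the client.
# subdomain.example.com/page is served from /def/page.
RewriteCond %{HTTP_HOST} ^subdomain\.example\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/def/
RewriteRule ^(.*)$ /def/$1 [L]
```

The second RewriteCond on the internal rule keeps it from being applied again to a path that has already been mapped into /def/. If the host already points the subdomain at /def/, the file-layer rule is not needed at all.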
You've got that added layer of PPC tracking complexity, too. So question number two is, do you currently 301-redirect the index.html?somecode requests to index.html in order to avoid "duplicate page" problems? If so, you'll probably want to integrate the subdomain redirect with those PPC redirects and do it all with one redirect to avoid SE confusion and server load issues.
Jim
> You want to get the URLs (which "show" on the Web)
> in order first, and then map them to the appropriate
> resources in your file structure.
That was done yesterday with my host. The DNS records
haven't propagated yet (last time I checked). There's
no mail service, so I assume they only set up an A record.
I should be able to start testing in the next day or so.
I don't plan to go live with this sub domain until
everything is checked, but by far the most daunting task
I face is the redirect. One minute it looks so easy, the
next it's gibberish.
> Or do you want to redirect all traffic for the
> www/abc domain to the subdomain/def domain?
I intend to re-direct all the traffic from www.domain/abc
to subdomain.domain.com - which is to be pointed to def.
I know this is really basic, but given that I have backlinks
to the existing sub-directory (abc) the .htaccess needs to
be modified in the abc directory? If that's not right
some of my basic understanding is shot...
> You've got that added layer of PPC tracking complexity, too.
To answer question number two - I don't 301-redirect
the index.html?somecode requests to index.html in order to
avoid "duplicate page" problems. (I suppose if I had been,
I'd be a little farther ahead of this situation.)
I have two questions... because I do want to learn this
and be able to understand it - will the code that I wrote
even come close to doing this job? Or should I really
spend another night, or two, reading these valuable forums?
<edit added>After reading, re-reading and reading again, I
think I see how the PPC tracking can cause a duplicate
page problem. I don't understand to what extent it could
be hurting me, but it could be a factor in why I can't climb out
of this post-Fl rut that I'm in... I'm off to look deeper
into this.
Guess that answers my #2.
I found this of interest
[webmasterworld.com...]
</edit>
<edit 2>
I didn't find the answer yet, but I did find
a reason to develop another 100 unique pages
for domain.com This place is like a gold mine
studded with diamonds.
</edit>
The .htaccess file (just below) went into the /abc
subdirectory. I created a pair of html documents, both
named test.html but with different content. One went into
the "abc" directory and the other into the "def" directory
- the reference directory for the subdomain.
Options +FollowSymLinks
RewriteEngine on
RewriteBase /abc
RewriteRule test\.html$ htt*://subdomain.domain.com/test.html
The rewrite works as advertised.
So to expand this for every file in "abc"?
Options +FollowSymLinks
RewriteEngine on
RewriteBase /abc
RewriteRule \.*$ [subdomain.domain.com...]
OK, it almost works. I'm basically just posting my results
here so I can find them later... But what I see happening
is a redirect to the subdomain root. If I try to navigate
to a specific page I'm dropped off at the index page, not
the page I requested.
The log activity reveals:
123.45.67.89 - - [30/Jan/2004:22:11:28 -0800] "GET /subdomain/somedoc.html HTTP/1.1" 302 301 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461; Hotbar 4.3.5.0; .NET CLR 1.1.4322)"
Not sure why I see a 302 and a 301. But let me venture
a guess. The 302 was returned from the original request
and the 301 from the second request? I'm pretty sure
I need to expand the left side so any request is passed
to the right side. Another night of research should get
this part working...
This question is probably beyond the scope of this
discussion, (and tags me a real newbie) but is the
difference between A GET and HEAD or PUT going to affect
how I approach the redirection process?
grandpa
<add>
In the process of getting the subdomain set up
I need to consider any of these requests:
htt*://domain.com/subdir
htt*://domain.com/subdir/
htt*://domain.com/subdir/file.any
htt*://domain.com/subdir/file.any?PPC
<enlightenment> Just sitting here looking at the 4 URLs
above, I think I might understand part of the problem with
PPC tracking. The ?PPC has to be stripped out to make the
URI valid for the second request?
</enlightenment>
</add>
This line is malformed:
RewriteRule \.*$ http://subdomain.domain.com/$1
The $1 in the substitution is a back-reference. Unfortunately, it is undefined because it references the first parenthesized subexpression in the pattern, and there isn't one. (See backreferences in the mod_rewrite documentation.)
In addition, you probably want to use a 301-Moved Permanently redirect (I could be wrong). If so, then you need to add the flag [R=301] to the end of the rule. If not, then you should specify a local path only in the substitution, and omit the http://subdomain.com part. Since in this case that would make no sense, I presume you want an external redirect.
And finally, in 99% of all cases, once an external redirect-type rule is applied, you'll want to stop any further rewriting, and go do the external redirect immediately. In that case, an [L] flag is called for. (Actually, this applies to 99% of *all* simple rewrite rules; Once you do a particular rewrite, there is usually no need for mod_rewrite to keep looking at the following RewriteRules for further matches. I always suggest including [L] unless you know of a good reason not to.)
So, the final result would be:
RewriteRule (.*) http://subdomain.domain.com/$1 [R=301,L]
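To make the back-reference concrete, here is a sketch of how $1 is filled in for a few sample requests. It assumes the rule lives in the .htaccess of the abc directory, as in the earlier posts, and the sample paths are made up for illustration.

```apache
# /abc/.htaccess sketch. In per-directory context the "/abc/" prefix is
# removed before the pattern is tested, so (.*) sees only the remainder.
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^(.*)$ http://subdomain.domain.com/$1 [R=301,L]

# Request to www.domain.com    $1             Redirect target
# /abc/test.html               test.html      http://subdomain.domain.com/test.html
# /abc/                        (empty)        http://subdomain.domain.com/
# /abc/dir/page.html?ppc=x     dir/page.html  http://subdomain.domain.com/dir/page.html?ppc=x
```

Note the last line: the query string is not part of what the pattern matches, and by default it is passed through to the redirect target unchanged.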
---
Your log is showing you a 302-Moved Temporarily response, which is the default mod_rewrite external redirect (because you did not add the R=301 flag). It also happens that the length of the 302 response is 301 bytes. That's a remarkable coincidence, but that's what it says. :)
Remember that an external redirect response to an initial request causes a new and separate HTTP request from the client. Therefore, you should see a second line in your log with a 200-OK response.
---
I doubt that you'd want to do anything different for GET, HEAD, or PUT, or even OPTIONS or TRACE for that matter. But if you do decide you need to redirect them differently, mod_rewrite can handle it... Trust me. ;)
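If method-specific handling were ever needed, one possible shape is a condition on the request method. This is a sketch only; limiting the redirect to GET and HEAD is an arbitrary choice made for the example.

```apache
# Sketch only: redirect ordinary page fetches and leave every other
# request method untouched.
RewriteEngine on
RewriteCond %{REQUEST_METHOD} ^(GET|HEAD)$
RewriteRule ^(.*)$ http://subdomain.domain.com/$1 [R=301,L]
```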
---
> The ?PPC has to be stripped out to make the URI valid for the second request?
Not sure I understand the question, but generally, you want to 301-redirect the PPC requests in order to get rid of the tracking code for two reasons: First, it's already been logged, and you probably don't want the visitor to see it in his/her browser address bar (they're ugly and make the URL "unmemorable"). And secondly, you don't want the visitor to bookmark the URL with the tracking code in place, because if he and 100 friends all use tracking-code-bearing URL bookmarks 3 times a day in the weeks before Christmas, it might throw a monkey wrench into your PPC analysis!
If possible, redirect to the correct domain and strip the tracking code using a single 301 redirect - spiders and visitors with slow modems will appreciate it.
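A minimal sketch of such a combined redirect, assuming it goes in the .htaccess of the abc directory and that every tracking code arrives as a query string beginning with "ppc=" (the real codes may look different):

```apache
# /abc/.htaccess sketch. Assumption: tracking codes start with "ppc=".
Options +FollowSymLinks
RewriteEngine on

# Tracked requests: change the host and drop the query string in one 301.
# The trailing "?" in the substitution replaces the query string with nothing.
RewriteCond %{QUERY_STRING} ^ppc= [NC]
RewriteRule ^(.*)$ http://subdomain.domain.com/$1? [R=301,L]

# Everything else: change the host only.
RewriteRule ^(.*)$ http://subdomain.domain.com/$1 [R=301,L]
```

The original request, tracking code included, is still written to the access log before the redirect is sent, so nothing is lost for the PPC analysis.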
Jim
But, I've also been reading some of the references I found
in this forum... Steve Ramsay's Guide, Learning to Use
Regular Expressions, and of course the Apache pages.
Having some C experience in my background is helping...
While this is going on, I'm giving some thought to how
simple, or complex, my rulesets need to be. As mentioned
previously, the tracking codes need to be addressed. I've
also managed to define a couple of specific requests that
I might want to process. For example, www.domain.com/abc
(without the trailing slash) could be handled one of 2 ways,
I think. Ideally, I would want to redirect that request
to a logon script for the subdomain:
(www.domain.com/cgi-bin/login.cgi),
but it could also redirect straight to the subdomain:
(subdomain.domain.com/). In any event, right now it
returns a 404 since abc is not a valid filename.
<aside>
That address used to resolve to domain.com/abc/index.html,
which always bothered me because that page had no PR.
</aside>
The point is, I think it's important to address the rules
I need to apply in the current environments, and with some
thought given to future environments. For instance, I'm
committed (maybe I should be) to the development of
domain.com and www.domain.com as 2 separate entities -
perhaps to provide unique services. It's not gonna happen
tomorrow, or even next week - but now is probably the
time to avoid shooting myself in the foot later.
grandpa
In my file structure I have a directory called abc.
Here's the .htaccess in that directory.
Options +FollowSymLinks
RewriteEngine on
RewriteBase /abc
RewriteRule (.*) [def.domain.com...] [R=301,L]
What I'm trying to do is send a request for
www.domain.com/abc to /domain/cgi-bin/that.cgi with
this rule:
RewriteRule ^abc$ /public_html/cgi-bin/that.cgi [R=301,L]
Instead I still get a 404 Not Found.
OK... so I tried this on another server, with another
domain.. one that does not have the directory abc.
The redirect worked like a charm. So I went back to
the first server, and changed abc to test, and it works.
A request for www.domain.com/test gets rewritten to /domain/cgi-bin/that.cgi.
Is the problem related to having a directory named
abc, and everything in abc is being redirected?
I also tried the rewrite in the root .htaccess. with the
same results, a 404 Not Found.
I got the last problem solved, but not without raising
something else. I can definitely see the power of
mod_rewrite at work :-)
I used this rewrite in my root .htaccess
RewriteRule (abc) /cgi-bin/that.cgi [R=301,L]
and I'm given the cgi script. But it *is* appearing to
be from a different location. The browser address
comes back as htt*://abc.domain.com/cgi-bin/that.cgi
I'd like the browser to show the correct address at
htt*://www.domain.com/cgi-bin/that.cgi...
or even better htt*://www.domain.com/abc - which would
match the original request to begin with.
<off to read some more...>
<added>
I got the right URL in my browser now... that was easy.
</added>
<add2>
RewriteRule (abc) /cgi-bin/that.cgi [R=301,L]
This rule was modified slightly to give me the results
I actually wanted.
RewriteRule (abc$) htt*://www.domain.com/that.cgi [R=301,L]
I don't think the 301 is really necessary, but it probably
won't hurt either. But doing this trial and error method
can cause problems I might not catch right away. So I have
a question about syntax. I added the $ to end a pattern
within parentheses. I'm sure as I dig deeper I'll discover
why I can use a $ without a ^ in front of it. My question
is what document(s) explain the use of (), [], $, ^ and
when they should or should not be used. Or do I just have
to dig it out of the definition for any given directive?
Thanks
</add2>
I think I was looking for this:
htt*://httpd.apache.org/docs/mod/directive-dict.html
If you folks get tired of watching me talk to myself,
feel free to jump in.... :-)
I've .htaccess files set up in 3 locations,
from the filesystem view they are at
/users/domain/public_html/
/users/domain/public_html/abc/
/users/domain/public_html/def/
The root copy is here.
AuthName domain.com
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
<Limit GET POST>
order allow,deny
deny from 65.168.224.
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>
ErrorDocument 403 /host/errdocs/403.html
ErrorDocument 401 /host/errdocs/403.html
ErrorDocument 401 /host/errdocs/401.html
ErrorDocument 500 /host/errdocs/500.html
ErrorDocument 400 [domain.com...]
ErrorDocument 404 [domain.com...]
RewriteEngine on
#I found this FormMail trap in my research, so I thought
#"why not?" It's already banned 3 people in the first
#few hours. Also, when I initially set it up, I had a
#problem serving up my Forbidden Document, since the domain
#had just been banned. I switched back to the hosts
#Forbidden Document and it works fine now.
RewriteCond %{REQUEST_URI} ^/FormMail [NC,OR]
RewriteCond %{REQUEST_URI} ^/FormMail\.(cgi|pl|php) [NC,OR]
RewriteCond %{REQUEST_URI} ^/cgi(\-local|\-bin)/FormMail [NC,OR]
RewriteCond %{REQUEST_URI} ^/cgi(\-local|\-bin)/FormMail\.(cgi|pl|php) [NC,OR]
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC]
RewriteRule .* /cgi-bin/trap.pl [L]
RewriteRule (abc$) [domain.com...] [R=301,L]
#Then I use these to capture my PPC tracking and strip the
#Query String out of the request. This handles everything
#requested from the default domain.
RewriteCond %{QUERY_STRING} ^ppc=string1 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=string2 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=string4\+string5 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=string6 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=string7 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=string8 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=string9 [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=stringA [NC,OR]
RewriteCond %{QUERY_STRING} ^ppc=stringB [NC]
RewriteCond %{HTTP_HOST} ^(.*)$ [NC]
RewriteRule ^(.*)$ [%1...] [R=302,L]
Then in my directory abc I placed this simpler version:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /abc
#The first rule redirects traffic to the new subdomain.
RewriteRule (.*) [def.domain.com...] [R=301,L]
#This rule redirects a request to
#www.domain.com/def - which used to be a valid request
#but started returning a 404 after I set up the subdomain.
RewriteRule ^def$ /public_html/cgi-bin/login.cgi [R=301,L]
And finally, in my directory def I placed this version:
Options +FollowSymLinks
RewriteEngine on
RewriteBase /def
#I needed this to catch any PPC tracking to the new
#subdomain.
RewriteCond %{QUERY_STRING} ^ppc=string3 [NC]
RewriteCond %{HTTP_HOST} ^(.*)$ [NC]
RewriteRule ^(.*)$ [%1...] [R=302,L]
*-----------------------------------------*
Amazingly, I almost understand all of this - but I'm still
practicing writing it.
I'd appreciate a once over by a trained eye, there could
well be something obviously wrong that I'd never see.
I don't think there is, I've been on top of my logs for
the last 24 hours.
Some of the other questions I have forming will most
likely be answered when the time is right... after I
figure out what to ask. I'll be hanging out in this
forum for a while. Just the same, any heads-up would be
appreciated.
Grandpa
Jim
I have a few errors in my log that I need to address,
nothing overwhelming that I can see. But there is one
situation still beyond my grasp. At the risk of finding
this thread in another form I'll say the problem is with
my Overture PPC requests. I append a string, which I can
find with my Query-String conditions. But OV appends their
own string which fails my condition check. Further, their
string appears to be somewhat variable, depending on the
original search request. So, I need to write a condition
that matches a rule which is based on a condition of
a Query-String - actually the first part of a Query-String.
The web view looks like this:
htt*://www.domain.com/file.html?ppc=string?OVR=VARstring
So far I've only figured out this much of it:
htt*://www.domain.com/file.html?ppc=string
And, I suppose I also would want a rule that generally
catches *any* search request with a string attached, and
strips out the search string. I can see where this rule
will need to be placed below the valid ppc ruleset.
I'm not sure of its relevance to the subdomain redirect
ruleset. Would it be better to process any subdomain
search requests in the .htaccess file for the subdomain?
<I ask this, and the answer seems obvious. But I ask
because right now every subdomain request is already being
redirected with a 301 status - because the new subdomain
address hasn't been propagated thru the SEs.>
grandpa
You might want to post the code and the full query string, so we can discuss specifics.
As far as the order of the subdomain versus query string rewrites, that depends. If you are rewriting subdomains to subdirectories, that is usually done as a 'silent' internal redirect, whereas removing PPC query strings that are not used by a script is usually done with a 301 external redirect so that the browser is redirected and the URL therefore appears 'clean' to the PPC visitor.
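One possible ordering, sketched under assumptions that may not match the actual setup: a root .htaccess, def.domain.com sharing the main document root with its files in /def/, and tracking codes that begin with "ppc=".

```apache
# Root .htaccess sketch. Assumptions: def.domain.com shares the main
# document root, its files live in /def/, and tracking codes start with "ppc=".
Options +FollowSymLinks
RewriteEngine on

# 1) External 301 first: the browser is sent to the same URL without the
#    query string, whatever else has been appended after the ppc value.
RewriteCond %{QUERY_STRING} ^ppc= [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1? [R=301,L]

# 2) Internal rewrite second: silently map the subdomain onto its directory.
RewriteCond %{HTTP_HOST} ^def\.domain\.com$ [NC]
RewriteCond %{REQUEST_URI} !^/def/
RewriteRule ^(.*)$ /def/$1 [L]
```

Putting the external redirect first means it always sees the URL as the visitor typed it. If the internal rewrite ran first, the /def/ prefix could leak into the address the browser is redirected to.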
Jim
If I read this correctly, it says take any request
that contains the value of "?ppc=string" and
optionally anything after that string that begins
with "?" and rewrite it to my domain index file,
showing the URL [domain.com...]
in the browser navigation bar.
I've been up all night, not going to test this right
away, but let's see if I'm getting any better...
grandpa