Forum Moderators: phranque

SubDomain Rewrites [fails.]

         

JohnRuskin

11:23 pm on Oct 4, 2010 (gmt 0)

10+ Year Member



I want to slide URL references to domain.com/subdir over to subdom/domain.com. I expect that I will do this for a few subdir/subdom combinations, and have seen the suggestion for structuring the internal directory tree as domain.com/SubAll/subdom[1,2,3] -- not yet undertaken.

To start this out, on a test/learning curve, I have created [with a control panel] a subdomain, and external access to the files works by either method. Concurrently, I am rewriting the files [from *.htm to *.html] to account for the location of CSS and FavIcon, on that subdomain, and correct coding errors. Due to existing external references [my error], I want to create redirects of
    1) subdomain/domain.com/*.htm
    to
    subdomain/domain.com/*.html

AND redirects for
    2) domain.com/subdir/*.[htm or html]
    to
    subdomain/domain.com/*.html.


This will eliminate all the duplicate search engine entries. I can only create/access directory .htaccess files, and not the server-level configuration. I have placed an .htaccess in the subdir to accomplish the first set of redirects (1), above. The redirects there work, but I have turned them off while trying to get the problems below solved.

I have looked at the forum and library entries, as well as the Apache 2.0 rewrite manual; checked into specific forum entries on "subdomains..."; and yet I am stymied.

Before going whole hog on this subdomain venture, to test this Redirect and/or Rewrite out, I created a single new file "index.html", stored together with the old "index.htm", on the subdirectory.

I know that Rewrite works on this server, using entries within a root .htaccess, as I can rewrite ![www] to www, there.

The learning curve here has me stymied. I can get the Redirect command to work on a single file, for the redirect in 2) above. [Commented out, below]. However, I can not get a Rewrite to work, using all kinds of coding modifications/tests. I have flushed my browser cache between tests, and even added a no-cache meta tag, just in case, to the stored files.

At this point, in my experiments, the subdomain and the subdirectory both use the same "name" -- I have not yet set up the subAll subdirectory.

    # Works:
    #Redirect 301 /subdir/index.htm http://subdomain.domain.com/index.html

    RewriteEngine On
    # Doesn't work:
    RewriteRule /subdir/index\.htm http://subdomain.domain.com/index.html [R=301,L]

    # Doesn't work:
    #RewriteRule /subdir/index\.html http://subdomain.domain.com/index.html [R=301,L]

    # Doesn't work:
    #RewriteRule /subdir/index\.(htm|html) http://subdomain.domain.com/index.html [R=301,L]

    RewriteCond %{HTTP_HOST} !^(.*)\.domain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]


I have also tried variations of ".?" to create a single rule for the two file types ( htm and html), and other variants, none of which worked. I settled down to the simple single file type Rewrite rules, above, to ferret out what was going wrong, and even those don't seem to work.

Any clues (or other posts) to point out what obvious thing[s] I am missing? I tried putting in the "Options FollowSymLinks", as a bold try, to no avail. And not sure if the Apache manual blocked comment on Per-Directory pattern matching is trying to tell me something. I keep thinking that a Rewrite w/o a conditional line should work.

I have even tried variations of leading/trailing ^ and $, w/ and w/o the leading "/", ...everything plausible to discover some clue why this puzzle remains . . . well . . . puzzling.

Any guidance appreciated. Not manuals or programming shy, but this is the first depth foray into Rewrites. The glory of the first couple of hours of puzzling this out has faded to "it's time to ask for help..."

Thanks, in advance....

=====
Also helpful, a URL pointer for the definitions of the server-variables, listed on the Apache ModRewrite manual page, found at: [httpd.apache.org...]

=============

On a completely different subject....I'm thinking that it would be so cool if there were a rewrite testing engine on the net, into which one could drop some constants [domains, subdirectory names/relations, file names existing/absent], and then create an .htaccess file and have the program [using the same Mod coding as the Apache server] generate what each code line -did-. Even if the program used default names/directories....

jdMorgan

1:43 am on Oct 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't mix RewriteRule with Redirect and/or RedirectMatch directives. Directives are processed in per-module order, and not strictly in the order that they appear in your "code." Each module examines your .htaccess file in turn, executing only the directives that it understands. Therefore, mixing similar directives from different modules means that you do not control the order of their execution. This can have profoundly-bad effects on your server operation and search rankings.

Back on-topic: Your RewriteRules failed because RewriteRule patterns in .htaccess don't start with a slash. See the Apache URL Rewriting Guide at apache.org.

RewriteEngine on
#
# Externally redirect subdomain.domain.com/<something>.htm to subdomain.domain.com/<something>.html
RewriteCond %{HTTP_HOST} ^subdomain\.domain\.com
RewriteRule ^(.+)\.htm$ http://subdomain.domain.com/$1.html [R=301,L]
#
# Externally redirect domain.com/subdir/xyz or <anysubdomain>.domain.com/subdir/xyz
# to subdomain.domain.com/xyz
RewriteRule ^subdir(/.*)?$ http://subdomain.domain.com$1 [R=301,L]
#
# Externally redirect all non-canonical subdomain hostname requests
# to canonical "subdomain.domain.com" hostname
RewriteCond %{HTTP_HOST} ^([^.]+\.)*subdomain\.([^.]+\.)*domain\.com
RewriteCond %{HTTP_HOST} !^subdomain\.domain\.com$
RewriteRule ^(.*)$ http://subdomain.domain.com/$1 [R=301,L]
#
# Externally redirect all non-canonical hostname requests to canonical "www.domain.com" hostname
RewriteCond %{HTTP_HOST} !^(www|subdomain)\.domain\.com$
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

The second rule above won't work properly if you also have code that rewrites the requested URL subdomain.domain.com/xyz to the filepath /subdomain/xyz on your server. This second rule and your internal rewrite rule will countermand each other, causing an infinite loop. If this is the case then so state here, as the code for that case is different.

This code assumes that both your "main domain" and your subdomain map into the same filespace on your server. If this is not the case, you may need to move or modify some of this code.
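
Applied to the single-file test from the original question, the slash fix might look like the sketch below. It assumes the rule sits in the root .htaccess of domain.com, and that the root file's rules are actually applied to requests under /subdir/ (a subdirectory .htaccess with its own rewrite directives can prevent this). The names subdir and subdomain.domain.com are the placeholders already used in this thread.

```apache
RewriteEngine On
# No leading slash in the pattern: in /.htaccess the URL-path is seen as "subdir/index.htm".
# "html?" matches both index.htm and index.html; "^" and "$" anchor the whole path.
RewriteRule ^subdir/index\.html?$ http://subdomain.domain.com/index.html [R=301,L]
```

This is only the one-file test case; the wildcard rules above cover the general case.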

Jim

JohnRuskin

12:27 am on Oct 7, 2010 (gmt 0)

10+ Year Member



I realized a lack of clarity in my questions, as posed, where I should have written subdomain.domain.com in lieu of subdomain/domain.com -- a typo, using slash instead of the correct dot...

That said...thank you for some guidance...
I recognize the Rewrite vs. Redirect module issue you referred to, but I still have a few questions.

0a. It looks like using conditions is wiser than plain rewrites....yes?
0b. It appears that the referenced text is only the part beyond what the condition matched....what functionality strips the domain & directory info, for the $1 reference within the ^(.+) of the first example?

1. My ISP's control panel redirects the subdomain to the subdir. So, does your first rewrite, for subdomain.domain.com/*.htm --> subdomain.domain.com/*.html, go in the root directory's or the subdirectory's .htaccess file? I would have guessed the subdir, as the ISP's CP points directly to the subdir, but your example seems to imply putting it all in the root. For me, your example works in the subdirectory, not the root.

    2. What makes canonical? Apparently not the WWW presence. Is canon where I have defined a "www" or "subdomain"? Seems like yes. I think I see how you did that example, but translating this "^([^.]+\.)*" into English is driving me nuts. Would this be it: a phrase starting with one or more of any one character, followed by a dot, all of that 1 or more times? Why go to that complex set in the two places you've used it? Obviously I am missing something.

    3. In my usage, all the subdomain's files are in the subdirectory, i.e., these already point to the same file:
  • subdomain.domain.com/file.htm
    and
  • domain.com/subdir/file.htm

    In my usage, to this point in time, I have used the same "textword" for the subdir and the subdomain. Is this bad? Also, should I place all subdomain content into directories under a common, intermediate directory? With respect to your comment on your second example, the control panel [not other code of mine] is pointing the subdomain to the subdirectory, as it will for other subdomains. I'm not sure if this changes anything....

    4. Why did you propose the 3rd example? In case someone mistypes subdomainFOO, at least they land somewhere on my site? And, again, would that one be placed in the .htaccess for the root, or the [or one or more] subdirectories being pointed to for one or more subdomains?

    5. In keeping with my prior comments on which .htaccess file to use.... For the fourth example, would using the subdomain as a text choice be useful in a root directory .htaccess, as any subdomain reference would point to the subdirectory due to the control panel settings [and I believe that pointing occurs before a read of any .htaccess file]?

    Thanks, Jim, for your education and efforts in this forum...

    jdMorgan

    1:01 am on Oct 7, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    0a. Using the code that's needed to accomplish your desired function is "wiser."
    0b. The domain is never present in the URL-path examined by RewriteRule. Additional confusion as to the purpose of this rule may have been caused by typos in the comments, now corrected (slashes replaced with periods).

    1. See comments following my initial post. If you use the control panel, then the rule needs to go in the subdirectory. The rules need to be placed in the .htaccess file(s) which will be traversed when a request arrives at your server. If the control panel maps subdomain.domain.com/ to /subdomain/, then the rule dealing with removing /subdomain from the URL will need to go into /subdomain/.htaccess. Other rules, such as the .htm to .html redirect, may be needed for both domains, and therefore may be used in both .htaccess files.

    2. "Canonical" -- "According to accepted practice or convention" or "preferred" in this case. The canonical domain (or any other part of a URL) is simply that form which you prefer. If you choose "www.domain.com" over "domain.com" then "www.domain.com" is your canonical hostname for your main site. Likewise, if you choose "subdomain.domain.com" over "www.subdomain.domain.com" for your subdomain, then that's the canonical hostname for your subdomain-web site. The point is to link to and allow direct access only to canonical URLs in order to avoid duplicate content problems with search engines. So if you request any of these:
    domain.com/
    domain.com./
    domain.com:80/
    domain.com.:80/
    domain.com/index.html
    domain.com./index.html
    domain.com:80/index.html
    domain.com.:80/index.html

    www.domain.com./
    www.domain.com:80/
    www.domain.com.:80/
    www.domain.com/index.html
    www.domain.com./index.html
    www.domain.com:80/index.html
    www.domain.com.:80/index.html

    Then all should be 301-redirected to www.domain.com/
    Otherwise, you allow 15 additional URLs to compete against the canonical URL for links and search ranking.

    3a. See previous answer #2 regarding allowing access using two different URLs. The second rule I posted (to be located in your main domain's .htaccess file) is intended to correct this problem.
    3b. "I have used the same "textword" for the subdir and the subdomain. Is this bad?"
    No, it would cause great inefficiencies to attempt to do otherwise.

    4. Same answer ... #2 above.

    5. If I have a choice, I simply allow all subdomains to resolve to my main .htaccess file, and 'sort them out' myself there. Otherwise, lots of redundant code has to be copied into .htaccess files in each subdomain's filespace -- and it then has to be maintained.

    I hope I got most of them, if not all...

    Jim

    JohnRuskin

    1:42 am on Oct 7, 2010 (gmt 0)

    10+ Year Member



    I recall seeing some comments in the forum, [yours?], about putting subdomains into subdirectories, with a single intermediate directory between them and the root. I assumed, therefore, that that would be easiest.

    Regarding duplicate SE scans and etc.....if all files are in the root directory, and all subdomains point there, then these pointers to the same file result into two different SE items, no?

    subdom.domain.com/file.html

    and
    domain.com/file.html

    I thought that wasn't a good idea? Or are the odds of these mistypes so low that the ease of maintenance you discussed takes priority?


    ---
    what does this mean...and why did you use it?
  • ^([^.]+\.)*



    ---
    For some reason, this does not work
    RewriteRule ^subdir(/.*)?$ http://subdomain.domain.com$1 [R=301,L]
    when I substitute my own subdir/subdomain and domain names and put it into the file. I placed this into the root .htaccess.

    This is the current variant [i have tried all sorts of tests to decipher where the error is...as the extra "#"/commented out contents sort of shows...]


    AddHandler server-parsed .html
    AddHandler server-parsed .shtml
    AddHandler server-parsed .htm

    RewriteEngine On

    # Externally redirect domain.com/subdir/xyz or <anysubdomain>domain.com/subdir/xyz
    # to subdomain.domain.com/xyz
    #RewriteCond %{HTTP_HOST} !^(.*)\.complianceofficer\.com$ [NC]
    #RewriteCond %{HTTP_HOST} ^www\.complianceofficer\.com [NC]
    #RewriteRule ^lighterside(/.*)?$ http://lighterside.complianceofficer.com/$1 [R=301,L]
    #RewriteRule ^lighterside/index.html$ http://lighterside.complianceofficer.com/index.html [R=301,L]

    RewriteRule ^lighterside(/.*)?$ http://lighterside.complianceofficer.com$1 [R=301,L]

    # Externally redirect all non-canonical hostname requests to canonical "www.domain.com" hostname
    RewriteCond %{HTTP_HOST} !^(www|lighterside)\.complianceofficer\.com$
    RewriteRule ^(.*)$ http://www.complianceofficer.com/$1 [R=301,L]

    jdMorgan

    2:37 pm on Oct 7, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    "Regarding duplicate SE scans and etc.....if all files are in the root directory, and all subdomains point there, then these pointers to the same file result into two different SE items, no?"

    We are discussing server configuration here, and everything is in the details. Also, I don't tend to type anything that's not important. If you re-read what I wrote above, you will discover a dependency clause which states, "and 'sort them out' myself there." The meaning of this is that you could fully replace the code created by control panel with your own code to point domains and subdomains to subdirectories using an internal rewrite (as opposed to an external redirect).

    As for your code "not working," the most likely reason is that you've put it into the wrong .htaccess file, or that you've failed to adapt the code to the context of that .htaccess file.

    The redirection of domain.com to www.domain.com and of www.domain.com/subdomain/xyz to subdomain.domain.com/xyz must be done in your "main domain's" .htaccess file, while the redirection of www.subdomain.xyz.domain.com to subdomain.domain.com and of subdomain.domain.com/subdomain/xyz to subdomain.domain.com/xyz must be done in your subdomain-subdirectory's .htaccess file.

    .htaccess is a "per-directory" server configuration file, and each rule must be adapted to the filepath location in which it is to execute and match. Specifically, Apache will strip off the path to the current .htaccess file's directory before RewriteRule examines that path. So if your code is in /.htaccess, then no URL-path examined by RewriteRule will start with "/", and if your code is located in /abc/.htaccess, then no (correct) URL examined by RewriteRule will start with "/abc/" -- only the "localized" URL-path will be visible.
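
    As a rough illustration of that stripping, the same redirect could be written differently depending on which file holds it. The directory name abc comes from the paragraph above; the page names page.html and new-page.html are made up for the example, and only one of the two variants would be used.

```apache
# Variant A: in /.htaccess
# A request for /abc/page.html is presented to RewriteRule as "abc/page.html".
RewriteEngine On
RewriteRule ^abc/page\.html$ http://www.domain.com/new-page.html [R=301,L]

# Variant B: in /abc/.htaccess
# The same request is presented as "page.html"; the "abc/" prefix has been stripped.
RewriteEngine On
RewriteRule ^page\.html$ http://www.domain.com/new-page.html [R=301,L]
```

    In neither file would a pattern beginning with "/abc/" or "/" match.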

    -----

    "what does this mean...and why did you use it?"

    The regular expressions pattern ^([^.]+\.)* means "starting at the beginning of the string being examined, match one or more characters not equal to a period, followed by a period, and match this entire sequence zero or more times."

    In the particular place I used this in the code I posted, its intent is to match zero or more sub-subdomain labels preceding your actual subdomain label.

    The sequence of the two RewriteConds followed by the RewriteRule then takes on the meaning, "If the requested hostname contains "subdomain" and "domain.com", with any number of other sub-subdomain labels before or after "subdomain" AND if the requested hostname is NOT exactly "subdomain.domain.com", then redirect to exactly "subdomain.domain.com", keeping the originally requested URL-path (i.e. redirect to the originally-requested "page," but in the canonical subdomain).
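
    To make that reading concrete, here is the same three-line rule again with a few sample hostnames traced through it in comments. The labels www and xyz are arbitrary examples, and the trace assumes the hostname arrives in lower case with no port or trailing dot.

```apache
# Hostname                      1st cond   2nd cond   Result
# subdomain.domain.com          true       false      no redirect (already canonical)
# www.subdomain.domain.com      true       true       redirect
# www.subdomain.xyz.domain.com  true       true       redirect
# www.domain.com                false      (skipped)  no redirect (no "subdomain." label)
RewriteCond %{HTTP_HOST} ^([^.]+\.)*subdomain\.([^.]+\.)*domain\.com
RewriteCond %{HTTP_HOST} !^subdomain\.domain\.com$
RewriteRule ^(.*)$ http://subdomain.domain.com/$1 [R=301,L]
```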

    You will see many instances on the Web where the pattern ".*" is used instead of this more-elaborate negative-match pattern construct. The advantage of the more-elaborate pattern is that it is far more efficient, because it explicitly declares when we want to "quit matching." In contrast, the greedy ".*" subpattern will initially match anything and everything in the string being examined, and the matching engine will then have to back off one character at a time from the end of that string, trying to match the remainder of the pattern. Therefore the number of matching attempts is proportional to the length of the string being matched in the simple case where one instance of the ".*" subpattern is used.

    In cases where multiple ".*" subpatterns occur in a pattern, the number of matching attempts multiplies: with two such subpatterns it grows with the square of the length of the string being matched. In other words, if the string is 30 characters long, then up to 900 matching attempts could be required. By contrast, the use of a properly-constructed negative-match pattern can often reduce the number of required matching attempts to one -- or just a few.
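
    A small sketch of the substitution being described, using a hypothetical two-level URL layout (category/item) and a hypothetical script name show.php; neither comes from this thread. The two patterns are not strictly equivalent: the negated-class version refuses extra slashes, which is usually the intent anyway.

```apache
RewriteEngine On

# Ambiguous: each ".*" can match any amount of text, including slashes,
# so the engine must backtrack to decide where the first group ends.
#RewriteRule ^(.*)/(.*)$ /show.php?category=$1&item=$2 [L]

# Specific: each group stops at the first slash, so there is only one way to match.
RewriteRule ^([^/]+)/([^/]+)$ /show.php?category=$1&item=$2 [L]
```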

    You will note on review of the URL Rewriting Guide at Apache.org that among the many, many code examples provided, the ".*" subpattern is conspicuous by its absence. It should be reserved for use only when you truly wish to match "anything, everything, or nothing" and is therefore rarely used. The occurrence of multiple ".*" subpatterns in a pattern usually indicates code written by a beginner or by a lazy or incompetent coder -- or by someone trying to clean up after one of these. It often also indicates not only that the code is poorly designed and implemented, but also that the "URL-system" it is intended to support may also be poorly designed.

    -----

    Back to "did not work" :

    Please specify the location of the code being tested, as well as the URL used to test it, the expected results, the actual results, and any comments on the difference between the two in every test case. Otherwise, we play the role of the mechanic when a car is brought in, and the owner leaves in a rush, saying only, "Please fix it." But while the mechanic can and does charge for his time to fully-diagnose that the windscreen-washer tank is empty (after running a full three-hour suite of engine diagnostics), we don't have that option here...

    Jim

    JohnRuskin

    6:34 pm on Oct 7, 2010 (gmt 0)

    10+ Year Member



    Thanks, Jim....a well thought out discussion on those questions... I would add some of it to the forum/library!

    -
    Ah. Sort out, as in you use your own code to replace what the CP could do. Not meaning that all the files are located in the root...ok.
    --
    Ah. [^ ] is the "not" coding -- I was looking for the "!". Now I understand that bracket construct.

    ------
    ^subdir(/.*)?$
    I read this as: Text, with "subdir" starting at the start of the string, followed by a phrase which is "a slash and 0 or more characters", which phrase occurs once or not at all, all the way to the end of the string. And it is that phrase which is used as rewrite $1. Idea: Anytime "subdir/" appears it is assumed the file is in the subdomain, instead. Did I get that right?

    ----
    As to the last...

    As mentioned, the .htaccess code is in the root. I used as the sample, the second example:
    RewriteRule ^subdir(/.*)?$ http://subdomain.domain.com$1 [R=301,L]


    Ending up with the requests, in the above post, but in pertinent part, it became this rewrite request, when sub'ing in my subdom/subdir names:
    RewriteRule ^lighterside(/.*)?$ http://lighterside.complianceofficer.com$1 [R=301,L]

    The entire root .htaccess is in the above post

    I tested using this URL:
    http://www.complianceofficer.com/lighterside/index.html


    And then expected to land on [display on the browser]:
    http://lighterside.complianceofficer.com/index.html


    Instead, I land on the subdir version unchanged, rather than the subdomain version. There was no indication [with HttpFox] that any redirect headers were issued.

    ----
    I understand, now, the stripping that you discuss, with the references.

    I think, however, my confusion comes in/around this: If, because of the ISP control panel,
    subdomain.domain.com
    points to this directory:
    domain.com/subdir


    Then, I take it the .htaccess in the subdirectory [and home to the subdomain] must massage the URLs which are in this form:
    subdom.domain.com/form.htm


    And, the root .htaccess massages these requests:
    domain.com/subdir/file.htm
    and then the /subdir/ .htaccess massages them next.

    Do I have that right? This CP vs. no CP thought process makes me suspicious as to why the rule doesn't work, but I can't figure it out.

    -----
    I see the sense of not using the control panel....

    jdMorgan

    12:26 pm on Oct 8, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    One thing that may immensely clarify this for you is that the function here is a URL-to-filepath rewrite. At the hostname level (URL), subdirectories are meaningless, and at the directory filepath level, hostnames are meaningless. That is, out on the Web, no-one knows or cares that /subdomain is a subdirectory in the same server/filespace as that assigned to domain.com. And inside the server, we refer to things only as filepaths, and the domain or subdomain isn't meaningful beyond the fact that "the request arrived here at this server."

    So, a request arrives at this server for domain.com, and the server executes /.htaccess

    Or a request arrives for domain.com/subdomain/ and the server executes /.htaccess, and then does a redirect to subdomain.domain.com/

    And if a request arrives for subdomain.domain.com, then the CP-generated code passes that request to /subdomain/.htaccess

    And if a request arrives for subdomain.domain.com/subdomain/xyz, and the CP-created code sends that request to /subdomain/.htaccess, then we want it to redirect to subdomain.domain.com/xyz so as to remove the /subdomain path-part from the URLs used out on the Web.

    Anyway, try to keep URLs and filepaths as separate and distinct concepts in your head, as they are not at all the same thing. URLs have scope "there, out on the Web," while filepaths are used "here, inside this server."

    Apache mod_rewrite's two primary functions are to do URL-to-URL re-mapping (external redirects) and URL-to-filepath re-mapping (internal rewrites). These are different functions invoked by different RewriteRule syntax. External redirects send a response back to the client saying, "That resource has moved, please ask for it again using this new URL." Internal rewrites simply modify the filepath used to "find" the requested resource, and the client is not informed.
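
    The difference in syntax can be sketched as follows, for rules placed in /.htaccess. All four page names are hypothetical and chosen only to keep the two rules from overlapping.

```apache
RewriteEngine On

# External redirect (URL-to-URL): absolute URL target plus the R flag.
# The client receives a 301 response and requests the new URL itself.
RewriteRule ^moved\.html$ http://www.domain.com/new-location.html [R=301,L]

# Internal rewrite (URL-to-filepath): local target, no R flag.
# The server serves /real-file.html; the URL shown by the client does not change.
RewriteRule ^alias\.html$ /real-file.html [L]
```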

    ---

    Your test of URL http://www.complianceofficer.com/lighterside/index.html should have executed your rule
     RewriteRule ^lighterside(/.*)?$ http://lighterside.complianceofficer.com$1 [R=301,L] 

    in /.htaccess, and should therefore have been redirected to
    http://lighterside.complianceofficer.com/index.html

    So either the code didn't function properly, or this request never reached your server. While the former may seem more likely, it's actually the latter case I'd suspect; make sure that you delete your browser cache entirely before testing any new code on your server. Otherwise, your browser may show you stale previously-cached pages and server responses, and no request will be sent to your server. If no request is sent to your server then obviously no server-side code can have any effect.

    As for the code not functioning properly, don't be afraid to just comment everything out and then do a quick test with a rule like
     RewriteRule ^foo\.html$ http://www.google.com/ [R=301,L] 

    as a basic "sanity check."

    Jim

    JohnRuskin

    3:57 pm on Oct 8, 2010 (gmt 0)

    10+ Year Member



    thanks, Jim...Another good review...

    Being the old tyme programmer I am, I've tried deadHeading instructions to ferret out the failure point.... This time, a novel tack discovered this weirdness...

    The rewrite engine works.

    The root .htaccess will redirect a root index.htm to wherever, using this as the match term: index\.htm$

    But there is no combination of anything I've found that will let me place a command in the root .htaccess that will move /lighterside/index.htm anywhere.

    It is almost as if the use of a subdirectory name ("lighterside"), or the slash ("/"), or their combination, make the match unmatchable. It is acting as if, and this I also find unlikely in my heart, but it is almost as if the rewrite engine is not receiving the "lighterside/" as part of the path, and so it can't match it. Would there be a reason why the server is doing that?

    I've been using this rule:
    RewriteRule [matchRule] http://lighterside.complianceofficer.com/index.html [R=301,L]


    Where [matchRule] is:
    index\.htm$ [works]
    lighterside/index\.htm$ [fails]
    ^lighterside/index\.htm$ [fails]
    I've messed with other twists to the matchRule phrase, just to see what happens, like adding a leading "/" [yes...I remember, but I was just rambleTesting...]

    I even tried changing from wireless card to a WiFi connect, on the thought that the wireless was caching something.

    Cleared the cache; watched HttpFox header reports; tried ShiftReload. Even went to IE and cleared its cache [as little as I've used it over the years].

    BUT.... The same rule WORKS in the /lighterside/ .htaccess.

    I thought that the server module would first look to the root .htaccess, and then onto the subdirectory. Is it possible that the server rewrite module is configured NOT to start at the root, and work down to the subdirectory?

    I'm stymied....thoughts ?

    jdMorgan

    7:15 pm on Oct 8, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Perhaps "RewriteOptions Inherit" is not set on this server. While you're looking at RewriteOptions, you may also want to set MaxRedirects to a more reasonable number than the default of 10.

    If this is not a mod_rewrite inheritance issue, then the only other thing I can think of is that the control-panel-generated code may also include provisions for the subdomain-subdirectory path, and may be changing the request URI before your code even sees it.
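
    Since the rule is reported to work from /lighterside/.htaccess, one way to sidestep the inheritance question might be to keep the subdirectory-to-subdomain redirect in that file and guard it with a hostname test. This is only a sketch. It assumes the control panel points lighterside.complianceofficer.com directly at the /lighterside/ directory, so the URL-path seen there no longer contains "lighterside/", and that requests reach this file carrying the hostname the client asked for.

```apache
# In /lighterside/.htaccess
RewriteEngine On
# Act only when the request did not arrive on the canonical subdomain hostname,
# e.g. www.complianceofficer.com/lighterside/index.html
RewriteCond %{HTTP_HOST} !^lighterside\.complianceofficer\.com$ [NC]
# The "lighterside/" prefix is already stripped here, so $1 is just "index.html".
RewriteRule ^(.*)$ http://lighterside.complianceofficer.com/$1 [R=301,L]
```

    Requests that already use the subdomain hostname fail the condition and are served without a redirect, so the rule should not loop.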

    Jim