subdomain name -> var in query string

slightly different needs than other examples I've seen

         

akimbo_sf

4:08 am on Mar 22, 2006 (gmt 0)

10+ Year Member



Hi --

I've spent all day searching for, studying, and experimenting with various mod_rewrite solutions to my issue, but none fit exactly. To frame the issue:

I'm building a commercial real estate site using PHP. The site doesn't contain many pages -- Listing index and detail pages, About, and so on. The business will bring on agents, each of whom will have their own subdomain. Manually creating subdomains containing the same set of pages seems impractical -- the scripts would be identical for each subdomain, with just one variable (the subdomain name) governing all content displayed on the pages.

So I'm looking for a way to grab the subdomain name from the rewritten URL and stick it in the URL as a variable, no matter what page is loaded, while keeping the rest of the URL intact. Examples:

[johnqagent.mysite.com...] -> [mysite.com...]
[johnqagent.mysite.com...] -> [mysite.com...]
[johnqagent.mysite.com...] -> [mysite.com...]

Concerning that last example -- I already have a rewrite rule in place that is successfully outputting listing detail pages using the rewritten URL (/listings/0330). I'm just not sure how that complicates rewrite rules designed to create virtual subdomains.

Here was my best shot at starting a solution today:

RewriteEngine On
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(www\.)?mysite\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.mysite\.com [NC]
RewriteRule ^$ [mysite.com...] [L]

Ideally, this would take any URL with a subdomain other than "www" and display a one-line script I wrote to output whatever GET vars it found in the URL. This only worked if I typed in "<subdomain>.mysite.com"; referencing any other page, as in "<subdomain>.mysite.com/login.php", would ignore the rewrite rule and send me to "www.mysite.com/login.php".

I should also note that writing

RewriteRule ^(.*)$ [mysite.com...] [L]

on the advice of numerous posts I found, gave me a 500 error.

I'm comfortable with regex but I fear the complexity of what I'm asking for -- coupled with the possibility that this is a server config issue as well as a coding issue -- is over my head. Any help is welcome.

Thanks in advance.

jdMorgan

6:53 am on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



akimbo_sf,

Welcome to WebmasterWorld!

Does this recent thread [webmasterworld.com] help?

Jim

akimbo_sf

3:51 am on Mar 23, 2006 (gmt 0)

10+ Year Member



Hi Jim --

Thanks for your reply. I reread the Apache rewrite guide and the doc on the engine itself, and did a search for my specific issues on this site and others. The thread you sent me to provides very specific code that I'm not exactly looking for. praveenkumar seemed to need his subdomain written as a GET var on a specific page (index.php), whereas I'm looking to have that GET var appended to every page. I tried to adapt his code but it's not yielding anything useful.

I'll start small. First I want to write code that redirects any page in any subdomain (excluding www) to a specific script which prints the GET vars to the page, adding the subdomain name to the query. My attempt:

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^(www\.)?mysite\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.mysite\.com$ [NC]
RewriteRule ^$ [mysite.com...] [L]

If I type in a URL like "subdomain.mysite.com", it works, and "subdomain" gets printed to the page. If I add anything to that URL, I get a 404. This is confusing, as I interpret the code to read, "If the URL has a subdomain other than 'www', treat the URL like this other URL, with the subdomain appended to the query." But it's not behaving that way?

I checked with my host today and they confirmed (and some testing proved) that they allow wildcard subdomains.

Some other pitfalls still loom but I'd like to know that I'm at least on the right track.

Thanks again.

jdMorgan

4:41 pm on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> If I add anything to that URL, I get a 404.

I suspect a 'greediness/precedence' regex problem in the second RewriteCond... Not sure yet exactly how to address that, but can you post some exhaustive examples of (sub)domains that work and those that don't?

I'm looking for something including the specific results, like this:


works -- example.com - no rewrite, as expected
works -- www.example.com - no rewrite, as expected
works -- <other_sub>.example.com - rewrites to example.com/<other_sub> as expected
breaks - www.<other_sub>.example.com - rewrites to example.com/www/ instead of /<other_sub>!

whatever your actual results are.

Also, when you get a 404, what does the URL-path in the server error log show?

Jim

akimbo_sf

7:56 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



works -- mydomain.com - no rewrite, as expected
works -- www.mydomain.com - no rewrite, as expected
works -- blah.mydomain.com - rewrites to www.mydomain.com/modrewrite_test.php?subd=blah
works -- www.blah.mydomain.com - rewrites to www.mydomain.com/modrewrite_test.php?subd=blah
breaks -- blah.mydomain.com/anypage.php - searches for www.mydomain.com/anypage.php, generates 404

Checking the error log, each 404 generates 2 lines:

[Thu Mar 23 14:46:13 2006] [error] [client <my IP>] File does not exist: /home/myhostacct/public_html/404.shtml
[Thu Mar 23 14:46:13 2006] [error] [client <my IP>] File does not exist: /home/myhostacct/public_html/anypage.php

(substituted <my IP> for my actual IP, and "myhostacct" for my actual account name with my host)

jdMorgan

8:13 pm on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's because your code specifically rewrites only requests for "/" and takes no action if a page is specified.

If a specific page is requested, what do you want to do with it? You can:

A) Rewrite the page name into a variable and pass it to modrewrite_test.php, just like the subdomain name.
B) Drop the page name (the script can still retrieve it if needed).
C) Take action on certain page requests, but not others.

Also, I doubt you really want to use an external redirect, but since it makes testing easier, let's leave that alone for now.

Jim

akimbo_sf

9:10 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



Thanks for your persistence. All I need mod rewrite to do, just as you said in (A), is to stick the subdomain name into a var in the query string, to be used by every page to determine what content to load.

ex: bio.php is a page on the site that displays, say, a user's bio. Thus, john.mydomain.com/bio.php will know to load John's bio only.

I'll also have at least one other rewrite rule in there, which manages real estate listings (this one is currently working). It allows this cleaned-up URL:

www.mydomain.com/listings/1234

to pull from here:

www.mydomain.com/listings/detail.php?listid=1234

and I want to make sure the subdomain rewrite rule, or resulting appended var, doesn't somehow interfere with this.

Definitely don't want external redirects.

Thanks again.

jdMorgan

10:22 pm on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, this takes *any* requested URL, and tacks on the subdomain parameter.
Note that it will do this for *all* URLs, including /robots.txt, logo.gif, validate.js, etc., so you'll probably want to add restrictions, using RewriteConds to create exceptions.

RewriteEngine on
Options +FollowSymlinks
# RewriteCond %{HTTP_HOST} . ### Not needed, third RewriteCond already handles this ###
RewriteCond %{HTTP_HOST} !^(www\.)?mysite\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1?subd=%2 [L]

BTW, once you get all this working, change the rule to this form:

RewriteRule ^(.*)$ /$1?subd=%2 [L]

In other words, drop the method and the domain name from the substitution URL. Using that syntax, you'll get an internal rewrite rather than an external redirect. However, as noted, it's easier to debug using a redirect, because you can watch it happen in your browser address bar.
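
For the restrictions mentioned above, here is a minimal sketch of what the exception RewriteConds could look like. The list of excluded files and extensions is only an assumption for illustration, so adjust it to whatever the site actually serves. It keeps the external-redirect form, since that is the easier one to watch while testing.

```apache
RewriteEngine on
# Exceptions (hypothetical list): leave robots.txt and common static files alone
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|css|js|ico)$ [NC]
# Skip the bare domain and www
RewriteCond %{HTTP_HOST} !^(www\.)?mysite\.com$ [NC]
# Capture the subdomain as %2. Keep this as the last RewriteCond, because
# %N back-references are taken from the last matched RewriteCond pattern.
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.mysite\.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/$1?subd=%2 [L]
```

RewriteConds are ANDed by default, so a request has to pass every exception before the rule is applied.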

Jim

akimbo_sf

11:03 pm on Mar 23, 2006 (gmt 0)

10+ Year Member



Thanks for the code Jim. Unfortunately it's throwing a 500 Internal Server Error (that is, any attempt to access a non-www subdomain). In previous attempts (and likely this one too), this was triggered by the presence of (.*) on the left side of the RewriteRule (see my first post for this).

I changed it to ([^z]+) and called a URL with no 'z' in it, to see if the ultra-greedy (.*) was causing the problem. Same thing, 500 error. Any ideas?

I'm unclear on why the absolute URL as the replacement string in the RewriteRule forces a redirect...thought there was a special flag for explicit redirects.

Also, just to be sure -- would your code work if the URL being transformed already contained a query string? Do I need to add the QSA flag for this?

Thanks.

jdMorgan

11:56 pm on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I'm unclear on why the absolute URL as the replacement string in the RewriteRule forces a redirect...thought there was a special flag for explicit redirects.

See Apache mod_rewrite documentation [httpd.apache.org]. This behaviour is explicitly described.

When you get a 500 error, what does the server error log show?

Jim

akimbo_sf

12:02 am on Mar 24, 2006 (gmt 0)

10+ Year Member



[Thu Mar 23 15:50:06 2006] [error] [client <my IP>] File does not exist: /home/myhostacct/public_html/500.shtml

I suppose it's good news for troubleshooting that none of the 500 errors I got with my code appear in the error log, but the one I got using your code does.

akimbo_sf

12:28 am on Mar 24, 2006 (gmt 0)

10+ Year Member



Minor update: I see a similar problem here:

[webmasterworld.com...]

but his solution doesn't seem clear to me.

jdMorgan

12:33 am on Mar 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You apparently have ErrorDocument directives that are pointing to non-existent custom error files. I'd advise you to comment them out or delete them. Look for lines like:

ErrorDocument 500 /500.shtml

and disable any that do not point to valid files in your Web root directory.

This will at least give you valid error reports, rather than showing a secondary error.

It's beginning to look like a bad regex library has been released, and is appearing on more and more servers. I'm not sure it's worthwhile trying to debug code if a basic pattern like (.*) won't work...

Jim

akimbo_sf

12:56 am on Mar 24, 2006 (gmt 0)

10+ Year Member



A breakthrough here, sort of. Removing the leading slash from the substituted expression in the RewriteRule did the trick:

RewriteRule ^(.*)$ $1?subd=%2 [L]

This leaves me a little puzzled about correct syntax...but hey, can't argue with success. (any insights, though?)

Per your msg #8 in this thread -- sounds like appending a var to *all* URLs, specifically requested or includes, might be bad. So I would limit this to *.php files, or implied index pages:

RewriteCond %{REQUEST_URI} \.php [OR]
RewriteCond %{REQUEST_URI} /$

Look ok?

Also -- was hoping for clarification on the QSA flag, and why I don't need it here.

Thanks very much.

jdMorgan

1:08 am on Mar 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually, you can add that exclusion in the rule itself:

RewriteRule ^([^.]+\.php)?$ http://www.example.com/$1?subd=%2 [L]

That pattern requires <something>.php or -blank- as the local URL-path.
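
To make that concrete, here is how the pattern treats a few sample paths. The paths are made up for illustration, and the two RewriteConds are repeated from earlier in the thread because %2 only has a value when they precede the rule.

```apache
# Local URL-path as seen in .htaccess      Matches ^([^.]+\.php)?$
#   (empty, i.e. a request for "/")        yes, $1 is empty
#   login.php                              yes, $1 = login.php
#   listings/detail.php                    yes, $1 = listings/detail.php
#   logo.gif                               no
#   robots.txt                             no
#   listings/1234                          no
#   my.page.php                            no, [^.]+ stops at the first dot
RewriteCond %{HTTP_HOST} !^(www\.)?mysite\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.mysite\.com$ [NC]
RewriteRule ^([^.]+\.php)?$ http://www.example.com/$1?subd=%2 [L]
```

Note that a clean URL such as listings/1234 is not matched by this pattern, which matters if listing pages also need the subdomain value.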

Jim

akimbo_sf

1:38 am on Mar 24, 2006 (gmt 0)

10+ Year Member



Hmm, okay...some questions:

1) It looks like your regex would pass with a blank local URL path, but I've noticed that REQUEST_URI isn't blank, but rather a single "/", when the index page is implied. Is that true?

2) Does your regex take into account the possibility of a query string in the requested URL?

3) Would any of this interfere with the existing RewriteRule that allows for clean URLs on the real estate listing page?:
RewriteRule ^listings/([0-9]+)$ /listings/detail.php?lid=$1

4) If I were to ignore the restriction we're talking about (apply transformation only to PHP files), does that mean that every file embedded in a page (like a GIF or JS file) would have a var appended to it? Or does this only affect URLs specifically called (e.g. in the address bar) by the user?

Thanks again.

jdMorgan

4:05 am on Mar 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1) Your observation is correct: REQUEST_URI and the local URL-path 'seen' by RewriteRule in a .htaccess context are not the same variable.

2) Query strings are not part of the URL-path; they are data attached to a URL, to be passed to the resource at that URL. Therefore, RewriteRule does not see query strings when pattern matching, and does not change them unless explicitly told to do so by adding "?<anything>" to the substitution URL.

3) I don't know -- I'm not familiar with your definition of 'clean' URLs. However, if you want that 'clean URL' rule to run after the subdomain rule has been invoked, then you'll need to remove the [L] flag from the subdomain rule if it precedes the 'clean URL' rule. As described in the documentation, [L] means 'last rule' -- quit if this rule matches and is invoked.

4) Every page, image, external JavaScript, and CSS file is requested separately by the browser, creating a separate HTTP request to your server, invoking the httpd.conf file, activating Apache modules and .htaccess -- the works. The server runs largely the same process for every access that you see in your raw server access logs - pages, images, etc. Unless you exclude these non-php files, then they'll get the subdomain var attached to them, too. It's likely they would still 'work' but it's an unnecessary waste of resources, and there may be side effects on your site that I can't foresee.
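
Points 2 and 3 can be put together in one sketch. Treat it as a starting point under stated assumptions rather than tested code: it assumes a .htaccess file in the Web root, it uses the slash-less substitution that worked earlier in this thread, and the QUERY_STRING guard is an addition that does not appear anywhere above.

```apache
RewriteEngine on

# Rule 1: tag the request with the subdomain. No [L], so rule 2 still runs.
# The QUERY_STRING test is an added guard (an assumption, not from the thread)
# that skips requests already carrying subd=, e.g. after an internal redirect.
# [QSA] keeps any query string the visitor supplied.
RewriteCond %{QUERY_STRING} !(^|&)subd=
RewriteCond %{HTTP_HOST} !^(www\.)?mysite\.com$ [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([^.]+)\.mysite\.com$ [NC]
RewriteRule ^(.*)$ $1?subd=%2 [QSA]

# Rule 2: clean listing URLs. [QSA] keeps the subd= value added by rule 1.
# blah.mysite.com/listings/1234 -> listings/detail.php?lid=1234&subd=blah
RewriteRule ^listings/([0-9]+)$ listings/detail.php?lid=$1 [QSA,L]
```

One consequence of the guard: a visitor who types ?subd=something by hand bypasses rule 1, so the scripts should not treat the value as trusted. As written, rule 1 also tags images and other static files, so the exclusions discussed in point 4 would still need to be added.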

Jim

akimbo_sf

12:49 am on Mar 25, 2006 (gmt 0)

10+ Year Member



Per your answer to (2): would the QSA flag re-append the original query string to the replacement URL? I'm still not clear on the point of that flag.

And per (3): sorry for the imprecise language. By 'clean URL,' I just meant the nice-looking domain.com/listings/1234 versus the comparatively ugly domain.com/listings/detail.php?lid=1234. Your advice about removing the [L] flag is noted.

Jim, you've been a huge help. My project is rolling again. I owe you a beer.

jdMorgan

1:54 am on Mar 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Per your answer to (2): would the QSA flag re-append the original query string to the replacement URL? I'm still not clear on the point of that flag.

If (and only if) you wish to use a RewriteRule to add additional name/values to a pre-existing query string, you must use the [QSA] (Query String Append) flag.

If you do not specify a "?" character in your substitution URL, then any existing query string will pass through RewriteRules unchanged. If you add a "?" optionally followed by new query data, then any pre-existing query data will be replaced. So, if you wish to add query data while retaining the previous query data, use [QSA].
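
The three cases side by side, as a small sketch. The file names one.php, two.php, three.php and target.php are hypothetical, and the rules assume a .htaccess file in the Web root.

```apache
# No "?" in the substitution: the original query string passes through.
#   /one.php?tab=2    ->  /target.php?tab=2
RewriteRule ^one\.php$ target.php [L]

# "?" with new data, no [QSA]: the original query string is replaced.
#   /two.php?tab=2    ->  /target.php?subd=john
RewriteRule ^two\.php$ target.php?subd=john [L]

# "?" with new data plus [QSA]: the original query string is appended.
#   /three.php?tab=2  ->  /target.php?subd=john&tab=2
RewriteRule ^three\.php$ target.php?subd=john [QSA,L]
```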

Rates are up -- Need a six-pack of the imported stuff, or a case of bud... ;)

Jim