Remove character from a host string

         

thetitan

2:33 pm on Jun 5, 2009 (gmt 0)

10+ Year Member



Hello all,

I have a main domain and a bunch of aliases. I want to remove the www. from each of them whenever anyone visits with www. prepended to any one of the domains.

1. I know how to remove the www. for a single domain, but in this case I have a bunch of domains.
2. I would like to use wildcard instead of writing rewrite rules for each domain. The goal here is to stay small and fast.
3. I did this in PHP, but then I realized that the appended query string gets removed. I can store the query string in a variable and, after removing the www., append it to the URL again.
4. I would like to not have to call a function in the beginning of every page to check for the above. Instead just do it by default.
5. Most of all I would like to know if this can be done via mod_rewrite. I'm sure it will help others in the future.

This is what I have


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www.* [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [L,R=301]

The question is how can I tell Apache to first remove www. from the %{HTTP_HOST} string?

Thanks for your time.

g1smd

2:45 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The code for adding the www on (rather than taking it off) is far easier:

Options +FollowSymLinks 
RewriteEngine On
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) http://www.%{HTTP_HOST}/$1 [R=301,L]

but even this code doesn't fix up www requests with appended port number, etc.

[edited by: g1smd at 2:50 pm (utc) on June 5, 2009]

g1smd

2:49 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The code for taking it off would be something like:

Options +FollowSymLinks 
RewriteEngine On
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www\.([^:]+)(:[0-9]+)? [NC]
RewriteRule (.*) http://%1/$1 [R=301,L]

This code does not fix up any requests with appended period after hostname.

This has been covered here before, and I am sure that other threads contain better and far more robust code than this.

thetitan

2:53 pm on Jun 5, 2009 (gmt 0)

10+ Year Member



what exactly does this do:

\.([^:]+)(:[0-9]+)?

g1smd

3:09 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It matches:
.example.com
.example.com:80
.example.co.uk
.example.co.uk:80
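
One way to check this is to feed the same fragment to a regex engine. The sketch below uses Python's re module, whose syntax agrees with Apache's regex engine for this particular pattern. The function name split_host and the sample list are only illustrative assumptions, not anything Apache provides.

```python
import re

# The fragment from the question: the part of the pattern that follows "www".
FRAGMENT = re.compile(r"\.([^:]+)(:[0-9]+)?")

def split_host(rest):
    """Return (group 1, group 2) for the text that follows 'www', or None."""
    m = FRAGMENT.match(rest)
    return (m.group(1), m.group(2)) if m else None

# The four sample strings listed above.
for rest in [".example.com", ".example.com:80", ".example.co.uk", ".example.co.uk:80"]:
    print(rest, "->", split_host(rest))
```

Group 1 is everything up to the first colon, and group 2 is the optional port, so a redirect built from group 1 alone drops the port.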

thetitan

4:01 pm on Jun 5, 2009 (gmt 0)

10+ Year Member



Thank you.

Could you tell me why the following returns a page that says "moved permanently" with a link to [/query........]


Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.* [NC]
RewriteRule (.*) http://%1/$1 [R=301,L]

But this one actually strips www. from the domain and does the redirect:


Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.([^:]+)(:[0-9]+)? [NC]
RewriteRule (.*) http://%1/$1 [R=301,L]

g1smd

4:23 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With the first one, nothing is populated into %1 for use in the redirect.

Additionally, look at ^www\.* That is not .* as in zero or more of any character; it is a literal period followed by *, meaning zero or more periods.

That pattern would try to match a hostname like www........ with multiple periods in it (except that such a request would be unlikely to make it through DNS to your server).

It is likely you meant ^www\..* but there's never a need to add .* to the end of an unanchored pattern.
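
The difference between the patterns is easy to see by testing them side by side. This is a rough sketch in Python's re module, which treats these patterns the same way Apache does as far as I know. The names loose, meant, short and the host list are made up for the illustration, and re.IGNORECASE stands in for [NC].

```python
import re

hosts = ["www.example.com", "www", "wwwexample.com", "example.com"]

loose = re.compile(r"^www\.*", re.IGNORECASE)   # "www" plus zero or more literal periods
meant = re.compile(r"^www\..*", re.IGNORECASE)  # "www." plus anything
short = re.compile(r"^www\.", re.IGNORECASE)    # "www." only; the trailing .* adds nothing

def matches(pattern, host):
    """True when the pattern is found in the host string."""
    return pattern.search(host) is not None

for host in hosts:
    print(host, matches(loose, host), matches(meant, host), matches(short, host))

# None of the three patterns has parentheses, so none of them captures anything.
print(loose.groups, meant.groups, short.groups)
```

The loose pattern also accepts wwwexample.com, because zero periods is a valid match. The last two patterns always agree with each other.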

thetitan

4:47 pm on Jun 5, 2009 (gmt 0)

10+ Year Member



How does %1 or %2 or %3 get populated?

Does Apache read ^www\.([^:]+)(:[0-9]+)? like this: look for www followed by a ., then keep reading text until (if present) a : followed by a number?

How does %1 end up being the domain, extension, and when applicable, the port without the www.?

g1smd

5:16 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



$1 is picked up from the pattern on the left of the Rule, and can be used in a RewriteCond above the Rule, or in the target on the right of the Rule.

%1 is picked up from the RewriteCond above the line where the %1 is going to be re-used (whether you are re-using it in another RewriteCond, or in the final Rule).
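
To make the two kinds of back-reference concrete, here is a small Python sketch that mimics what the condition and the rule do together. It is only a simulation under stated assumptions: redirect_target is a hypothetical helper, the path is given without a leading slash (as a rule in .htaccess would see it), and the query string is ignored because mod_rewrite normally carries it over to the redirect by itself.

```python
import re

# Stands in for: RewriteCond %{HTTP_HOST} ^www\.([^:]+)(:[0-9]+)? [NC]
COND = re.compile(r"^www\.([^:]+)(:[0-9]+)?", re.IGNORECASE)
# Stands in for the pattern of: RewriteRule (.*) http://%1/$1 [R=301,L]
RULE = re.compile(r"(.*)")

def redirect_target(host, path):
    """Return the redirect URL, or None when the condition does not match."""
    rule_match = RULE.match(path)   # its groups play the role of $1, $2, ...
    cond_match = COND.match(host)   # its groups play the role of %1, %2, ...
    if rule_match is None or cond_match is None:
        return None
    # http://%1/$1
    return "http://{}/{}".format(cond_match.group(1), rule_match.group(1))

print(redirect_target("www.example.com", "page.html"))
print(redirect_target("example.com", "page.html"))
```

The host part of the result comes from the condition's first group and the path part from the rule's first group, which is the whole trick.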

thetitan

5:33 pm on Jun 5, 2009 (gmt 0)

10+ Year Member



I've used $1, $2 ... before in regex, and I like them. But %1, %2 ... is puzzling me.

1. how does %1 end up being the domain, but without www.?

2. I've seen others use %2, %3. does that indicate how many condition lines to go back? For example, in the code you provided earlier, %2 would be the ., seeing that it was going two lines back, and if we try to use %3 there won't be anything there, because there is no 3rd condition line going back? By the way, are %1, %2 ... only limited to condition lines, or can they be used to get data from any line?

g1smd

9:41 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



%1 always refers to the most recently matched RewriteCond, so in practice you pick it up in a Condition and then use it in the next line (whatever that is, condition or rule).

The numbering is worked out by counting opening brackets from the left, so (xx((yy)aa))(zz) matched against xxyyaazz would give 1 = xxyyaa, 2 = yyaa, 3 = yy, and 4 = zz, for example.

Options +FollowSymLinks 
RewriteEngine On
RewriteCond %{HTTP_HOST} .
# %1 is the first bracketed group in the condition: ([^:]+)
RewriteCond %{HTTP_HOST} ^www\.([^:]+)(:[0-9]+)? [NC]
RewriteRule (.*) http://%1/$1 [R=301,L]
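
The bracket-counting rule can be checked outside Apache as well. A minimal sketch in Python's re module, which numbers groups the same way; the sample strings and the variable names nested and host are only for illustration.

```python
import re

# Nested example: groups are numbered by the position of their opening bracket.
nested = re.fullmatch(r"(xx((yy)aa))(zz)", "xxyyaazz")
for n in range(1, 5):
    print(n, "=", nested.group(n))

# The hostname pattern from this thread: two groups, so only %1 and %2 exist.
host = re.match(r"^www\.([^:]+)(:[0-9]+)?", "www.example.com:8080")
print(host.group(1), host.group(2))
```

So %2 is the second bracketed group of the same condition (here the port), not a reference to a condition two lines back.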

One more thing, processing order:

RewriteCond {object} Pattern=[2]
RewriteCond {object} Pattern=[3]
RewriteRule Pattern=[1] Target=[4] Flags=[5]

jdMorgan

11:39 pm on Jun 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The "check for blank hostname" RewriteCond isn't needed here, since the main RewriteCond won't match blank hostnames either. Also, don't forget that there can be a trailing period on an FQDN, as well as an appended port number.

I'd suggest:


Options +FollowSymLinks
RewriteEngine On
#
RewriteCond %{HTTP_HOST} ^www\.(([^.]+\.)+[^.:]+) [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

Jim

thetitan

3:52 am on Jun 6, 2009 (gmt 0)

10+ Year Member



Thank you for the solutions, the tutorials/explanations and your time.

g1smd

8:55 am on Jun 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's clever:

^www\.(([^.]+\.)+[^.:]+)

I couldn't see an easy way to exclude any trailing punctuation, without stripping off the 'uk' part on, say, a .co.uk hostname.
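
For anyone who wants to see the difference, here is a rough comparison of the two patterns in Python's re module. The names EARLIER, JIM and bare_host are invented for the sketch, and re.IGNORECASE stands in for [NC].

```python
import re

# Earlier pattern in this thread: stops at the port but keeps a trailing period.
EARLIER = re.compile(r"^www\.([^:]+)(:[0-9]+)?", re.IGNORECASE)
# Jim's pattern: one or more "label." pieces, then a final label with no period or colon.
JIM = re.compile(r"^www\.(([^.]+\.)+[^.:]+)", re.IGNORECASE)

def bare_host(pattern, host):
    """Return what %1 would hold for this host, or None if there is no match."""
    m = pattern.match(host)
    return m.group(1) if m else None

for host in ["www.example.com", "www.example.co.uk.", "www.example.co.uk:80", "www.example.com.:80"]:
    print(host, "|", bare_host(EARLIER, host), "|", bare_host(JIM, host))
```

The final [^.:]+ has to end before any period or colon, so the trailing period and the port both fall outside the captured group, while the repeated ([^.]+\.)+ keeps every label of a .co.uk name.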