301 rewrite for all improper server prefixes

%20www. redirect

         

soapystar

11:19 pm on Sep 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



already have this
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mydomain.com
RewriteRule (.*) [mydomain.com...] [R=301,L]

But now I find Google has indexed an entire copy of the site, because someone linked to it with an error in their code that produced the URL %20www.mydomain.com

Any ideas how to change the .htaccess to 301 any URL that isn't www to the correct www?

jdMorgan

12:48 am on Sep 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sure, look for a negative match, as long as the HTTP_HOST header is non-blank:

RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com
RewriteRule (.*) http://www.mydomain.com/$1 [R=301,L]
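
A slightly stricter form of the negative match above might look like the sketch below. It assumes the canonical host is www.example.com (a placeholder) and that the rules live in a per-directory .htaccess file. The end anchor, the optional port, and the [NC] flag are additions that were not in the post above.

```apache
RewriteEngine On
# Skip requests that carry no Host header at all (e.g. HTTP/1.0 clients)
RewriteCond %{HTTP_HOST} .
# Match any host that is not exactly www.example.com, with or without a port
RewriteCond %{HTTP_HOST} !^www\.example\.com(:[0-9]+)?$ [NC]
# In .htaccess the leading slash is already stripped, so $1 is the bare path
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Like the original, this can only act on requests that the server actually routes to the directory holding the .htaccess file.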

soapystar

9:25 am on Sep 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, it doesn't seem to work. It works OK for the non-www, but %20www doesn't get rewritten; it still gets served. Even typing in ww is served as a 404 rather than being rewritten.

jdMorgan

2:59 pm on Sep 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Flush your browser cache before testing any new access-control code.

If it still doesn't work, then investigate whether you (or your host) have any other code in httpd.conf or .htaccess files that will interfere with it. The code will work fine as long as:

A) Requests for the malformed/incorrect domain are resolved to the directory in which this .htaccess file resides.
(That is, wild-card DNS and server name configurations must be correct)

B) The .htaccess file is processed and the mod_rewrite code is invoked for those domain requests.

C) No other code is invoked which interferes with this code.

Jim

soapystar

9:22 pm on Sep 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Doesn't matter what I do: either it doesn't work at all, or I get an error for all URLs including www. At the very least I need to redirect %20www to www, or at least have it throw up a 404, but I can't even get to that stage. Damn!

jdMorgan

9:36 pm on Sep 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It sounds like your server is not set up with true "wild-card" domain mapping. You might want to ask your host about this problem.

In order for the code to have any effect, the server must deliver the request to the directory where the code resides. Since it works for non-www, that indicates that you've probably installed it correctly and that it is being invoked for the non-www requests. But it sounds like it's not being invoked for malformed %20www requests.

Also, when posting here that 'it doesn't work', it's often helpful to post details about the error: What happened? How did that differ from what you expected? What did you see in the browser? What did you see in the server access log and the server error log? If you got a 404, what was the URL in the address bar? Posting details like these can help others help you, and can be a good investment of your time.
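
If the host is willing to help, one way to see what the server receives is to log the raw Host header next to each request. This is only a sketch: it has to go in httpd.conf or a virtual-host block (it is not valid in .htaccess), and the format nickname hostdebug and the log file name are made-up placeholders.

```apache
# Server or virtual-host context only.
# Records client address, time, raw Host header, request line, and final status.
LogFormat "%h %t \"%{Host}i\" \"%r\" %>s" hostdebug
CustomLog logs/host_debug.log hostdebug
```

Comparing the logged Host value for a www request and a %20www request would show whether both reach the same site and which status each one gets.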

Jim

soapystar

10:04 pm on Sep 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



yup, sorry.
Re: host: no help at all. They just keep referring me to the [httpd.apache.org...] doc, and told me the way that Apache passes URLs is http:// www.domain.com

When i use the code
RewriteEngine on
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com
RewriteRule (.*) [mydomain.com...] [R=301,L]

domain.com redirects to www.domain.com just fine, but anything else in the browser, i.e. ww or www1 or %20www, returns the full server prefix and does not rewrite to www. If I try
RewriteEngine On
RewriteCond %{HTTP_HOST} ^%20www.domain.com
RewriteRule (.*) [domain.com...] [R=301,L]

the code has no effect at all. If I try
RewriteEngine On
RewriteCond %{HTTP_HOST} ^ www.domain.com
RewriteRule (.*) [domain.com...] [R=301,L]

with the gap instead of %20, it generates a 500 error for all URLs of all prefixes. It doesn't matter whether I 404 or 301 %20www, but I can't get it to do either. The host's own website also displays the %20www if you enter the URL that way, instead of throwing up a 404.

jdMorgan

10:29 pm on Sep 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, you'd have to escape the space to avoid a fatal syntax error in mod_rewrite:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^\ www\.domain\.com
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]

But from your descriptions, the server is not passing control to your code, so it won't make any difference whether you use this, or the original negative-match code I posted.

So, it's back to your hosting provider for help with the httpd.conf setup, or on to a different host. Tell them you want a ServerAlias of


ServerAlias *.domain.com

That should pass all requests (for which you have DNS resolving to this server's IP address) to the root directory of your site. You can then use your mod_rewrite code to sort out what you want to do with each domain variant.
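
For reference, the kind of virtual-host setup to ask the host for might look roughly like this. It is a sketch under assumptions: the port, the document root path, and the example.com names are placeholders, and the real block on a shared server will contain more than this.

```apache
<VirtualHost *:80>
    ServerName www.example.com
    # Accept the bare domain and any subdomain that DNS points at this server
    ServerAlias example.com *.example.com
    DocumentRoot /var/www/example

    <Directory /var/www/example>
        # mod_rewrite directives in .htaccess need FileInfo overrides,
        # and per-directory rewriting needs FollowSymLinks enabled
        AllowOverride FileInfo
        Options FollowSymLinks
    </Directory>
</VirtualHost>
```

Whether a malformed name with a leading encoded space is matched by the wildcard alias is not something this sketch guarantees; that is exactly what needs testing.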

Jim

Robert Charlton

7:09 am on Sep 28, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Jim - Naive question here... When I've used your code to correct any mistyped subdomains and redirect all variants to www, I've used it in conjunction with enabling wildcard subdomains in the DNS server.

How might wildcard DNS apply, or not apply, to the situation soapystar is posting about?

Also, is ServerAlias *.domain.com

a) a substitute for wildcard DNS?... or

b) a module that must be enabled for your code to work to rewrite wildcard DNS?

c) something else entirely?

jdMorgan

2:27 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, a very good point.

However, if wild-card DNS was not defined, I'd expect to see "Server could not be found" or "DNS lookup error" messages reported by the browser.

Since pages from the correct site are being served, I assumed that the DNS end of things was all taken care of.

Jim

Robert Charlton

6:18 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Jim - To clarify, since you know how little I know about this stuff... does this mean that when I use your mod_rewrite code to redirect all variants to www, that I also need to make sure that ServerAlias is set up as you're suggesting? ie,...

ServerAlias *.domain.com

Is this generally a default on most servers?

I ask because I've never known about ServerAlias before (there's much I don't know about with Apache servers ;) ) and want to find out what bases need to be covered when I set up redirect to www.

So far, in my ignorance, I haven't run into any problems redirecting wildcard dns, but I'm wondering if I've just lucked out.

Robert Charlton

6:41 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



As a PS to the above, I thought I'd see if I can reproduce soapystar's problem with domains I've set up this way, and unfortunately I can.

A space between // and www does give me a %20www. I've tried checking it on Brett's server header checker tool, and it won't even deliver a response.

jdMorgan

7:42 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ServerAlias *.domain.com

Is this generally a default on most servers?

I'd guess that the default on commercial name-based virtual hosting is

ServerAlias www.domain.com

That is, defining only the www subdomain as an alias for the main domain.
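
One thing worth keeping in mind with name-based virtual hosting: a request whose Host header matches no ServerName or ServerAlias is handled by the first virtual host listed for that address, which may explain a site being served under an unexpected name. On a dedicated IP address, a catch-all along these lines is one possible approach. This is a sketch only; the ServerName is a dummy value, and on a shared IP it would affect every other site on the server.

```apache
# Must appear before every other <VirtualHost> on the same address:port,
# because the first one listed handles unmatched Host headers.
<VirtualHost *:80>
    ServerName catchall.example.invalid
    # mod_alias: send every path to the same path on the canonical host
    Redirect permanent / http://www.example.com/
</VirtualHost>
```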

I'd like to see more experimental results like the server headers check results. Technically, a leading space or encoded space violates the HTTP/1.x protocol, and I was surprised that it worked at all.

Jim

soapystar

7:57 pm on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I still haven't found a workaround. The host has tried setting both:

ServerAlias www.domain.com
&
ServerAlias *.domain.com

Both configurations give a 404 on non-www but still allow %20www to display in the browser. This is with all the code suggested above in .htaccess.

Robert Charlton

6:07 am on Sep 29, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I've tried the space before www on one site I'm just now assessing... haven't started to work on... and from prior checking I know that it doesn't have wildcard DNS, and that it doesn't have redirection to www. It displays either with or without www, as entered in the address bar, and if I just type in, say, w or ww instead of www, I get a 404.

With space before www (after http), my browser (IE6) displays %20www in the address window, and I get a 404 instead of the page. Again, this site doesn't have the rewrite. With sites with the rewrite, when I see the %20www, I don't get the 404.

So, the problem seems to be "enabled" because of the wildcard DNS and the rewrite to www. This may already be obvious to Jim and soapystar, but I wasn't sure what would happen, so I tried it and am reporting.

jdMorgan

12:49 pm on Sep 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Reviewing the code posted above, I realized that this variant, with a properly-escaped "%" character, possibly hasn't been tried yet:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^\%20www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

I don't hold out much hope for it, though, because the negative-match version I posted at the start should have worked if this variant works.

I tried the spaced-www test myself on a site *without* wild-card DNS, and got a DNS lookup failure.

Jim

[edit] Examplified domain [/edit]

[edited by: jdMorgan at 4:09 pm (utc) on Sep. 29, 2005]

soapystar

4:07 pm on Sep 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



nope, no luck with that either..

sigh....!

soapystar

10:06 pm on Oct 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



can this:

RewriteCond %{HTTP_REFERER} ^(.*)domain.com(.*)$
RewriteRule (.*) [domain.com...] [R=301,L]

have one line added to say EXCEPT www.domain.com?
I think ^(.*)domain.com(.*)$ works as a redirect for the %20www but then you get into a loop with it redirecting to the www which is then redirected to itself giving the EXCEEDED MAXIMUM REDIRECTS error.

Thanks.

jdMorgan

10:10 pm on Oct 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why are you using HTTP_REFERER instead of HTTP_HOST?
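
HTTP_REFERER holds the address of the linking page, not the hostname being requested, so it cannot tell the rule when the visitor has already arrived at www. A sketch of the "everything EXCEPT www" idea using HTTP_HOST instead, with example.com as a placeholder domain:

```apache
RewriteEngine On
# The requested host contains example.com somewhere...
RewriteCond %{HTTP_HOST} example\.com [NC]
# ...but is not already the canonical host; this exclusion is what stops the loop
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Once the browser follows the redirect, the second condition fails for www.example.com and no further redirect is issued.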

Jim

soapystar

9:17 pm on Oct 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



in the end the fix was to remove the wildcard from the dns.....

oh well!

Robert Charlton

7:12 am on Oct 18, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



in the end the fix was to remove the wildcard from the dns.....

soapystar - This is pretty much the decision I'd come to as well. I tested a bunch of existing domains I'm involved with that had rewrites to www, and the only ones that gave me problems with the space were those with wildcard DNS.

What I haven't tried yet is to create an extra A-record for w.domain.com and for ww.domain.com, as well as www and domain.com. These should all be rewritten to www with the existing code, and would essentially do much the same thing as the wildcard DNS. If you try it before I do, please post a note about it.
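
On the Apache side, the explicit-names idea would also need those hostnames listed as aliases so the requests reach the right site. A minimal sketch, assuming example.com stands in for the real domain and that each name also has its own A-record:

```apache
# Inside the site's existing <VirtualHost> block (server config, not .htaccess)
ServerName www.example.com
# Only the names listed here are accepted; no wildcard is involved
ServerAlias example.com w.example.com ww.example.com
```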

PS: To correct one misstatement in my post above about this, with the space in the urls without the wildcard DNS, I was getting a "server not found" message, not a 404.