redirect www.example.com:443 to www.example.com

3 days later, every thread read and every idea tried

         

helpnow

2:54 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



I guess I need help. I have searched these forums for the trick, and tried the endless supply of code ideas I have found here. My problem still exists. I would be indebted if someone can point me in the right direction. Here is my scenario:

Somehow, when I do site:example.com for my domain, the first entry is the deadly "400 Bad Request", for the URL www.example.com:443.

OK, so I fixed this in ErrorDocument 400, by sending it to my homepage. Doesn't answer the question of how the :443 URL got indexed, but I suspect it doesn't matter - it needed to be fixed.

The problem is, the URL remains www.example.com:443, even though it is at least now going to my proper homepage. I have to fix this because it will cause canonical issues.

That's the problem.

I am trying to fix it in httpd.conf. (Yes, I do have an active ssl.conf, I run both http and https for this domain). Since this is a http request, I assume I need to fix this in httpd.conf.

In httpd.conf, I assume I need to test server_port for 443, and then simply rewrite the URL? Here are a bunch of code chunks I tried; none worked - they always keep the URL www.example.com:443. All of these code ideas came from these forums.

rewritecond %{server_port} ^443$
RewriteRule (.*) http://www.example.com/ [R=301]

rewritecond %{server_port} ^443$
rewriterule ^/(.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) http://www.example.com [R=301]

RewriteCond %{REQUEST_URI} (.*):443/(.*) [NC]
RewriteRule (.*) http://www.example.com [R=301]

Redirect permanent http://www.example.com:443/ http://www.example.com

RewriteRule (.*):443/(.*) $1$2 [R=301]

I am at my wits' end, I have pulled out all my hair, and my head is bruised from banging it against the wall. I am writing here for help only after hours and hours of reading and trying everything I can think of. Please put me out of my misery! I am sure the answer is simple, I am sure it is right there in front of me, but after all these tries, I guess I have become blinded, and this area is not my specialty at all.

Thank you SO MUCH in advance to anyone who tries to help me!

jdMorgan

4:28 pm on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can fix such problems at the server config file level (e.g. httpd.conf), or in a per-directory context (i.e. .htaccess). Both affect (only) HTTP and HTTPS requests to your server (i.e. not FTP and not internal requests). The choice of server config versus .htaccess depends on your preferences.

Server config (e.g. httpd.conf)

  • Code is 'compiled' at server restart, and so is much more efficient on a per-request basis
  • Server must be restarted to re-compile, so that code changes can take effect

Per-directory (.htaccess)

  • Code is 'compiled' for each HTTP(S) request, so less efficient
  • Server need not be restarted for changes to take effect
  • Sometimes the only choice, as most Webmasters on shared hosting don't have access to server config files

It *is* important that you find the source of this bad link, even after applying a 'band-aid' to your server.

It is also important to find and fix the reason you get a 400-Bad Request for SSL requests, rather than serving your home page with a 200-OK response. You will indeed get many bad requests -- most of them malicious -- and it's important to handle them appropriately. IOW, do not leave 'kludgey work-arounds' on your server if your search ranking matters to you. Go by the book (the HTTP protocol specifications, in this case) with regard to server response codes and error handling.

---

One question not answered above, and not obvious from the small code snippets you posted:

Do you have *any* working rewriterules in this httpd.conf file?

If not, then there's a good possibility that you have not enabled mod_rewrite processing and/or enabled the rewriting engine within mod_rewrite. To enable mod_rewrite processing, either the FollowSymLinks or the SymLinksIfOwnerMatch Option must be set in the server context in question.

In addition, "RewriteEngine on" must be specified prior to the mod_rewrite code you want to process. Generally, this is only done once at the top of the mod_rewrite code, but you do have the option to enable and disable sections of mod_rewrite code by using "RewriteEngine on" and "RewriteEngine off" ahead of them.

For best results, I suggest that you observe the *exact* syntax specified in the mod_rewrite documentation, and *do not* feel free to play fast and loose with spacing or capitalization, etc.

---

In short, it may be necessary for you to include some 'set up' directives ahead of your code, in order to enable it.


Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{SERVER_PORT} =443
RewriteRule ^ http://www.example.com/ [R=301,L]

Note that this will redirect *any and all* SSL requests to the non-SSL home page. Therefore, your site will become completely inaccessible using SSL/HTTPS. Be sure that that is what you want before leaving this code in place.

Because (for example) that rule will redirect requests for [example.com...] and other non-HTML resource requests to your non-SSL home page, you may instead wish to make the URL-path tested by the RewriteRule more specific and/or you may wish to redirect each SSL page to its non-SSL counterpart instead of redirecting all to your home page -- for example, changing the rule to:


RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

If you haven't already done so, spend an hour reading the Apache mod_rewrite documentation from start to finish, seeking at a minimum to get an 'overview' of it, if not full comprehension. The requirements for the Option setting and "RewriteEngine on" directive mentioned above are documented, and you'll likely admit you could have saved some time. This may also prevent you from making a 'deadly' mistake and nuking your whole site in the search engine results. All it might take is one single typo...

Jim

helpnow

4:56 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



Thank you very much, Jim, for your quick and lengthy reply! THANK YOU! Give me a minute to go through all you said, but I do want to quickly reply that yes, I have a lot of working RewriteConds and RewriteRules in my httpd.conf file, so I am set up to get them working... Back in a minute...

helpnow

5:14 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



Responses:

1. When you say "server must be restarted" after changing httpd.conf, I assume you mean simply that I must invoke a service httpd restart, and not that the whole server must be shut down and started again...

2. I did try to find the bad link by going through the logs, but I cannot find it. Will it be in the ssl or nonssl logs? I am still a bit confused - is this an http request, or an https request? Which log would it be in? Perhaps I don't know how to search - I searched the logs for "400", the error code, but couldn't find anything except references to robots.txt, which didn't make sense... Or does it? If not, how else would I find the cause of the 400 error?

3. Yes, in my httpd.conf, in "<VirtualHost *:80>", I do have "RewriteEngine On", and other rewriterules do work. And yes, Options -Indexes FollowSymLinks is on.

4. I do need to preserve access to https for certain secure sections of my site. So, my desire is to just rewrite example.com:443 into example.com - I don't want to stop any of my acceptable https requests. So I don't think I can do this: RewriteRule ^ http://www.example.com/ [R=301,L]

5. Every change I make to httpd.conf in this vein, I test and then revert if the desired fix didn't take hold. So, I haven't gotten to the point where I got a fix but, in so doing, screwed up my site everywhere else. As it stands, my httpd.conf is still in its original, working form. I am concerned about making a "deadly" mistake as you outlined, and know I have a lot of testing to do once I fix the 443 redirect to make sure everything else is untouched.

6. I have over the years done a lot of redirects, and have read the mod_rewrite doc... I am not a pro at all, but I have been successful in getting these to work in the past. It is just that in this case, I have some oddball situation I don't quite understand with the :443 present in the URL. My httpd.conf file is full of rewrites to deal with different robots and bad URLs that have been indexed in the past. I simply never saw a :443 before, and everything that has worked for me in the past with different situations for rewriting URLs won't work for this one.

I'm going to give this a shot now, and then report back:

RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

Thank you so much for your help, Jim!

C

helpnow

5:21 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

Changes nothing - http://www.example.com:443/ still goes to my home page, http://www.example.com, but the URL remains the same: http://www.example.com:443/

To confirm, after I made the change, I invoked a service httpd restart; I did not restart the whole server.

+++++++++++++++

In my httpd.conf, I have this:

<VirtualHost *:80>
ServerAdmin webmaster@example.com
ServerName www.example.com
DocumentRoot /home/websites/example
ErrorLog logs/examplenossl-error
CustomLog logs/examplenossl-access common
ServerAlias example.com www.example.com
ScriptAlias /cgi-bin/ "/home/cgi-bin/example/"
RewriteEngine On
# no useful info really RewriteLog logs/rewrite_log
# no useful info really RewriteLogLevel 1
<Directory "/home/websites/example">
AllowOverride All
Options -Indexes FollowSymLinks
order allow,deny
Allow from all
Options +ExecCGI
</Directory>

Redirect permanent /index.htm http://www.example.com/cgi-bin/example
Redirect permanent /index.html http://www.example.com/cgi-bin/example

RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

C

jdMorgan

5:44 pm on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the ":443" is appended to the requested hostname, and you want to redirect to remove that regardless of the HTTP/HTTPS protocol being used in the request, then

RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

would do that.

Better yet, you should canonicalize *all* variant requests, using something like:


RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT}>s ^(443>(s)¦[0-9]+>s)$
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]

This 301-redirects any request using any non-canonical variation on the hostname back to the canonical domain, while preserving the protocol (HTTP or HTTPS) of the original request. It fixes FQDN-formatted hostnames, hostnames with port numbers appended, non-www hostname requests, and capitalization variations. For example, it can fix each of the problems with a request for

eXamPle.Com.:8080

Important: Replace the broken pipe "¦" character with a solid pipe character before use; Posting on this forum modifies the pipe characters.
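
For reference, here is a sketch of the same canonicalization rule with the solid pipe character restored and comments added. It assumes server-config context (e.g. inside a VirtualHost), which is why the RewriteRule pattern keeps its leading slash, and www.example.com stands in for your real canonical hostname.

```apache
RewriteEngine on
# Act only when a Host header is present and is not exactly the canonical name
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# Append ">s" to the port number; if the port is 443 the "s" is captured
# into %2, otherwise %2 stays empty
RewriteCond %{SERVER_PORT}>s ^(443>(s)|[0-9]+>s)$
# Redirect to http:// or https:// on the canonical host, keeping the path
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]
```

The %2 back-reference is taken from the last RewriteCond that matched, which is why the port test has to come second.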

Jim

helpnow

5:58 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



Change httpd.conf, and then do a 'service httpd restart', right?

++++++++++++

RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

-> doesn't work! The ":443" is still appended to the URL!?

+++++++++++++++++++++

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT}>s ^(443>(s)¦[0-9]+>s)$
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]

This is clearly the change I want, but yet, when I add it to httpd.conf and restart httpd, no effect. The :443 is still in the URL.

++++++++++++++++

I am missing something... Does it have anything to do with the fact that this is a 400 error already previously changed in the httpd.conf with this?:

ErrorDocument 400 //cgi-bin/example

(which effectively goes to my home page...)

It feels like the :443 URL just isn't reaching my rewrites...!?

jdMorgan

6:56 pm on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you completely flushing (deleting) your browser cache after changing the server-side code? If not, your browser will simply show you previously-cached pages and server responses (and you will see no new request logged on your server).

If it's not that, then please show us the request line from your raw server access log, and any other logged info you may have.

A service restart on Apache should be entirely sufficient as a 'restart'.

[added]
Your ErrorDocument directive should read:

ErrorDocument 400 /cgi-bin/example

(only one slash to specify 'path from DocumentRoot')
[/added]

Jim

helpnow

9:32 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



No cache - I emptied it and retried - I always do a <CTRL><F5> anyway. Same thing, no fix.

Checking my logs now...

Thank you for being there for me, Jim! It would be great to resolve this before the day is through - it has been a major bee in my bonnet for days. And yes, my ranking is affected too, so I have the weight of that hanging over me as well... Not great.

helpnow

1:00 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Jim,

OK, after extensive testing, this is what I get from my logs...

ssl_access.log (my IP replaced with 9s)
999.999.999.999 - - [29/Aug/2009:20:24:23 -0400] "GET /" 400 67281

ssl_request.log
[29/Aug/2009:20:24:23 -0400] 999.999.999.999 - - "GET /" 67281

ssl_error
<no entry>

Does this shine any light on my situation?

Does this mean I need to be focusing on the ssl.conf file instead of httpd.conf?

helpnow

1:08 am on Aug 30, 2009 (gmt 0)

10+ Year Member



P.S. I did both of the above code snippets in my ssl.conf, and no luck.

RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

+++++++++++++++++++++

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT}>s ^(443>(s)¦[0-9]+>s)$
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]

helpnow

2:09 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Jim, am I supposed to comment out the ErrorDocument 400? Is the whole idea that we rewrite the URL with the 443 so that it is fine and doesn't need the 400 error document message? Is it possible that it is hitting a detour with the 400 error message, and not reaching my rewrites on the URL? i.e. were the RewriteCond/RewriteRule combos above supposed to be implemented with the ErrorDocument 400 not turned on? Or does it matter?

I continue to bash my head against the wall here, I am no further along. I have been testing and experimenting for hours now... I've made zero progress. <grin> I am in the same place I was when I started. ; )

C

jdMorgan

2:39 am on Aug 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because the code snippets above are both 'sensitive' to the hostname and to the protocol, there is no harm in testing both rules in both contexts (SSL and non-SSL). That is, they will only be invoked when they need to be invoked, so I'd say try the code in both files, because otherwise I'm stumped.

The only other thing I can think of is that perhaps the rules *are* being applied, but some other agent in the server (e.g. a script) is invoking a subsequent external redirect and tacking the port number back onto the hostname, causing it to re-appear in your browser's address bar. This could be detected by examining the browser/server transactions using the "Live HTTP Headers" add-on for Firefox/Mozilla browsers.

BTW, Control-F5 is NOT always the same thing as flushing the cache, as it forces a reload, but will in some cases accept a 304-Not Modified server response and use the browser cache anyway. That's why I use the phrase "completely flush (delete)" fairly consistently in this forum. For long/intensive testing sessions, it's often handy to disable the cache by setting its size or its 'storage time' to zero, depending on the browser's available settings. Of course, if you don't remember to re-enable it, you may be miserable when you return to regular Web-surfing, as it will likely slow down your browsing noticeably.

Jim

helpnow

2:46 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Hi Jim!

OK, I am going to get Live HTTP Headers set up for my Firefox.

I will also set the cache to 0 for now. Better I make sure I am really testing my fixes. ; ) I'll deal with slow browsing later...

Meanwhile, I will also carefully try to retry the code snippets 1 at a time, in both httpd.conf and ssl.conf.

Nothing in the log jumped out at you?

helpnow

2:51 am on Aug 30, 2009 (gmt 0)

10+ Year Member



<LOL> OK, I've got Live HTTP Headers loaded, and I am running my 443 page through it. <ahem> What exactly am I looking for!? ; )

helpnow

2:56 am on Aug 30, 2009 (gmt 0)

10+ Year Member



There are 2 main areas, Headers and Generator, in Live Headers.

+++++++++++++
Under Generator, I get:

#request# GET http://www.example.com:443/
GET /
then the actual page, all the images etc. follow.
++++++++++++++++++

Under Headers, well, it goes on and on.

helpnow

3:00 am on Aug 30, 2009 (gmt 0)

10+ Year Member



I've run it twice, and saved the results, once for the version of the URL with the 443, and one without. I will try to see if I can spot a difference between the 2 results...

jdMorgan

3:01 am on Aug 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're looking at the requests from your browser, specifically at the request line and the Host: header, and seeing whether the server is generating a redirect in response to that "443" being appended to whatever it's getting appended to. And if so, then you're looking to see if, after the browser responds to that server redirect response by re-requesting the resource without the 443 tacked on, the server then comes back with another redirect that adds that 443 back on.

Either a lack of the initial redirect (from the mod_rewrite code) or the presence of a second redirect (adding the 443 back on) is an error. I'm being general here because I'm not completely clear on where this initial 443 is appearing in the first place, as evidenced by the shift in focus from the SERVER_PORT to the HTTP_HOST in the discussion above.

My statement above was meant to say that there's no harm in putting copies of both rules into both files all at the same time, and testing that way.

Note that if the mod_rewrite code is placed within a <Directory> container and that container defines the directory-path as ending with a slash (which it should), then the leading slash should be removed from the RewriteRule patterns, otherwise the pattern will never match. This is also true when the code is used in a .htaccess file.
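
As a sketch of that difference, here is how the ":443" rule might look if it were moved inside the <Directory> container. The directory path is the one from your posted config with a trailing slash added; placing the rule there at all is an assumption made only for illustration.

```apache
# Per-directory context (a <Directory> container or .htaccess):
# the directory prefix, including its trailing slash, is stripped
# before matching, so the pattern has no leading slash.
<Directory "/home/websites/example/">
    Options +FollowSymLinks
    RewriteEngine on
    RewriteCond %{HTTP_HOST} :443$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
</Directory>
```

The same rule placed directly in the VirtualHost, outside any <Directory> container, would keep the pattern ^/(.*)$ instead.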

Jim

helpnow

3:10 am on Aug 30, 2009 (gmt 0)

10+ Year Member



The RewriteCond/RewriteRule combos are OUTSIDE of the <Directory container... They immediately follow the </Directory> line.

However, the path is set like this:
<Directory "/home/websites/example">
no slash at the end. Do I need a slash at the end?

helpnow

3:14 am on Aug 30, 2009 (gmt 0)

10+ Year Member



?

I get very little! It looks like it doesn't hit the redirects at all...

http://www.example.com:443/

GET / HTTP/1.1
Host: www.example.com:443
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 65535
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache

HTTP/1.x 200 OK

helpnow

3:17 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Curiously, on my non-443 version of the URL, I DO hit the redirects, which I can see from LiveHeaders, because I actually have a permanent 301 from example.com to example.com/cgi-bin/example...

http://www.example.com/

GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 65535
Connection: keep-alive

HTTP/1.x 301 Moved Permanently
Date: Sun, 30 Aug 2009 02:39:11 GMT
Server: Apache/2.0.52 (CentOS)
Location: http://www.example.com/cgi-bin/example
Content-Length: 338
Connection: close
Content-Type: text/html; charset=iso-8859-1

helpnow

3:20 am on Aug 30, 2009 (gmt 0)

10+ Year Member



P.S. In all the LiveHeaders above, this was with the test done only in httpd.conf. I will now try with ssl.conf, as it sure looks like the fix isn't being grabbed from httpd.conf.

helpnow

3:34 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Curiously, when I do this in the ssl.conf...

RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]

...I get the following:

301 Moved Permanently

The document has moved here.

Apache/2.0.52 (CentOS) Server at www.example.com Port 443

----------------

And 'here' is a hyperlink, which when clicked does take me to the right page with the 443 removed.

jdMorgan

2:12 pm on Aug 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the server response is as you describe (301 redirect to correct URL), then it looks like the browser is refusing to follow the redirects. You should check its configuration, and be sure that you have not configured it to ignore redirects. Also, examine the response status line and the redirect-to "link" to make sure that they are utterly correct, as otherwise the browser might refuse to follow this redirect (I don't know, but this situation is quite odd to begin with.)

It is evident from the "Host: www.example.com:443" line in the Live Headers log snippet you posted in post #3980719 above that the specific 'error' is that ":443" is being appended to the Host header, and so the rule you should be using (in ssl.conf) is this one


RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ https://www.example.com/$1 [R=301,L]

You may of course wish to also use the other rule, but this rule is the one that applies to the problem at hand.
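
A sketch of where that rule would sit, assuming your ssl.conf declares the SSL host as <VirtualHost _default_:443> (that declaration is a guess on my part; use whatever yours actually says). By default, mod_rewrite settings are not inherited by virtual hosts, so "RewriteEngine on" has to be repeated inside each one that needs rules.

```apache
# ssl.conf: the rule must live inside the SSL virtual host itself.
# "_default_:443" is an assumption about how that host is declared.
<VirtualHost _default_:443>
    # ... existing SSL directives stay as they are ...
    RewriteEngine on
    RewriteCond %{HTTP_HOST} :443$
    RewriteRule ^/(.*)$ https://www.example.com/$1 [R=301,L]
</VirtualHost>
```

Once the browser follows the redirect to the default HTTPS port, it should normally send the Host header without ":443", so the rule will not match a second time.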

Some other notes:

  • Be sure that this rule is the first -- or among the first -- of the RewriteRules in that file.
  • Be aware that configuration directives are not necessarily executed in the order that they appear in the config files. Rather, each Apache module in turn scans the config files, and handles the directives that it understands. Therefore, your directives are executed first by module execution order, and then by their order of appearance in your config files. What this means is that if you have other directives that modify the default URL-to-filename translation 'mapping,' then they may be invoked first. Examples of other modules which can affect this are mod_alias, mod_proxy, mod_dir, mod_negotiation, mod_speling, and several of the core directives such as DirectoryIndex.

So, if you have other directives that will affect requests for "/.*" URL-paths, these may be executing before your rewriterule ever has a chance to execute, and making it appear that the rule "doesn't work."

The typical approach to preventing these execution-order problems is to disable any modules your site does not require, and to use mod_rewrite for all rewriting and redirection if you use it for any rewrites or redirects -- i.e. don't mix the modules used to rewrite/redirect requests.
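
As a sketch of what "don't mix the modules" could mean for the config you posted, the two mod_alias Redirect lines might be replaced by one mod_rewrite rule. This assumes server-config context and is only roughly equivalent, since mod_alias and mod_rewrite match URL-paths in slightly different ways.

```apache
# Instead of these mod_alias directives:
#   Redirect permanent /index.htm  http://www.example.com/cgi-bin/example
#   Redirect permanent /index.html http://www.example.com/cgi-bin/example
# a single mod_rewrite rule covering both filenames:
RewriteEngine on
RewriteRule ^/index\.html?$ http://www.example.com/cgi-bin/example [R=301,L]
```

With every redirect expressed as a RewriteRule, the order of the rules in the file is the order in which they are evaluated.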

And an "Oh, by the way, but this is important" I'd like to add is that you likely *should not* be redirecting requests to cgi-bin at all. Rather, you should be rewriting these requests. A redirect will 'expose' the /cgi-bin filepath as a URL, and make your site much more vulnerable to malicious attention. Also, there is no reason whatsoever that your site's underlying technology should affect what the user (and search engines) see in their 'address bar' -- adding /cgi-bin to the URL simply clutters up the address bar and makes your search results ugly, hard to read, hard to remember, and hard to type. If all requests need to go to /cgi-bin, then just use an internal rewrite to point them there. (As this is a secondary issue, let's hold off further discussion of this point until the primary problem is solved.)
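
For when we come back to it, a minimal sketch of such an internal rewrite, assuming server-config context and that only the home page needs to reach the script. The [PT] flag hands the rewritten path back to the URL mapping stage so that your existing ScriptAlias for /cgi-bin/ still applies.

```apache
RewriteEngine on
# Internally map the home page to the script; the browser still shows "/"
RewriteRule ^/$ /cgi-bin/example [PT,L]
```

Any existing external redirect from "/" to /cgi-bin/example would have to be removed first, otherwise it would still expose the script path.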

Jim