Somehow, when I do site:example.com for my domain, the first entry is the deadly "400 Bad Request", for the URL www.example.com:443.
OK, so I fixed this in ErrorDocument 400, by sending it to my homepage. Doesn't answer the question of how the :443 URL got indexed, but I suspect it doesn't matter - it needed to be fixed.
The problem is, the URL remains www.example.com:443, even though it is at least now going to my proper homepage. I have to fix this because it will cause canonical issues.
That's the problem.
I am trying to fix it in httpd.conf. (Yes, I do have an active ssl.conf, I run both http and https for this domain). Since this is a http request, I assume I need to fix this in httpd.conf.
In httpd.conf, I assume I need to test server_port for 443, and then simply rewrite the URL? Here are a bunch of code chunks I tried; none worked. It always keeps the URL www.example.com:443. All of these code ideas came from these forums.
rewritecond %{server_port} ^443$
RewriteRule (.*) http://www.example.com/ [R=301]
rewritecond %{server_port} ^443$
rewriterule ^/(.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} ^443$
RewriteRule (.*) http://www.example.com [R=301]
RewriteCond %{REQUEST_URI} (.*):443/(.*) [NC]
RewriteRule (.*) http://www.example.com [R=301]
Redirect permanent http://www.example.com:443/ http://www.example.com
RewriteRule (.*):443/(.*) $1$2 [R=301]
I am at my wits' end, I have pulled out all my hair, and my head is bruised from banging it against the wall. I am writing here for help only after hours and hours of reading and trying everything I can think of. Please put me out of my misery! I am sure the answer is simple, I am sure it is right there in front of me, but after all these tries, I guess I have become blinded, and this area is not my specialty at all.
Thank you SO MUCH in advance to anyone who tries to help me!
Server config (e.g. httpd.conf)
It is also important to find and fix the reason you get a 400-Bad Request for SSL requests, rather than serving your home page with a 200-OK response. You will indeed get many bad requests -- most of them malicious -- and it's important to handle them appropriately. In other words, do not leave 'kludgey work-arounds' on your server if your search ranking matters to you. Go by the book (the HTTP protocol specifications, in this case) with regard to server response codes and error handling.
---
One question not answered above, and not obvious from the small code snippets you posted:
Do you have *any* working rewriterules in this httpd.conf file?
If not, then there's a good possibility that you have not enabled mod_rewrite processing and/or enabled the rewriting engine within mod_rewrite. To enable mod_rewrite processing, either the FollowSymLinks or the SymLinksIfOwnerMatch Option must be set in the server context in question.
In addition, "RewriteEngine on" must be specified prior to the mod_rewrite code you want to process. Generally, this is only done once at the top of the mod_rewrite code, but you do have the option to enable and disable sections of mod_rewrite code by using "RewriteEngine on" and "RewriteEngine off" ahead of them.
For best results, I suggest that you observe the *exact* syntax specified in the mod_rewrite documentation, and *do not* feel free to play fast and loose with spacing or capitalization, etc.
---
In short, it may be necessary for you to include some 'set up' directives ahead of your code, in order to enable it.
Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{SERVER_PORT} =443
RewriteRule ^ http://www.example.com/ [R=301,L]
Because (for example) that rule will redirect requests for [example.com...] and other non-HTML resource requests to your non-SSL home page, you may instead wish to make the URL-path tested by the RewriteRule more specific and/or you may wish to redirect each SSL page to its non-SSL counterpart instead of redirecting all to your home page -- for example, changing the rule to:
RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
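The other option mentioned above, making the rule more specific, might look something like the following sketch. It leaves one area of the site on SSL and redirects everything else on port 443. The /secure/ path is a hypothetical placeholder, not something taken from your configuration.

```apache
# Sketch only; /secure/ is an assumed path for the pages that must stay on SSL.
RewriteCond %{SERVER_PORT} =443
# Skip the redirect for anything under the (hypothetical) secure area.
RewriteCond %{REQUEST_URI} !^/secure/
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
```

As with the rules above, this form is for server config or <VirtualHost> context, where the URL-path seen by the RewriteRule starts with a slash.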
Jim
1. When you say "server must be restarted" after changing httpd.conf, I assume you mean simply that I must invoke a service httpd restart, and not that the whole server must be shut down and started again...
2. I did try to find the bad link, by going through the logs, but I cannot find it. Will it be in the ssl or nonssl logs? I am still a bit confused - is this a http request, or a https request? Which log would it be in? Perhaps I don't know how to search - I searched for "400" in the logs, for the error message 400, but couldn't find anything except references to robots.txt, which didn't make sense... Or does it? If not, how else would I find the cause of the 400 error?
3. Yes, in my httpd.conf, in "<VirtualHost *:80>", I do have "RewriteEngine On", and other rewriterules do work. And yes, Options -Indexes FollowSymLinks is on.
4. I do need to preserve access to https for certain secure sections of my site. So, my desire is to just rewrite example.com:443 into example.com - I don't want to stop any of my acceptable https requests. So I don't think I can do this: RewriteRule ^ http://www.example.com/ [R=301,L]
5. Every change I make to httpd.conf in this vein, I test and then revert if the desired fix didn't take hold. So, I haven't gotten to the point where I got a fix but in so doing screwed up my site everywhere else. As it stands, my httpd.conf is still in its original, working form. I am concerned about making a "deadly" mistake as you outlined, and know I have a lot of testing to do once I fix the 443 redirect to make sure everything else is untouched.
6. I have over the years done a lot of redirects, and have read the mod_rewrite doc... I am not a pro at all, but I have been successful in getting these to work in the past. It is just in this case, I have some oddball situation I don't quite understand with the :443 present in the URL. My httpd.conf file is full of rewrites to deal with different robots and bad urls that have been indexed in the past. I simply never saw a :443 before, and everything that has worked for me in the past with different situations for rewriting URLs won't work for this one.
I'm going to give this a shot now, and then report back:
RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
Thank you so much for your help, Jim!
C
Changes nothing - http://www.example.com:443/ still goes to my home page, http://www.example.com, but the URL remains the same: http://www.example.com:443/
To confirm, after I made the change, I invoked a service httpd restart, I did not restart the whole server.
+++++++++++++++
In my httpd.conf, I have this:
<VirtualHost *:80>
ServerAdmin webmaster@example.com
ServerName www.example.com
DocumentRoot /home/websites/example
ErrorLog logs/examplenossl-error
CustomLog logs/examplenossl-access common
ServerAlias example.com www.example.com
ScriptAlias /cgi-bin/ "/home/cgi-bin/example/"
RewriteEngine On
# no useful info really RewriteLog logs/rewrite_log
# no useful info really RewriteLogLevel 1
<Directory "/home/websites/example">
AllowOverride All
Options -Indexes FollowSymLinks
order allow,deny
Allow from all
Options +ExecCGI
</Directory>
Redirect permanent /index.htm http://www.example.com/cgi-bin/example
Redirect permanent /index.html http://www.example.com/cgi-bin/example
RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
C
RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
Better yet, you should canonicalize *all* variant requests, using something like:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT}>s ^(443>(s)¦[0-9]+>s)$
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]
eXamPle.Com.:8080
Important: Replace the broken pipe "¦" character with a solid pipe character before use; Posting on this forum modifies the pipe characters.
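For readers puzzling over the ">s" in the second condition, here is an annotated copy of the same rule with the solid pipe restored. The comments are an explanatory sketch of how the pieces appear to fit together, not additional functionality.

```apache
# Annotated copy of the canonicalization rule, for server config or
# <VirtualHost> context (note the leading slash in the RewriteRule pattern).

# Continue only when the Host header is something other than exactly
# www.example.com. A blank Host header is deliberately left alone.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# The test string is the port number followed by the literal text ">s",
# for example "443>s" or "80>s". On port 443 the first alternative matches
# and %2 captures "s"; on any other port the second alternative matches
# and %2 stays empty.
RewriteCond %{SERVER_PORT}>s ^(443>(s)|[0-9]+>s)$
# "http%2" therefore expands to "https" on port 443 and "http" otherwise.
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]
```

The appended ">s" exists only to give the pattern something to capture, so that one rule can serve both the SSL and non-SSL cases.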
Jim
++++++++++++
RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
-> doesn't work! The ":443" is still appended to the URL!?
+++++++++++++++++++++
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT}>s ^(443>(s)¦[0-9]+>s)$
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]
This is clearly the change I want, and yet, when I add it to httpd.conf and restart httpd, it has no effect. The :443 is still in the URL.
++++++++++++++++
I am missing something... Does it have anything to do with the fact that this is a 400 error already previously changed in the httpd.conf with this?:
ErrorDocument 400 //cgi-bin/example
(which effectively goes to my home page...)
It feels like the :443 URL just isn't reaching my rewrites...!?
If it's not that, then please show us the request line from your raw server access log, and any other logged info you may have.
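If the access log uses the "common" format, it will not record the Host header at all. As a temporary aid, an extra log along the following lines could capture it. This is only a sketch: the format nickname "hostdebug" and the log file name are made up for illustration, and the two lines would go inside whichever <VirtualHost> block you are investigating.

```apache
# Temporary diagnostic log; "hostdebug" and the file name are placeholders.
# %{Host}i is the Host request header, %p is the canonical port of the
# server handling the request, and %>s is the final response status.
LogFormat "%h %t \"%r\" %>s \"%{Host}i\" port=%p" hostdebug
CustomLog logs/example-hostdebug hostdebug
```

Remove the extra log once the problem has been tracked down.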
A service restart on Apache should be entirely sufficient as a 'restart'.
[added]
Your ErrorDocument directive should read:
ErrorDocument 400 /cgi-bin/example
Jim
Checking my logs now...
Thank you for being there for me, Jim! It would be great to resolve this before the day is through - it has been a major bee in my bonnet for days. And yes, my ranking is affected too, so I have the weight of that hanging over me as well... Not great.
OK, after extensive testing, this is what I get from my logs...
ssl_access.log (my IP replaced with 9s)
999.999.999.999 - - [29/Aug/2009:20:24:23 -0400] "GET /" 400 67281
ssl_request.log
[29/Aug/2009:20:24:23 -0400] 999.999.999.999 - - "GET /" 67281
ssl_error
<no entry>
Does this shine any light on my situation?
Does this mean I need to be focussing on the ssl.conf file instead of httpd.conf?
RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
+++++++++++++++++++++
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteCond %{SERVER_PORT}>s ^(443>(s)¦[0-9]+>s)$
RewriteRule ^/(.*)$ http%2://www.example.com/$1 [R=301,L]
I continue to bash my head against the wall here, I am no further along. I have been testing and experimenting for hours now... I've made zero progress. <grin> I am in the same place I was when I started. ; )
C
The only other thing I can think of is that perhaps the rules *are* being applied, but some other agent in the server (e.g. a script) is invoking a subsequent external redirect and tacking the port number back onto the hostname, causing it to re-appear in your browser's address bar. This could be detected by examining the browser/server transactions using the "Live HTTP Headers" add-on for Firefox/Mozilla browsers.
BTW, Control-F5 is NOT always the same thing as flushing the cache, as it forces a reload, but will in some cases accept a 304-Not Modified server response and use the browser cache anyway. That's why I use the phrase "completely flush (delete)" fairly consistently in this forum. For long/intensive testing sessions, it's often handy to disable the cache by setting its size or its 'storage time' to zero, depending on the browser's available settings. Of course, if you don't remember to re-enable it, you may be miserable when you return to regular Web-surfing, as it will likely slow down your browsing noticeably.
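On the server side, one way to check whether the rules are being applied at all is mod_rewrite's own log. The sketch below assumes an Apache 2.0 or 2.2 server, since these two directives were removed in Apache 2.4; the file name is the one from the commented-out lines in the configuration posted earlier.

```apache
# Apache 2.0/2.2 only: RewriteLog and RewriteLogLevel do not exist in 2.4.
# Place in the server config or inside the relevant <VirtualHost> block.
RewriteLog logs/rewrite_log
# Higher levels are more verbose. Raise it only while testing, then set
# it back to 0, because rewrite logging slows the server down.
RewriteLogLevel 3
```

If a test request produces no entries at all in that log, the request is probably not reaching mod_rewrite in that context.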
Jim
OK, I am going to get Live HTTP Headers set up for my Firefox.
I will also set the cache to 0 for now. Better I make sure I am really testing my fixes. ; ) I'll deal with slow browsing later...
Meanwhile, I will also carefully try to retry the code snippets 1 at a time, in both httpd.conf and ssl.conf.
Nothing in the log jumped out at you?
Either a lack of the initial redirect (from the mod_rewrite code) or the presence of a second redirect (adding the 443 back on) is an error. I'm being general here because I'm not completely clear on where this initial 443 is appearing in the first place, as evidenced by the shift in focus from the SERVER_PORT to the HTTP_HOST in the discussion above.
My statement above was meant to say that there's no harm in putting copies of both rules into both files all at the same time, and testing that way.
Note that if the mod_rewrite code is placed within a <Directory> container and that container defines the directory-path as ending with a slash (which it should), then the leading slash should be removed from the RewriteRule patterns, otherwise the pattern will never match. This is also true when the code is used in a .htaccess file.
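As an illustration of that difference, here is a minimal sketch of the same rule in per-directory form. The directory path is the one from the configuration posted earlier, written with a trailing slash; everything else is unchanged except for the missing leading slash in the pattern.

```apache
# Per-directory sketch: the directory path ends with a slash, so the
# RewriteRule pattern must not start with one.
<Directory "/home/websites/example/">
    Options +FollowSymLinks
    RewriteEngine On
    RewriteCond %{HTTP_HOST} :443$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
</Directory>
```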
Jim
I get very little! It looks like it doesn't hit the redirects at all...
http://www.example.com:443/
GET / HTTP/1.1
Host: www.example.com:443
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 65535
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
HTTP/1.x 200 OK
http://www.example.com/
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 65535
Connection: keep-alive
HTTP/1.x 301 Moved Permanently
Date: Sun, 30 Aug 2009 02:39:11 GMT
Server: Apache/2.0.52 (CentOS)
Location: http://www.example.com/cgi-bin/example
Content-Length: 338
Connection: close
Content-Type: text/html; charset=iso-8859-1
RewriteCond %{SERVER_PORT} =443
RewriteRule ^/(.*)$ http://www.example.com/$1 [R=301,L]
...I get the following:
301 Moved Permanently
The document has moved here.
Apache/2.0.52 (CentOS) Server at www.example.com Port 443
----------------
And 'here' is a hyperlink, which when clicked does take me to the right page with the 443 removed.
It is evident from the "Host: www.example.com:443" line in the Live Headers log snippet you posted in post #3980719 above that the specific 'error' is that ":443" is being appended to the Host header, and so the rule you should be using (in ssl.conf) is this one (note the https in the redirect target):
RewriteCond %{HTTP_HOST} :443$
RewriteRule ^/(.*)$ https://www.example.com/$1 [R=301,L]
Some other notes:
Be sure that this rule is the first -- or among the first of the RewriteRules in that file.
Be aware that configuration directives are not necessarily executed in the order that they appear in the config files; Rather, each Apache module in turn scans the config files, and handles the directives that it understands. Therefore, your directives are executed first by module execution order, and then by their order of appearance in your config files. What this means is that if you have other directives that modify the default URL-to-filename translation 'mapping,' then they may be invoked first. Examples of other modules which can affect this are mod_alias, mod_proxy, mod_dir, mod_negotiation, mod_speling, and several of the core directives such as DirectoryIndex.
So, if you have other directives that will affect requests for "/.*" URL-paths, these may be executing before your rewriterule ever has a chance to execute, and making it appear that the rule "doesn't work."
The typical approach to preventing these execution-order problems is to disable any modules your site does not require, and to use mod_rewrite for all rewriting and redirection if you use it for any rewrites or redirects -- i.e. don't mix the modules used to rewrite/redirect requests.
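As a rough sketch of what "don't mix the modules" could mean here, the two mod_alias "Redirect permanent" lines from the configuration posted earlier might be replaced by a single mod_rewrite rule. This assumes server config or <VirtualHost> context, and it is only approximately equivalent to the originals.

```apache
# Rough mod_rewrite stand-in for the two "Redirect permanent" lines, so
# that one module handles all redirects. Matches /index.htm and /index.html.
RewriteRule ^/index\.html?$ http://www.example.com/cgi-bin/example [R=301,L]
```

Placed after the hostname rule, this lets the hostname get corrected first, in the order the rules appear in the file.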
And an "Oh, by the way, but this is important" I'd like to add is that you likely *should not* be redirecting requests to cgi-bin at all. Rather, you should be rewriting these requests. A redirect will 'expose' the /cgi-bin filepath as a URL, and make your site much more vulnerable to malicious attention. Also, there is no reason whatsoever that your site's underlying technology should affect what the user (and search engines) see in their 'address bar' -- adding /cgi-bin to the URL simply clutters up the address bar and makes your search results ugly, hard to read, hard to remember, and hard to type. If all requests need to go to /cgi-bin, then just use an internal rewrite to point them there. (As this is a secondary issue, let's hold off further discussion of this point until the primary problem is solved.)
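For reference when that discussion does come up, an internal rewrite might look roughly like the sketch below. It assumes that /cgi-bin/example is the script that produces the home page and that the ScriptAlias for /cgi-bin/ from the posted configuration is in effect; treat both as assumptions to verify.

```apache
# Sketch: serve the home page from the script without exposing /cgi-bin/
# in the address bar. This is an internal rewrite, not a redirect.
# [PT] passes the rewritten URL-path on to the rest of URL mapping, so
# that the ScriptAlias for /cgi-bin/ can still map it to the script.
RewriteRule ^/$ /cgi-bin/example [PT,L]
```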
Jim