|blocked regular IE6 by mistake|
We all know those short user agents that make strange (invalid) requests. And we maintain the "no" list that includes them.
Recently I added this one:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
based on the (strange) 404s it generated on multiple sites.
Earlier today I made some layout changes to one of my sites and remembered to check it in IE6.
I run Windows 7 with IE8 (use Firefox regularly) and I use Windows Virtual PC for IE7 and IE6.
Aaarghhh... 500 on IE6. "You imbecile" I thought...
Are there any other similar (short) UAs that are perfectly legitimate? I still block other short ones like this one.
I would always expect a regular browser to send more than that.
According to this thread [webmasterworld.com]:
|About 4-5 'real' users a week seem to be still enjoying a plain vanilla, never updated version of IE6 on PC's that do not have .NET extensions installed |
I know someone who still has an old computer with IE6, on which browsing the web can be a chore. IMO, users with the original IE6 probably have low expectations and are resigned to the fact that some websites just won't work for them.
So I block that user agent and I sleep fine at night :)
If you got a 500-Server Error response as opposed to a 403-Forbidden response, then you've got a major problem with your "blocking" code...
Be *very* sure that your code includes provisions to allow your custom 403 error page and robots.txt file to be fetched absolutely unconditionally.
If you use mod_access to deny IP addresses, etc., you can do this with mod_setenvif: set an environment variable when either of these files is requested, use the "Order Deny,Allow" setting, and then use "Allow from env=varname" to override the "Deny from" lines whenever that variable is set. Example:
SetEnvIf Request_URI "^/(robots\.txt|custom_403_page\.html)$" allowall
Order Deny,Allow
Deny from 188.8.131.52
Deny from 184.108.40.206
Allow from env=allowall
If you're using mod_rewrite for user-agent denials, etc., then you can add a rule at the top of your mod_rewrite code that terminates mod_rewrite processing if either of these files is requested. Something like:
RewriteRule ^(robots\.txt|custom_403_page\.html)$ - [L]
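To make the ordering concrete, here is a minimal sketch of how that pass-through rule sits above a user-agent denial (the "BadBot" pattern is a hypothetical placeholder, not from this thread; in a per-directory .htaccess context the leading slash is stripped, so the pattern has no leading "/"):

```apache
# Must come first: stop all rewrite processing for files that may never be blocked
RewriteRule ^(robots\.txt|custom_403_page\.html)$ - [L]

# Hypothetical user-agent denial, placed below the pass-through rule
RewriteCond %{HTTP_USER_AGENT} "BadBot" [NC]
RewriteRule .* - [F]
```

Because the first rule ends with [L], a request for robots.txt or the custom 403 page never reaches the denial rules below it.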
If you use both mod_access and mod_rewrite, then both exclusion methods will be needed.
Otherwise, you set yourself up for two "self-inflicted denial-of-service attacks": the first caused by an "infinite" loop of 403-Forbidden responses leading to a 500-Server Error, and the second caused by some (dumb or malicious) robots interpreting a 403 on robots.txt as carte blanche to spider the whole site, with the result that many 403-Forbidden responses will be served unnecessarily to robots.txt-compliant robots (and may then trigger the aforementioned 403 loop as well).
In both cases, I describe the basic provisions needed in well-structured and well-coded config and/or .htaccess files. If your code is complex or not well-ordered, then more work may be involved.
Thanks very much.
I apologize for making you do all the writing (although it never hurts to hear from Jim), and I'll bookmark it.
I tested several sites on different servers (hosting providers), and on the first two I got this (which caused my "Aaaa... 500" reaction):
You don't have permission to access / on this server.
Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.
On some other servers, a 403 is nicely stated at the top.
Obviously, the blocking code produced the 403 fine, but then failed while serving the custom 403 page.
This is what I have in my .htaccess:
ErrorDocument 404 /error-404.php
ErrorDocument 403 /error-403.php
And the code in 403 PHP file is this:
header('HTTP/1.1 403 Forbidden');
$from_header = "From: email@example.com\r\n";
$to = "firstname.lastname@example.org";
$subject = "403 Error";
$today = date("D M j Y g:i:s a T");
$ip = getenv("REMOTE_ADDR");
$requri = getenv("REQUEST_URI");
$servname = getenv("SERVER_NAME");
$pageload = $ip . " tried to load " . $servname . $requri;
$httpagent = getenv("HTTP_USER_AGENT");
$httpref = getenv("HTTP_REFERER");
$message = "$today \n\n$pageload \n\nUser Agent = $httpagent \n\n$httpref ";
mail($to, $subject, $message, $from_header);
The reality is that I do get emails triggered by 403s. Whether I get all of them, I don't know at the moment; I have to pull the logs, count the errors, and compare against the emails I received.
Anyway, when it says a 500 Internal Server Error was encountered while trying to use an ErrorDocument to handle the request, does this mean it failed at reading the line in .htaccess, or when it attempted to execute the PHP code?
Can't help with the last part, but my own MSIE6 (Windows 2000) is still plain vanilla. Since I never use it online it's not a problem, but because of this I only block that UA in combination with certain other header fields. Monitor all headers for each one and see what patterns emerge. I sometimes get apparent baddies with "acceptable" header combinations, but they may be due to some accelerator, perhaps.
|Anyway, when it says 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request - does this mean it failed at reading the line in .htaccess or when it attempted to execute PHP code? |
The latter, in effect: the internal request for the ErrorDocument itself failed, likely because you have made no provision to exempt fetches of /error-403.php from your access-control restrictions.
Example: A bad-bot tries to fetch your home page, and because you have blocked it, it gets internally redirected to /error-403.php. But /error-403.php is also forbidden, so another 403 is generated because of that attempt. As a result of this second 403, the server attempts to serve /error-403.php. But it's still forbidden, so another 403 error is generated, which also fails, creates another, and another, and another... Finally the server detects this "looping" error, and generates a 500-Server error. But /error-500.php is also Forbidden...
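Applying the earlier mod_setenvif approach to this case, a minimal sketch of the exemption (assuming the blocking is done with "Deny from" directives; the file names match the ErrorDocument lines in the .htaccess quoted earlier):

```apache
# Never block robots.txt or the custom error documents themselves
SetEnvIf Request_URI "^/(robots\.txt|error-403\.php|error-404\.php)$" allowall
Order Deny,Allow
# ... existing "Deny from" lines for bad IPs / user agents go here ...
Allow from env=allowall
```

With this in place, the internal redirect to /error-403.php is always allowed, so the loop never starts.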
This is a major problem, as I stated. It is now trivially easy for a bad guy to put your server into a loop and tie it up with self-generated errors...
Back to your first post then. Thanks.