
Website Technology Issues Forum

    
Tracking of 400 and 500 Errors Part II
Are you being Proactive or Reactive
pageoneresults




msg:3659306
 6:28 pm on May 26, 2008 (gmt 0)

Tracking of 400 and 500 Errors Part I
Are you being Proactive or Reactive
[webmasterworld.com...]

After working in various programming environments for quite a few years, I am convinced that there is no such thing as an "error-free" application; that's a pipe dream (for many).

Our 400/500 error-reporting routines help us keep things in check on a regular basis. Yes, I know, the errors should not be there to begin with, but hey, if you can provide me with an error-free application, I'm game!

I can see many ecommerce sites losing big money by not tracking their 400/500 errors. Think of all those 500 errors that occur during the checkout phase that you were not able to replicate during development. What happens to that sale? :(

Anyone care to share what you're doing in this area? Do you micro-manage 400s and 500s on a 24/7/365 basis, or do you have the perfect application that doesn't produce errors?

 

rocknbil




msg:3659922
 3:08 pm on May 27, 2008 (gmt 0)

do you have the perfect application that doesn't produce errors?

Saw this yesterday and thought for sure you'd have responses by now! :-)

I was a bit surprised at the question - why on Earth would you allow an application to continue that produces 500's? Or am I missing something?

The only 500's you should receive is if a dependent service fails, for example, attempting to connect to a database when the MySQL service is down. Even that's a poor example, because you should have error handling on the connection itself.

Another example: you attempt to send mail, and the mail program itself prints to STDOUT before the Content-Type header has been printed. Even in this case, you should capture the mail program's output in a variable, or print the header in advance so it doesn't 500.
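A minimal Perl sketch of the second fix, assuming a CGI script that needs to run an external program. The command below just runs Perl itself (`$^X`) as a stand-in for a chatty mail program, and `run_and_capture` is a hypothetical helper name. The header goes out first, and the child's output is read into a variable through a list-form pipe `open` instead of landing in the response.

```perl
use strict;
use warnings;

# Run an external command and capture everything it writes to STDOUT,
# instead of letting it print straight into the HTTP response.
# Returns (exit_code, output); dies if the command cannot be started.
sub run_and_capture {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd)
        or die "cannot start '$cmd[0]': $!\n";
    my $output = do { local $/; <$pipe> };    # slurp all of the output
    $output = '' unless defined $output;
    close($pipe);                             # waits for the child, sets $?
    return ($? >> 8, $output);
}

# Print the header FIRST, so anything printed later is part of a valid response.
print "Content-Type: text/plain\n\n";

# Hypothetical stand-in for a mail program that chatters on STDOUT.
my ($status, $chatter) = run_and_capture($^X, '-e', 'print "queued\n"');
if ($status == 0) {
    print "Message sent.\n";
} else {
    print "Sorry, we could not send your message.\n";
}
```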

All those 500 errors that occur during the checkout phase...

Especially, particularly, and above and beyond all, this is where comprehensive error handling should occur. Can't connect to the payment processor? Error handler! Payment processor returning nothing, or an unexpected response? Error handler! Malformed input? Error handler! Additionally, at this phase I not only want to log it, I want to know immediately: a small administrative email routine at every stage of the connection to the payment gateway should alert you.
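A minimal sketch of "error handler at every step" around a payment call. Everything specific here is hypothetical: the gateway is passed in as a code ref standing in for your processor's client library, `notify_admin` stands in for the administrative email routine (it only records and `warn`s), and the `APPROVED|...`/`DECLINED` response format is invented for illustration. Real gateways each have their own formats.

```perl
use strict;
use warnings;

my @admin_alerts;    # hypothetical: a real notify_admin would send the email

# Stand-in for the "small administrative email routine".
sub notify_admin {
    my ($msg) = @_;
    push @admin_alerts, $msg;
    warn "ADMIN ALERT: $msg\n";    # also ends up in the server error log
}

# Returns ('ok', $transaction_id) or ('error', $message_for_the_customer).
# $call_gateway is a code ref standing in for the processor's client library.
sub charge_card {
    my ($amount, $call_gateway) = @_;

    # Malformed input? Error handler.
    return ('error', 'Please enter a valid amount.')
        unless defined $amount && $amount =~ /\A[0-9]+(?:\.[0-9]{2})?\z/;

    # Can't connect (the client dies or times out)? Error handler.
    my $raw;
    my $ok = eval { $raw = $call_gateway->($amount); 1 };
    if (!$ok) {
        notify_admin('gateway call failed: ' . ($@ || 'unknown error'));
        return ('error', 'The payment service is unavailable right now.');
    }

    # Nothing returned? Error handler.
    if (!defined $raw || $raw eq '') {
        notify_admin('gateway returned an empty response');
        return ('error', 'Your payment could not be confirmed.');
    }

    # Expected responses (format invented for this sketch).
    return ('ok', $1) if $raw =~ /\AAPPROVED\|(\w+)\z/;
    return ('error', 'Your card was declined.') if $raw =~ /\ADECLINED\b/;

    # Unexpected response? Error handler.
    notify_admin("unexpected gateway response: $raw");
    return ('error', 'Your payment could not be confirmed.');
}
```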

I seriously doubt my programs are "perfect," but 500 is a personal bane of mine (especially since "internal server error" isn't the least bit helpful). It shouldn't matter what condition or input you throw at a program; it should never 500. Error handling is a prerequisite to programming: your program should capture every condition that could cause a 500.

What am I missing?

My 404's are handled by mod_rewrite, which passes the request to a script that logs every time it's called.
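A minimal sketch of such a logging script, assuming a mod_rewrite rule already routes requests for missing files to it. The `LOG404_FILE` variable, the default `/tmp/404.log` path and the tab-separated format are assumptions. After an internal rewrite, `REQUEST_URI` still holds the path the visitor originally asked for, and the script sends a `Status: 404 Not Found` header so the response is a real 404 rather than a 200 error page.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Append one tab-separated line per missing page:
# UTC timestamp, requested URI, referrer, user agent.
sub log_404 {
    my ($env, $log_path) = @_;
    my @fields = map {
        my $v = defined $env->{$_} ? $env->{$_} : '-';
        $v =~ s/[\t\r\n]+/ /g;                   # keep one record per line
        $v;
    } qw(REQUEST_URI HTTP_REFERER HTTP_USER_AGENT);
    unshift @fields, strftime('%Y-%m-%d %H:%M:%S', gmtime);
    open(my $fh, '>>', $log_path) or return 0;   # never 500 over a log file
    print {$fh} join("\t", @fields), "\n";
    close($fh) or return 0;
    return 1;
}

sub respond_404 {
    print "Status: 404 Not Found\n";
    print "Content-Type: text/html\n\n";
    print "<h1>Page not found</h1>\n";
}

# Only act when invoked by the web server as a CGI script.
if ($ENV{GATEWAY_INTERFACE}) {
    log_404(\%ENV, $ENV{LOG404_FILE} || '/tmp/404.log');
    respond_404();
}
```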

pageoneresults




msg:3661183
 6:54 pm on May 28, 2008 (gmt 0)

I was a bit surprised at the question - why on Earth would you allow an application to continue that produces 500's? Or am I missing something?

Huh? "Am I missing something?" is the question. Maybe I've been misinformed, but I'm told to expect a certain number of 500s; they are a given and need to be addressed somehow. Since I'm not the brains behind this 400/500 tracking routine, I cannot come to my own defense properly. But I did get some input from my lead programmer about why we do it. ;)

Well, it's good that they go to the extent of covering everything. That usually works for a company with enough money to hire QA and good developers who make sure everything is working properly.

But that just turns the 500 error into their custom error tracking. That'll reduce the number of 500 errors for him, but it doesn't mean there will never be a 500 error. It's like "I'm buying every possible insurance policy I can think of," but when something happens... it's the one you forgot to buy. The 500 is there as a catch-all to help you see what you have missed and fix it before the problem gets bigger.

Oh, and even with all the exceptions added, there could always be other factors that cause the app to fail. For example, someone migrated the app from dev to production and forgot to load one table's data, or HDD corruption messed up one file. Even if you have multiple servers running, some users will see the error on the machine with the corrupt file.

Basically, there are tons of things that could go wrong that nobody expects. So it's always a good policy to add in all the exception checks so the user won't see the 500, but it's also a good policy to always monitor the 500s for unexpected issues.

Heck, even Windows has the blue screen. Microsoft will be like, "Our Windows OS is perfect, it's those damn 3rd-party apps that crash it."

I guess it boils down to how much money and effort you want to put in to make sure everything is perfect (and nothing is perfect in the programming world).

But keep in mind a lot of apps are actually using the 500 error as their custom error handler too. They just made it custom. I've also seen some people display a "site is under maintenance" page when a 500 error is found.

Your response concerned me, so I shot my lead programmer the link to this topic. I want to know why we are doing it. After having access to this type of reporting for almost a year now, I'm convinced that it should be part of everyone's tracking activities.

You've made it sound like this is something that most of us need not be concerned with, yes?

rocknbil




msg:3661359
 10:04 pm on May 28, 2008 (gmt 0)

You've made it sound like this is something that most of us need not be concerned with, yes?

Well . . . no . . . from the programming I have seen, it's a GREAT cause for concern. I just don't understand how any programmer can allow them to persist. My examples are in Perl, my preference, but they apply to **any** programming language.

I see things like

&do_something or die("cannot do_something");

While there are ways of handling "die" (CGI::Carp with fatalsToBrowser is one), most scripts I've seen don't use them. They just let the script die, which gives you a 500: no Content-Type header, no error capture, and worst of all for me, no indication of where to look.
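For scripts that would otherwise just die, one core-Perl safety net is to run the real work inside `eval` and deal with any `die` in one place. A minimal sketch, where `handle_request` is a hypothetical stand-in for the script's main logic: the reason goes to STDERR, which the web server writes to its error log, and the visitor still gets a valid header and a message instead of a bare 500.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's real work; it dies on bad input.
sub handle_request {
    my ($input) = @_;
    die "no input supplied\n" unless defined $input && length $input;
    return "You sent: $input";
}

# Run the handler; if anything inside it dies, log the reason to STDERR
# and return a complete response instead of exiting without headers.
sub run_safely {
    my ($handler, @args) = @_;
    my $body;
    my $ok = eval { $body = $handler->(@args); 1 };
    if (!$ok) {
        my $reason = $@ || "unknown error\n";
        warn "request failed: $reason";
        $body = 'Sorry, something went wrong. The problem has been logged.';
    }
    return "Content-Type: text/plain\n\n$body\n";
}

my $response = run_safely(\&handle_request, 'hello');
print $response;
```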

Their argument is, "Well, if everything is working, that should never happen." RIGHT. Until Mr. SQL-injector comes along and starts feeding your script something you don't expect. One of the techniques they use is to feed a script garbage until it returns a 500, which tells them it's not handling input data correctly. You'll find this probing technique described in many of the SQL injection write-ups out there.
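A minimal sketch of the input side, assuming two hypothetical parameters (`id` and `sort`): check every value against a whitelist pattern before it goes anywhere near the database, and answer bad input with a normal error page instead of a 500. With DBI you would additionally pass values through `?` placeholders rather than interpolating them into the SQL.

```perl
use strict;
use warnings;

# Whitelist rules: each field must match its pattern exactly.
my %RULES = (
    id   => qr/\A[0-9]{1,9}\z/,              # positive integer, bounded length
    sort => qr/\A(?:name|price|date)\z/,     # one of a fixed set of columns
);

# Returns (\%clean, \@errors). Fields not listed in %RULES are ignored;
# missing or non-matching fields are reported instead of passed along.
sub validate_params {
    my ($params) = @_;
    my (%clean, @errors);
    for my $field (sort keys %RULES) {
        my $value = $params->{$field};
        if (defined $value && $value =~ $RULES{$field}) {
            $clean{$field} = $value;
        } else {
            push @errors, "invalid or missing '$field'";
        }
    }
    return (\%clean, \@errors);
}
```

If `@errors` is non-empty, the script shows a "please check your input" page and stops; nothing unvalidated reaches the query.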

What I do is

&do_something or &error("oops $!");

sub error {
    my $err = shift;
    print "Content-Type: text/html\n\n";
    print "error: $err";
    exit 0;
}

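The system error is stored in $!, so if the message names what you were acting on, e.g. &error("oops cannot open file $file: $!"), what you get is something like

oops cannot open file /x/y/z: Permission denied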

That tells me the file or directory has the wrong permissions. The exit call is important: without it, the script carries on after the error. Another mistake I see is leaving out the "or die" conditional completely . . . .

&do_something;

The program is allowed to continue, and the subsequent actions may "cover up" your error. This complicates your work even more; you want to "do_something or tell me why" at every action.

Yes, EVERY ACTION. This is what I call "good programming." Sorry if that sounds arrogant; I don't mean it to be . . . it's just my "standard."
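In Perl, "every action" includes the steps that are easy to skip, such as `print` and `close` on a file handle; with buffered output, a full disk may only show up when `close` flushes the buffer. A minimal sketch; the `save_file` name is hypothetical:

```perl
use strict;
use warnings;

# Write $content to $path, checking open, print and close individually so
# the error message says exactly which step failed and why ($!).
sub save_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path)  or die "cannot open $path for writing: $!\n";
    print {$fh} $content      or die "cannot write to $path: $!\n";
    close($fh)                or die "cannot close $path: $!\n";
    return 1;
}
```

Callers can then route a failure into the error sub: `eval { save_file($path, $data); 1 } or &error("oops $@");`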

Of course, once debugged, you would never ever ever allow "file x/y/z" to display in a response. The specifics should all be removed at deployment.

This simple practice alone will remove hundreds of 500's. Not only that, if you craft the error handler correctly, each one will tell you exactly where to look. Using your programmer's example,

someone migrated the app from dev to production and forgot to load one table's data,

my $select = "SELECT * FROM missing_table";
my $sth = $dbh->prepare($select)
    or &error("Cannot prepare: $select: " . $dbh->errstr);
my $rv = $sth->execute
    or &error("Cannot read missing_table: $select: " . $sth->errstr);

Poof. Another 500 bites it, and you can fix it in a matter of minutes. (Ditto on removing $select from the error routine when deployed.)
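One way to handle the "remove the specifics when deployed" step without editing every call is a single switch in the error sub. A minimal sketch, assuming a hypothetical `APP_DEBUG` environment variable: when it is set, the visitor sees the full message; otherwise the details are written only to STDERR (the server's error log) and the visitor gets a generic message with a reference number to quote. The text is HTML-escaped either way, since it goes out as text/html.

```perl
use strict;
use warnings;

# Escape the characters that matter when text is placed in HTML.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build the response for an error. With $debug true the specifics are shown
# (development); otherwise they only go to STDERR and the visitor sees a
# generic message with a reference that matches the log line.
sub error_response {
    my ($err, $debug) = @_;
    my $ref = sprintf('%d-%d', time, $$);
    warn "[ref $ref] $err\n";
    my $shown = $debug ? $err : "Sorry, something went wrong (reference $ref).";
    return "Content-Type: text/html\n\n<p>" . html_escape($shown) . "</p>\n";
}

# Drop-in replacement for the error sub; APP_DEBUG is a hypothetical switch.
sub error {
    my $response = error_response($_[0], $ENV{APP_DEBUG});
    print $response;
    exit 0;
}
```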

An equivalent to this,
HDD corruption messed up one file.

is this.

The only 500's you should receive is if a dependent service fails, for example . . . .

On the topic of custom 500's,

a lot of apps are actually using the 500 error as their custom error handler too.

This should be a SEPARATE program, not integrated into the application, for this very reason: if the application itself is what fails, it can't be relied on to report the failure. Yes, I maintain your script should NEVER 500, but if it does, you have this:

.htaccess
ErrorDocument 500 /cgi-bin/err-handler.cgi

err-handler.cgi would be a simple script that emails you immediately and presents a page of some use to the end user. It collects any environment variables, browser details, user input, anything it can get its hands on to clue you in on what went wrong. If you can't find the problem immediately, you look in the server logs for what was going on at the same moment.
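A minimal sketch of such an err-handler.cgi. When Apache serves a local ErrorDocument, it passes details of the original request in environment variables prefixed with `REDIRECT_` (such as `REDIRECT_URL` and `REDIRECT_STATUS`), and those are what the report collects. The recipient address and the `/usr/sbin/sendmail -t` path are assumptions about your server. The script sends a `Status:` header so the browser (and any crawler) still receives a 500 rather than a 200.

```perl
use strict;
use warnings;

my $ADMIN  = 'webmaster@example.com';    # hypothetical recipient
my $MAILER = '/usr/sbin/sendmail';       # assumed sendmail location

# Gather whatever might explain the failure: the REDIRECT_* variables Apache
# sets for the original request, browser headers, client address and method.
# Cookies are left out deliberately.
sub build_report {
    my ($env) = @_;
    my @keys = grep {
        $_ ne 'HTTP_COOKIE'
            && /\A(?:REDIRECT_\w+|HTTP_\w+|REMOTE_ADDR|REQUEST_METHOD)\z/
    } sort keys %$env;
    my $report = 'Time: ' . gmtime() . " UTC\n";
    $report .= "$_: $env->{$_}\n" for @keys;
    return $report;
}

# Send the report through sendmail; returns false if that fails.
sub mail_admin {
    my ($subject, $body) = @_;
    $subject =~ s/[\r\n]+/ /g;           # no header injection via the URL
    open(my $mail, '|-', $MAILER, '-t') or return 0;
    print {$mail} "To: $ADMIN\nSubject: $subject\n\n$body";
    return close($mail);
}

# Only act when run by the web server.
if ($ENV{GATEWAY_INTERFACE}) {
    my $url = defined $ENV{REDIRECT_URL} ? $ENV{REDIRECT_URL} : '(unknown URL)';
    mail_admin("500 error on $url", build_report(\%ENV));
    print "Status: 500 Internal Server Error\n";
    print "Content-Type: text/html\n\n";
    print "<h1>Sorry, something went wrong</h1>\n",
          "<p>We have been notified and are looking into it.</p>\n";
}
```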

And when you find it, you build a trap for it in your program. Say NO to 500's! :-)

I know this all falls down the moment Perl stops executing. But you know . . . you do have to check in on your web sites once in a while. :-)

Aside:
Heck, even Windows has the blue screen.

Who ever said MS programming was clean or complete? :-)

pageoneresults




msg:3661372
 10:28 pm on May 28, 2008 (gmt 0)

Thanks Bill, this is definitely an interesting discussion. I've now brought in a second set of programming eyes to get their input on the whole 500 thing and here is what they had to say...

Very interesting response by him. It is impossible to run any application and not generate any errors. However, to have the proper error handling, etc it would have taken 2 to 3 times the budget to build in that type of error handling and control mechanisms. To properly do that, you would need to be able to have a proper design session, dev environment, Q/A unit testing, etc. That is what top world-class web sites use, and it costs hundreds of thousands of dollars plus a team of testers unit testing and stress testing for hundreds of hours, etc. Also, all new development would need to be done in phases with a dev server and full-on testing and acceptance.

Now, if I were to tell you that we operate in an "on demand" dev environment and we don't have all of the above luxuries, is the average programmer going to do what you specify above?

rocknbil




msg:3661382
 10:43 pm on May 28, 2008 (gmt 0)

I don't know what the definition of "average programmer" is. Judging by the free and second-hand scripts I've seen, the answer is almost always no. But it's not that hard. In fact, I was called in by a company to get a script working; we fixed it by dropping in the exact same error sub you see above, going through the script, and adding the "or" conditional above to every action. It was running in 15 minutes.

It is impossible to run any application and not generate any errors.

? :-0
I really don't know how to respond to that. How is it impossible? Doesn't anyone else do this? Doesn't anyone else test-bed before putting on a server, and recognize the importance of error handling every action? Maybe I'm a freak. :-)

However, to have the proper error handling, etc it would have taken 2 to 3 times the budget to build in that type of error handling and control mechanisms.

Well, I just don't agree. You can see how "difficult" it is. How difficult is this

&do_something;

over this?

&do_something or &error("oops $!");

You've seen the error sub; it's 6 lines. If the project is a $5 project, yeah, okay, it will cost you twice as much . . . .

Adding comprehensive error handling after the fact, maybe, that could be costly. I've done it myself to old programs that were sloppy, built out of inexperience.

But if it's added as the program is built, it actually makes my job easier. If I add that little bit ("or &error") to every action, it saves me hundreds of hours on deployment and debugging.

To properly do that, you would need to be able to have a proper design session, dev environment, Q/A unit testing, etc.

I have none of these. I test from the command line locally before uploading to a test bed, then try hacking my own programs. I haven't succeeded yet, and so far, neither has anyone else.

lorax




msg:3661490
 1:32 am on May 29, 2008 (gmt 0)

>> It is impossible to run any application and not generate any errors.

Wow... um... I don't completely agree with this statement. The server might throw an error, but it should always be trapped and handled with good programming practices. The challenge as a programmer is building a good set of programming tools to do this work.

pageoneresults




msg:3661711
 10:55 am on May 29, 2008 (gmt 0)

Hmmm, now I have two of you that I respect telling me something that I've not experienced yet!

Let me give you an example of one environment. Let's say you have 500 students who are active on a daily basis in online distance learning programs. They are viewing streams, taking quizzes, submitting essays, etc. There are upwards of 300-500 system messages being generated daily. All sorts of activity. And all of this takes place in a real-time dev environment due to budget constraints. Oh, it's a piece of work, believe me. But this type of environment does have its challenges. You're cookin' along for a few days and then a few 500s come in. Multiple dev personnel are notified and the errors are addressed on demand; we know how critical it is to stay on top of these. This has been ongoing for quite some time. Right after the error reporting was implemented, there was a month of chasing obscure errors that just had not been considered during the "gotta have it tomorrow" launch dates.

I think this all comes down to what the applications are doing. Sure, there are basic applications that are most likely error-free for the most part. I still don't believe "error free" is a common find among applications; that's a pipe dream in my book. That's like 100% uptime.

What concerns me is that it is just the three of us discussing this. ;)

lorax




msg:3661827
 1:38 pm on May 29, 2008 (gmt 0)

Maybe I'm not understanding the situation.

When you say "system messages" what are you referring to? Network errors, application errors, or ?

You noted the users are working with apps that exist online. Are these servers within your control?

pageoneresults




msg:3661834
 1:44 pm on May 29, 2008 (gmt 0)

When you say "system messages" what are you referring to? Network errors, application errors, or ?

No, those are messages that are sent out depending on the student action. Notifications, promotions, verifications, etc.

You noted the users are working with apps that exist online. Are these servers within your control.

Yes. All of them.

rocknbil




msg:3662003
 4:10 pm on May 29, 2008 (gmt 0)

Well, I will go back to the original reason I replied: "what am I missing?" The services a program relies on, like its own interpreter, will cause your program to 500 when they fail. There may be things I'm unaware of that would create unavoidable 500's, but from the discussion so far, I don't see any.

Seriously, I would expect my clients would fire me if my response was "some of that should be expected." In my previous corporate environments, it was just not acceptable. I've seen this attitude many times in programming. "Why should I add an error trap for a condition that will NEVER happen?"

The answer is simple: because you don't know everything! Thinking you do is going to lead to a very embarrassing moment when the unexpected arises.

I'm sure it's common for a program to lack the appropriate traps to avoid 500's. That still doesn't make it right, or acceptable. A program should accept input and dazzle the user with its magic, but part of that is making sure that when things go wrong, it manages that as well.

What concerns me is that it is just the three of us discussing this.

Agreed. In a way, it's like the SQL injection issues. No one worries about them until trouble arises, I guess.

IanTurner




msg:3662017
 4:33 pm on May 29, 2008 (gmt 0)

I'm following this discussion - pageoneresults is saying many of the things I would have said, but rocknbil has basically called us out for the cowboys we really are.

To answer the first question: we monitor 500 and 400 errors on a daily basis, and if they show up in key areas of the application, such as the shopping cart, they are dealt with immediately (and in a lot of cases that just means putting in error trapping for situations that shouldn't have occurred).
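A minimal sketch of that daily triage, assuming the errors have already been pulled out of the logs as (status, path) pairs. The `/cart` and `/checkout` prefixes are hypothetical stand-ins for your own key areas.

```perl
use strict;
use warnings;

# Hypothetical key areas: errors here get attention the same day.
my @KEY_AREAS = ('/cart', '/checkout');

# Split error hits ([status, path] pairs) into urgent and routine lists.
# A hit is urgent when its path is a key area or lies underneath one
# ('/cart' and '/cart/add' count, '/cartoons' does not).
sub triage {
    my @hits = @_;
    my (@urgent, @routine);
    for my $hit (@hits) {
        my $path = $hit->[1];
        my $in_key_area = grep { $path =~ m{\A\Q$_\E(?:[/?]|\z)} } @KEY_AREAS;
        if ($in_key_area) { push @urgent, $hit } else { push @routine, $hit }
    }
    return (\@urgent, \@routine);
}
```

A daily job could then mail the urgent list to the developers right away and leave the routine list for the regular review.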

coopster




msg:3662036
 4:53 pm on May 29, 2008 (gmt 0)

I think everybody here agrees on one thing for certain so far: error-free to the user is one thing, error-free applications are another. The programming discipline of proper error handling is often overlooked, brushed aside, planned for later, or skipped out of plain laziness. Budget constraints could be piled into the "brushed aside" category if the end user is given an option, but I actually plan error handling into the project estimate. Inadvertently, I guess, really. It's just been a part of my development process. I started out programming in scripting languages but once I moved to compiled code I quickly learned the value (and discipline) of proper error handling routines. I then applied the same to scripting languages. I think the end customer has a certain expectation in this regard as well. If you have to line-item error handling on an estimate you are going to get questions, whereas if it is just another aspect of the quality code you are delivering, they shouldn't have to see or know the difference. And neither will you; the practice becomes second nature.

As has been mentioned in Part I of this thread, proper error handling can greatly minimize impact to an application. Cause/recovery services are much more efficient/effective, and urgent-care-required situations can often be handled before end users are impacted. Not to mention monitoring/capturing hack attempts! Plan for, and recover from, every possible error you can fathom, until the cat walks across the keyboard. Then you address that issue.

Create your error handlers with code re-use in mind and you don't have to reinvent the wheel each time. Invoke your error-handling modules using your programming language's error-handling utilities and you are good to go.
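In Perl, the language's error-handling utilities are `die` and `eval`. A minimal sketch of a reusable handler built on them, using a hypothetical `App::Error` package: application code throws a structured error object, and one shared routine decides what gets logged and what the user sees. CPAN modules such as Try::Tiny and Exception::Class offer more complete versions of the same idea.

```perl
use strict;
use warnings;

package App::Error;    # hypothetical, reusable across scripts
use Scalar::Util qw(blessed);

# Throw a structured error object instead of a bare string.
sub throw {
    my ($class, %args) = @_;
    die bless { code => 'ERROR', detail => '', user_msg => 'Something went wrong.', %args }, $class;
}

# One shared handler: log the technical detail, return what the user may see.
# Plain string errors from die "..." are handled as well.
sub handle {
    my ($class, $err) = @_;
    if (blessed($err) && $err->isa($class)) {
        warn "[$err->{code}] $err->{detail}\n";
        return $err->{user_msg};
    }
    warn "unexpected error: $err";
    return 'Something went wrong.';
}

package main;

# Hypothetical application code using it.
sub load_order {
    my ($id) = @_;
    App::Error->throw(
        code     => 'ORDER_MISSING',
        detail   => "order $id not found",
        user_msg => 'We could not find that order.',
    ) unless $id == 1;
    return { id => 1 };
}
```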

Fatal/critical errors are another issue altogether. We do a certain level of internal testing and then we request that the end customer also test a project before implementation. If you want, you can push the entire QA out to the end customer but if they find blatantly obvious errors it can be a reflection of the quality of work you provide. Best to practice otherwise.

BTW, I just grepped two years worth of the access logs and the processed access logs for fifty domains and returned ten 500 errors. All ten errors were from a single site on a single day, within fifteen minutes ... forgot to add a favicon. Hey, it happens :-)
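For anyone who wants to run the same check, a minimal Perl sketch that counts 4xx and 5xx responses by status and path. It assumes Apache's common or combined log format; a custom `LogFormat` would need a different pattern.

```perl
use strict;
use warnings;

# Captures the path and status from a common/combined log line, e.g.
# 1.2.3.4 - - [29/May/2008:16:53:00 +0000] "GET /cart HTTP/1.1" 500 1234 ...
my $LINE_RE = qr/"(?:[A-Z]+) (\S+)[^"]*" (\d{3}) /;

# Returns a hashref { "STATUS PATH" => count } for every status >= 400.
sub count_errors {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        my ($path, $status) = $line =~ $LINE_RE or next;
        next if $status < 400;
        $count{"$status $path"}++;
    }
    return \%count;
}

# Usage: perl count_errors.pl access_log [access_log.1 ...]  (report per file)
for my $file (@ARGV) {
    open(my $fh, '<', $file) or die "cannot open $file: $!\n";
    my $counts = count_errors($fh);
    printf "%6d  %s\n", $counts->{$_}, $_
        for sort { $counts->{$b} <=> $counts->{$a} } keys %$counts;
}
```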

lorax




msg:3662054
 5:06 pm on May 29, 2008 (gmt 0)

>> those are messages that are sent out depending on the student action

So these are messages generated by your system to the students.

The last question is are all of the applications built by you and your team or are some of them OTS?

pageoneresults




msg:3662093
 5:32 pm on May 29, 2008 (gmt 0)

But rocknbil has basically called us out for the cowboys we really are.

Hey wait, I'm just a Ranch Owner, you can't hold me responsible for the hired hands. Isn't that what some business owners might say? ;)

Yup, I got busted. But, I'll bet there are a whole bunch of others right now who have sent emails to their programming team looking into this. :)

It's just been a part of my development process. I started out programming in scripting languages but once I moved to compiled code I quickly learned the value (and discipline) of proper error handling routines. I then applied the same to scripting languages.

You see, I'm surrounded by perfectionists! Where were all the perfectionists back in 1995 when I started down this trek, huh?

BTW, I just grepped two years worth of the access logs and the processed access logs for fifty domains and returned ten 500 errors. All ten errors were from a single site on a single day, within fifteen minutes ... forgot to add a favicon. Hey, it happens :-)

Show Off!

The last question is are all of the applications built by you and your team or are some of them OTS?

All built by my team. And they are following this topic. I'll send them a link when each of you reply and ask them, "so, what say ye to that last reply, huh, huh, huh?" I'm really on this whole error handling routine. I want perfection. I want those 100% error-free applications that you guys are building. But remember, I have a very tight budget, can I still have all that and eat my cake too? ;)

lorax




msg:3662105
 5:42 pm on May 29, 2008 (gmt 0)

>> All built by my team

Well then. That settles it. They owe you a case of single malt.

Seriously though. I cannot see any good reason why there should be any unhandled exceptions or errors, other than production schedules and costs that do not allow for error checking and trapping. If that is the case, then your programmers should be raising the red flag, telling you so, and letting you give them some direction. I'm not talking about micromanaging everything that should or shouldn't be included, but rather an order-of-magnitude estimate of costs and impact, so you have a way to weigh costs against benefits.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved