I was looking at the error report from one of the front-line users at my employer, and they had included a screenshot of the system. The system was helpfully telling them:
An error occurred: java.lang.NullPointerException
It’s incredibly frustrating for me to see something like that. I appreciate that the system is trying to be helpful, but it is never ever a good idea to show internal details like exception names to end users (i.e.: non-technical users).
Why Not? It’s Useful Information
The error details are certainly useful… to developers and administrators. I’m sure I’ve written software myself in my younger days that helpfully explains to users the technical details of the error, including the exception classname or internal error code.
It wasn’t until I started working in Operations and directly supporting end users that I realised the psychological effects that this kind of information can have on end users and the significant costs that result.
Essentially, many users see error messages with technical detail and interpret that as the system has failed in ways those techo guys didn’t expect and so this thing must be pretty crappy and unreliable. Confidence in the system is easy to destroy and hard to rebuild.
Is that important? For paying-customer software the answer is obviously yes – confidence in the system is directly proportional to revenue. What about for internal corporate software? After all, the users don’t have any choice but to use the application, so it doesn’t matter if they like it or not.
There is still a significant cost in corporate environments: the change management cost. When users have no confidence in the platform:
- Communications strategies for future changes become more costly and less effective because disillusioned users aren’t listening.
- Engagement for requirements and design for future versions becomes more difficult (“I don’t have time for this requirements meeting, I’ve got a job to do and the system is crap anyway.”)
- Supporting the software becomes more expensive, as users are quicker to jump to support, and less helpful when engaging the support staff.
All of these costs are hard to quantify, which means they don’t end up explicitly in project plans and support models. That’s why we end up with these kinds of errors being displayed in dialogs to users, and nobody really caring about it.
What should developers be doing instead? What if you can’t prevent it?
Firstly: developers should absolutely create systems that record as much detail as possible, and make that data easily accessible to support staff with the relevant customer/user details. Specifically:
- Log everything together in a central place, but make a special “support staff” view (accessible with elevated rights) where support staff can quickly query the error information from the live system. I’ve seen too many systems where the support staff need to raise a production service request on another support team to get an extract of the logs, which leads to user conversations like, “I’ve logged a support ticket for the error you’ve reported, I’ll get back to you in a couple of days.”
- Don’t expect the support staff to wade through a stack trace to find the source of the error – summarise your best guess of what have might have occurred (e.g.: error communicating with XYZ service”) so that the support staff can quickly triage the issue and start taking action. It’s often not possible at development time to know exactly what may have gone wrong when the code executed, but there is always some interpretation possible. Even a list of possible reasons for the error is helpful for support staff.
- Log all of the relevant details together so that support staff can query based on the information they know. The end user will report an error with something like a customer ID, case ID or transaction ID, so the support staff need to be able to query directly on that business-domain parameter. I’ve worked with systems where the support staff have to query on internal database IDs or date ranges and then hunt for their data across multiple database tables, and that obviously leads to poor customer experience.
So that leaves the question of what to put on screen, and that needs to be answered with, “what can I put on screen that maintains or builds user confidence?”
Technical information that is meaningless to the user does not build confidence. Confidence is built when the system appears to the user to have the issue under control, which means a business-language message about what occurred, and a clear statement about next actions – and often that is simply “try again”. It’s important not to leave any ambiguity in the users mind about whether their data is safe or corrupt, if the system will recover, and what their next action should be.
The message I saw at work was
An error occurred: java.lang.NullPointerException
A much better message would have been:
The Credit Decision service did not respond in time with a decision about the credit requirements for this customer.
Wait a moment and resubmit this application for credit decisioning, as the service may be experiencing high demand. If the problem persists contact the XYZ Support team and quote your application number.
The downside of providing better error messages is that they tend to be longer, and users are infamous for not reading anything on screen. This is exacerbated by a lack of confidence in the system – they’re even less likely to read longer dialogs when they have no confidence in the system’s robustness. Still, it’s better to have the information there than leave the users wondering if the system has eaten their data.
So what about when it’s not possible to prevent technical information from appearing in front of the users? Not possible implies something that can’t be predicted and from which the system can’t recover on the fly (if it could recover it could deliver a controlled error message). If the system is in such a catastrophic situation then it’s pretty reasonable that technical information might end up in front of users – hopefully these kinds of situations are pretty rare for your organisation, and if they’re not, it’s probably worth tackling that instability first.