Jan 22 2018

False Alarm

On January 13 a state-wide alarm was sent out in Hawaii warning of an incoming missile. “BALLISTIC MISSILE THREAT INBOUND TO HAWAII. SEEK IMMEDIATE SHELTER. THIS IS NOT A DRILL,” the emergency alert read. For the next 38 minutes the citizens of Hawaii had the reasonable belief that they were about to die, especially given the recent political face-off with North Korea over its nuclear missiles.

However, within minutes the Governor and the Hawaiian government knew that this was a false alarm, resulting from a technician hitting the wrong button. So, there are two massive failures here – sending out the alarm in the first place, and taking 38 minutes to officially send out the correction. (They did tweet that it was a false alarm, but the retraction was not generally known and it wasn’t certain that it was official.)

Then the media was shown a screen capture of the menu options the operator would have seen (image above). However, this was not a capture of the actual menu, but a mock-up of something similar. So some headlines declared the screen capture fake, even though it does represent roughly what the operator faced. You are probably now as confused as that operator.

The Hawaiian state government did review the situation and “fixed” it by adding a second operator who has to confirm any real missile warnings. They also added a menu option for sending a false-alarm correction – because there previously wasn’t one, which is why it took 38 minutes to issue the retraction.

While these steps are good, they are a long way from addressing the real problem. What I think this represents is the general reality that our technological civilization is more complex than we can optimally manage. We simply don’t have the culture of competency necessary to keep things like this from happening.

It seems to me that in general people have a child-like assumption of competency for any large institution. This assumption is reinforced by dramatic movies showing a level of competency and control that is really a fantasy. Institutions are just made of people, who are incredibly flawed and limited.

Atul Gawande wrote about this in The Checklist Manifesto. Essentially we have historically dealt with complexity through training, creating professionals with incredible knowledge and technical skill able to handle complex technology or situations. This, of course, is often necessary. But, Gawande argues, we have passed the point where training alone can deal with the complexity we face, resulting in error.

Further, in many situations minor errors can have catastrophic results – crashing a plane, operating on the wrong body part, or sending out a false alarm. In such situations we need error minimization orders of magnitude beyond what a mere mortal can accomplish. So how do we do this?

Gawande recommends the humble checklist – a list of procedures to be carried out in order, and checked off to ensure that nothing is missed. The checklist works, and is increasingly used in medicine and elsewhere – but perhaps not everywhere it should.
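To make the idea concrete, here is a minimal sketch in Python (my own illustration, not from Gawande’s book or any real clinical system) of a checklist that refuses to let steps be skipped or taken out of order:

```python
# Toy sketch of a checklist enforced in software.
# The class and the surgical steps are invented for illustration.

class Checklist:
    def __init__(self, steps):
        self.steps = list(steps)
        self.completed = 0  # index of the next required step

    def check_off(self, step):
        """Mark a step done, but only if it is the next one in order."""
        expected = self.steps[self.completed]
        if step != expected:
            raise ValueError(f"Out of order: expected '{expected}', got '{step}'")
        self.completed += 1

    def is_done(self):
        return self.completed == len(self.steps)

surgery = Checklist([
    "confirm patient identity",
    "mark the surgical site",
    "verify the procedure with the team",
])
surgery.check_off("confirm patient identity")
surgery.check_off("mark the surgical site")
surgery.check_off("verify the procedure with the team")
assert surgery.is_done()
```

The point is not the code but the constraint: the system, not the person’s memory, guarantees that nothing is skipped.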

Error minimization is also accomplished through redundancy, which I have heard referred to as the Swiss cheese model. Every person is like a slice of Swiss cheese, with holes. If you line up several slices, however, any one hole is less likely to go all the way through. In other words, it is less likely that two people will make the same uncommon mistake at the same time. The probabilities of failure multiply, reducing the error rate by orders of magnitude.
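To put rough numbers on this (the 1% error rate below is invented purely for illustration):

```python
# Illustrative arithmetic for the Swiss cheese model.
# The 1% individual error rate is a made-up number, not real data.

p_one = 0.01          # chance a single operator makes the mistake
p_two = p_one ** 2    # chance two independent operators both make it

print(f"one operator fails:  {p_one:.2%}")    # 1.00%
print(f"both operators fail: {p_two:.4%}")    # 0.0100%, i.e. 1 in 10,000
```

Of course, the multiplication only holds if the failures are truly independent – two operators trained the same way, staring at the same confusing menu, will tend to have overlapping holes.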

What the missile false alarm event shows is that there are other layers to error minimization, and increasingly that means having a computer user interface that is designed to be user friendly and intuitive, and to minimize errors.

I think anyone looking at that screen capture will have sympathy for the poor operator. Sure, he screwed up. But in a way he was set up for failure by the system. The system had holes, and he was just the unlucky chump to fall through one of them in a very public way.

Anyone who works for a large institution will be familiar with such situations. Your work place is likely not uniquely incompetent, but is probably fairly typical of every workplace out there, which can be a scary thought. Errors like hitting the wrong choice on that menu must happen all the time. This error, however, happened to affect an entire state in a dramatic way. The error itself was not dramatic, only the result.

Creating an intuitive and optimized user interface is an art form and a science unto itself, and it is amazing at this point in computer technology that it is so often neglected.

I use the Epic electronic medical system, which is used by many large medical institutions. It is huge in the industry – and the user interface is terrible. The arrangement of information and options on the screen is not intuitive or optimized, there is often too much information that is unnecessary, and processes are unnecessarily complex.

Further, user warnings are terrible. You get warnings when they are unnecessary, and you don’t get them when they would be helpful. So you get warning fatigue, and tend to just click through warnings that are unhelpful, making it more likely that you will miss an actually useful warning.
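One standard remedy is to triage warnings by severity, interrupting the user only when a warning is serious and actionable. Here is a minimal sketch (the severity tiers and messages are invented, not Epic’s actual behavior):

```python
# Hypothetical sketch: interrupt the user only for severe warnings;
# quietly log the rest. Tiers and example messages are invented.

from enum import Enum

class Severity(Enum):
    INFO = 1
    LOW = 2
    HIGH = 3
    CRITICAL = 4

def handle_warning(message: str, severity: Severity):
    """Block the workflow only for HIGH/CRITICAL warnings."""
    if severity in (Severity.HIGH, Severity.CRITICAL):
        print(f"INTERRUPT: {message}")   # a modal dialog in a real UI
    else:
        print(f"(logged) {message}")     # visible on demand, not blocking

handle_warning("Duplicate therapy: two NSAIDs ordered", Severity.HIGH)
handle_warning("Formulary substitution available", Severity.INFO)
```

When every warning interrupts, no warning does; reserving interruptions for the serious cases is what keeps them meaningful.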

So I totally sympathize with this poor operator who faced that terrible menu that was not organized in any way, did not have unambiguously descriptive labels, and did not include a real warning to let them know unequivocally what they were about to do. Apparently they just got an “are you sure” button, like we get many times a day just for doing common computer operations – completely worthless.
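For contrast, here is a hypothetical sketch of what an unambiguous high-stakes confirmation could look like – the function name and the typed phrase are my invention, not anything from Hawaii’s actual software:

```python
# Hypothetical sketch of a high-stakes confirmation prompt.
# The wording and required phrase are invented for illustration.

def confirm_live_alert() -> bool:
    """Require the operator to type an explicit phrase before a real alert.

    Unlike a yes/no dialog, this cannot be clicked through on autopilot,
    and it states plainly what is about to happen.
    """
    print("WARNING: You are about to send a LIVE missile alert "
          "to the entire state. This is NOT a test.")
    typed = input("Type SEND LIVE ALERT to proceed, anything else to cancel: ")
    return typed.strip() == "SEND LIVE ALERT"

if __name__ == "__main__":
    if confirm_live_alert():
        print("Alert would be sent here.")
    else:
        print("Cancelled.")
```

A deliberate typed phrase is a crude but effective barrier: it forces the operator to state, in their own keystrokes, exactly what they are about to do.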

It is obvious that the software being used had no serious user interface design, no failure analysis, risk assessment, or error mitigation. Reports in the wake of this incident suggest this is typical of government applications, which seems plausible.

And all of this is just one tiny slice of our complex technological civilization. The entire field of risk assessment needs to be taken more seriously, and should be ubiquitous in any large organization.

And please, software developers – hire dedicated user interface experts. Your regular software engineers cannot do this. Really. The difference is between an amateur B-movie and a professional film made by industry experts. There is a lot of knowledge, skill, and nuance that goes into communicating with people, whether it is through film or a user interface.

Any software that is used to prescribe medication, or send a missile launch warning, should be tested and tweaked relentlessly, and frequently reviewed and updated, with feedback from the actual people using the software. You also can’t fix a bad user interface through training alone.

The missile episode is also an example of how the stakes are getting higher. Our systems are not only complex, they are increasingly interdependent, and they run critical aspects of our lives.

In short we need to have thoughtful systems in place that allow flawed humans to operate with minimal error, and those systems need systems to make sure they are optimized. This should be standard procedure, taken for granted as part of any process. That is the culture we need.

By coincidence I am in Hawaii this week on vacation. I will be seeing the Pearl Harbor memorial later today – that is another case history in the consequences of tiny errors in a system set up to fail.


17 thoughts on “False Alarm”

  1. SleepyJean says:

    The Pearl Harbor memorial commemorates our veterans and the location which changed our history. But as far as errors go, there was a veteran who saw and reported the first Japanese war plane. When he contacted his peers he was told to disregard it and not be alarmed by his discovery.

  2. hotinhawaii says:

    It wasn’t just one choice on a pull-down menu that initiated the false missile alert. The software company gave a demonstration of the series of errors that had to be committed and the screens that had to be ignored before this warning went out. The governor and the head of Hawaii Emergency Management would like this to be the message so as to remove any blame for incompetence headed their way. Here is a PowerPoint they released of all the steps that must be completed to issue an alert. Note especially the simulated smartphone screen which shows exactly how the warning will appear to the public before it is actually sent. https://cdn.vox-cdn.com/uploads/chorus_asset/file/10058151/alertsense.0.pdf

  3. bend says:

    The Pearl Harbor memorial is open in the midst of the shutdown?

  4. varcher says:

    I was at a presentation for that very type of system on the Friday just before the incident. The vendor basically sells a toolbox of processes and notifiers; you (the customer) then design the process by which each notification is sent (and to whom, and how). It can be as simple as “hit the button, here it goes” or require three different people to sign off on the alarm. And yes, the mockup didn’t strike me as weird… because I saw one exactly like that (the vendor described it as “a step up from having 8 URL links as icons on a corner of your desktop”).

    To be fair, there’s a complex dance in the case of a missile alert (or a fire alarm, or other similar things). You can’t have a process that’s too complex, because if it takes 20 minutes to sign off on all the alarms, you’re sending it to a population that’s already seen the bombs go off. That particularly goes if you need multiple validations, and most of the senior staff is off on holiday, preferably in a scenic corner with no cellphone coverage (plus, of course, said senior staff will have no way of knowing if it is a true or false alarm, so they’re going to validate it anyway… the cellphone equivalent of clicking yes reflexively).

    Note: the reason it took 38 minutes to get a false alarm message out is simple: those messages are pre-recorded and defined, so it takes time to get someone with admin privileges for the application, have him set a new text message, then launch the “alarm” process on that one.

  5. Parks are generally open, but without normal services. Some areas are closed.

  6. carbonUnit says:

    Aside from the issues with the alarm system itself, I hope they do a thorough review of how the public reacted to the alarm. What worked? What didn’t? They’ve “paid” for a dry run, they’d better learn from it.

  7. aran says:

    The undertone of this article hits too close to home for me. There is a saying in manufacturing that “100% inspection is only 80% effective” – as long as the decision is left to a human a mistake will inevitably be made. Error proofing – or idiot proofing, as the less PC-minded would say – can be extremely difficult and expensive to implement. Not only does it take incredible ingenuity and know-how, it introduces a whole new system (with its own set of problems and constraints) that has to be managed. In fact, it isn’t uncommon to find all the error proofing “bells and whistles” to be more sophisticated (and expensive) than the functional part of the machinery.

    Much more effort is spent “reducing things down” to elemental tasks than spent training someone to be more proficient as a whole – this IS the standard procedure… and a necessity. To be competitive, companies have no choice but to remove as much human interaction and decision making as possible. There is something to be said when the folks actually making the product are often the lowest paid or educated.

    What do we have because of it? Jobs with tasks that are so fragmented you’d have no idea what it is you’re making. Is this how a human being should spend half of their waking hours? Repeating the same set of simplistic motions without even a clue (or care) as to what they are doing? We cannot continue dumbing things down, but rather, focus on making people smarter and more aware.

  8. slogra says:

    I can’t speak for every government agency, but as a low-level employee at the local level of a city government, I know that we purchase our hardware and software from the lowest bidder. On the one hand, this hypothetically reduces government spending (or at least creates the illusion of it). On the other hand, these lowest-bid systems are so flawed that unknown amounts of time and other resources are spent dealing with them.

    Everything from our plastic file dividers (that break easily under the mild strain of being partially filled up) to an outdated computer system (our jail computers are literally a DOS program from the ’80s) is bought at auction.

    To echo what you said, Steve: quite frankly, I’m not surprised this false alarm happened at all. I’m surprised it doesn’t happen more often.

  9. hardnose says:

    “Creating an intuitive and optimized user interface is an art form and a science unto itself, and it is amazing at this point in computer technology that it is so often neglected.”

    That is so true. It is obvious that most organizations do NOT hire cognitive scientists to design their computer interfaces. More likely, that job is left to software developers who know nothing about interface design. They are not artists and they are not scientists.

  10. BillyJoe7 says:

    So just today I paid my rates online. The account sent out from the local council said to type [domain name]/payments. I did that, and instead of bringing me directly to where I could pay my rates, it brought me to the home page! There was no “payments” link to click on, so I had to go to the search box and type in “payments”. This brought me to a list of nearly everything you could possibly think of that you might need to make a payment for… except your rates! So I went back to the search box and typed in “rate payment”. This brought me to a page that tells you everything about rates and, finally, at the bottom of the page, where I nearly missed it, was a link to the page where I could actually make my payment.

  11. David Twitch says:

    “It wasn’t just one choice on a pull-down menu that initiated the false missile alert. The software company gave a demonstration of the series of errors that had to be committed and the screens that had to be ignored before this warning went out…”
    https://cdn.vox-cdn.com/uploads/chorus_asset/file/10058151/alertsense.0.pdf

    It’s a little hard to reconcile those screenshots with the others that have been shown, but regardless, even the screenshots in the document above are horrendous. For one thing, it’s inexcusable that test messages and real messages would even be mixed together on the same page of the interface.

  12. Charon says:

    “Every person is like a slice of Swiss cheese, with holes. If you line up several slices, however, any one hole is less likely to go all the way through.”

    Ah, but if these slices were adjacent in the original block of cheese, it’s far more likely the hole will go all the way through, since the slices will be highly correlated.

    This is what I point out to people who say things like “the similarity of near-death experiences proves the existence of heaven”. They forget people often make the same kind of errors, because we’re similar wetware, and have correlated issues…

  13. sarah_theviper says:

    Epic sucks even at the cleaning level. The version we use likes to send us from one room at one end of the unit down to one at the other end. The thing is, a lot of the time it is sending us to rooms we can’t even get into because a doctor or nurse is in there. A lot of time is wasted running back and forth.

  14. sarah_theviper says:

    A couple years ago I met this dude, who said his job was monitoring the chemicals going into the water of the city where he lived. He said it sounded tough, but really it was him sitting in front of a computer, and he really only had an hour of work. The rest of his shift he spent studying for his engineering degree and calling girls.

  15. pmacdaddy says:

    Hi,

    I am a retired software developer of >30 years.

    As usual, Steve, you hit the nail on the head: you need skilled professionals to develop intuitive user interfaces where every possible combination of user error has been thought of and, if possible, prevented.

    This also means destructive testing, where you try to break systems – that’s where you will find the holes.

    One thing you could do with your medical systems is try to get the email addresses of the programmers and offer to go to them (or have them come to you) and point out the obvious flaws of their system.

    Too often there are too many layers between an end user like yourself and the actual developers. In my experience, a 30-minute meeting with the actual users – sitting beside them and watching what they do – was invaluable in creating software that actually addressed what the users were doing.

    Hope this helps … good luck

    Yours Paul

  16. Newcoaster says:

    It sounds like there were enough system failures to go around, not the least of which was that the Governor of Hawaii couldn’t remember his Twitter password to send out a tweet. (I also don’t remember my Twitter password… but it’s not an issue since I don’t actually use Twitter.)

    I can relate to Steve’s problems with his EMR (he’s complained about it in the past). The one I use isn’t bad. It was better when it was a small startup and they responded to user issues almost right away. Since it got bought out by a large telecommunications company, it has become more complicated, less user friendly, and one can rarely get an IT person to help with a glitch in a timely manner.

    The Rx writing program is pretty good, but it does have alerts and warnings about drug incompatibilities, and the user has to manually override the system if you want to Rx it anyway. However, it doesn’t say WHAT the incompatibility is, which means I still have to look it up through other means or call the pharmacist.

  17. trumpproctor says:

    Just on a side note, I’ve seen plenty of people on twitter and in the media skewer Trump over not saying anything about this, like assuring the people that it was a false alarm, that the government will look into what went wrong, etc.

    While there’s a million things to legitimately skewer Trump over, this isn’t one of them. He didn’t design the flawed system that allowed this to happen, he didn’t purchase it, and by the time he was made aware of it, it was already being handled by local authorities. I think it was not an issue for Trump to even address.

    I know many here (myself included) are validly critical of Trump, but one thing to be careful of is to lay criticism where it belongs and not where it doesn’t. Otherwise you can get to a point like Fox Propaganda viewers where, after 8 years of watching Obama hate porn, they’ll skewer him over putting Grey Poupon on a hamburger or wearing a tan suit, and even worse, many believe the most ridiculous anti-Obama/Dem/left wing wack nut conspiracies without a hint of skepticism, simply because it fits the agenda they’ve been spoon-fed on a daily basis.

    Trump has already shown a thousand reasons why he’s not fit to even sit on a local school board, but his lack of response on this isn’t one of them.
