Feb 18 2011

Dr. Watson

Recently an IBM computer program named Watson beat the pants off the two top-performing Jeopardy champions, Ken Jennings and Brad Rutter. This is being hailed as a demonstration of the superiority of computers over human intelligence, and also as a breakthrough in intelligent systems (although no one is claiming any sort of artificial consciousness for Watson). The demonstration has also sparked speculation about how systems such as Watson might be applied in the future – with some of that speculation going too far, in my opinion.

First, let’s look at what Watson actually is. IBM describes Watson as a “system designed for answers,” built to work with natural language. They chose Jeopardy (a trivia game show) as a model of this task. This is how they describe the hardware:

Operating on a single CPU, it could take Watson two hours to answer a single question. A typical Jeopardy! contestant can accomplish this feat in less than three seconds. For Watson to rival the speed of its human competitors in delivering a single, precise answer to a question requires custom algorithms, terabytes of storage and thousands of POWER7 computing cores working in a massively parallel system.

It is interesting that creating this winning performance in real time required such a massive system – thousands of computing cores. Such a system won’t be sitting on the average desktop anytime soon. However, Moore’s law (assuming it continues to hold up for a while, which seems to be the consensus) predicts that within 20 years or so we will have today’s supercomputers on a desktop.

The software is perhaps more interesting. The algorithms developed needed to understand natural language, and they also needed to be good at playing Jeopardy. This means not just coming up with the most likely answer, but knowing when to ring in and offer a guess (in a game where a wrong answer costs money). So the system also had to estimate its level of confidence in each answer, and have some method for deciding where to set the threshold for ringing in. And of course Watson had to have a database of information about a wide range of topics.
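To make that buzz-in logic concrete, here is a minimal sketch – purely illustrative, not IBM’s actual algorithm – of a contestant agent that ranks candidate answers, converts raw scores into a confidence, and rings in only when the expected value of answering is positive. The candidate answers and scores are invented for the example.

```python
# Illustrative sketch (not IBM's real pipeline): buzz in only when
# confidence in the top answer clears a break-even threshold.

def best_answer(candidates):
    """candidates: dict mapping answer -> raw score from the QA pipeline.
    Returns the top answer and a normalized confidence in [0, 1]."""
    answer = max(candidates, key=candidates.get)
    total = sum(candidates.values())
    confidence = candidates[answer] / total if total else 0.0
    return answer, confidence

def should_buzz(confidence, reward, penalty, threshold=None):
    """Buzz only if the expected value of answering is positive:
    E[buzz] = confidence * reward - (1 - confidence) * penalty."""
    if threshold is None:
        threshold = penalty / (reward + penalty)  # break-even confidence
    return confidence > threshold

# Hypothetical candidate list for one clue:
candidates = {"Toronto": 0.14, "Chicago": 0.83, "Springfield": 0.03}
answer, conf = best_answer(candidates)

# In Jeopardy a wrong answer costs the clue's dollar value, so reward
# equals penalty here and the break-even confidence is 0.5.
if should_buzz(conf, reward=800, penalty=800):
    print(f"Buzz: {answer} (confidence {conf:.2f})")
```

In the real game the threshold would also have to account for game state (score gap, clues remaining), which is exactly the strategic layer the post describes.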

In the end the performance was impressive.

In the second day of Jeopardy‘s three-day “Man vs. Machine” special, Watson wiped the floor with Ken Jennings (a 74-time champion) and Brad Rutter (a 20-time champion). Ken Jennings ended day two with $4,800 and Brad Rutter ended with $10,400, while Watson took home $35,734 in prize winnings. Out of 30 answers, Watson was first to buzz in on 25 of them, getting all but one of them right.

What does this performance really mean for the future of computers? That remains to be seen – but it seems to me that Watson is essentially an expert system, a system capable of providing specific information from a vast database. Perhaps the greatest innovation is its ability to interpret natural language, a key goal for human-computer interface.

Whenever such breakthroughs and demonstrations of computing power take place (like when IBM’s Deep Blue beat chess champion Garry Kasparov), they raise the question of what computers are good at versus the human brain. Computers are great at processing information quickly, at storing and accurately retrieving information, and at running algorithms and simulations. But they still lag far behind humans in pattern recognition. The massively parallel organization of the human brain is optimized for pattern recognition – it’s something we do well. The weakness of the brain, however, is that it is error-prone and unreliable.

Computers and brains are therefore complementary in their strengths and weaknesses. It makes sense, then, that the best use of computers is to take over tasks at which computers excel and humans are weak, and to aid (but not replace) humans in tasks at which humans excel and computers are weak. But the performance of Deep Blue or Watson does not imply that computers are ready to take over from humans what humans are good at. Here is an example of exactly that misguided interpretation:

But, why not take it a step further and just get rid of the human altogether? Literally tens of thousands of people die every year because doctors are, well, only human, and make diagnostic mistakes which can later be identified as violating evidence based medicine.

The problem I always saw was how to convince people that yes, a computer can beat House, MD. Well IBM can’t go head-to-head with House but maybe beating Ken Jennings and Brad Rutter might help.

No, a computer cannot beat House, MD, nor will it be able to anytime soon. The author of that statement, Karl Smith, is correct in noting that medicine has become too complex for mortal humans to practice error-free, or even to master all the relevant information. We are probably at, or even beyond, the limit of human capability. This has forced a trend toward specialization – narrowing the field of expertise so as to limit the amount of information that must be mastered. But specialization has its limits and downsides as well.

Clearly human doctors need help, and I agree that computers are an obvious solution. Incorporating error-reducing systems, like checklists, is another approach. Taking more of a team rather than individual approach to care is another potential solution.

I think there is also a role for computer-based expert systems as an aid to the process of diagnosis and treatment. An expert system, essentially, combines a vast database of medical knowledge with the ability to provide relevant information to a clinician at the point of patient care. If such a system were properly employed, it could act like a tiny expert sitting on the shoulder of every physician, offering suggestions, checking recommendations, and screening for errors.

Right now we have subsets of such systems. Doctors can have apps on their smartphones that give them drug information on demand. Doctors frequently use Google to look up information, often right there in the room with the patient. (As an aside, patients are sometimes put off by this, but they shouldn’t be. It does not mean their doctor does not know what they are doing.) And we are increasingly using electronic medical records, which can also provide a layer of computer supervision. I think such systems are currently underutilized, and there is tremendous room for improvement. But we are heading in this direction.

However – short of truly artificial human-level intelligence, such computer systems will not replace human clinicians. Clinicians still need to gather information from their patients, and this requires a great deal of interpretation. The human element is complex and chaotic, and requires the highly sophisticated social skills of another human. Also, diagnosis is often an exercise in pattern recognition – something at which humans are still superior to computers. And there is an element of judgment involved in clinical decision-making that goes beyond any algorithm.

But humans also have weaknesses as clinicians. Sometimes that judgment is flawed, overly influenced by anecdote and recent experience (rather than the best evidence), and subject to a long list of cognitive biases.

The solution, therefore, is again to combine the best of human intelligence and computer support. We are moving slowly in this direction, but we have a long way to go.

The Watson experience perhaps can be a good kick in the pants, to increase support for the incorporation of medical expert systems into the practice of medicine. But it also shows that such systems will not be easy to incorporate. In order to provide real-time natural language information Watson required thousands of computer cores. Perhaps the real lesson of Watson is that we are still 20 years away from widespread use of such systems. When there is a Watson app for the smartphone, every doctor will carry one into the patient room.

16 thoughts on “Dr. Watson”

  1. Agreed completely on Smith’s over-interpretation. It turns out IBM will try to use Watson as a medical expert system, but this is clearly some ways off. (18 months sounds optimistic to me.)

    Another worthy piece cautioning against reading too much into Watson’s success: here.

    And Dan Dennett has (a now very dated but still relevant) piece about the moral costs of expert systems.

  2. devongarde says:

    I think you’re being overly pessimistic on the timescales. You seem to be presuming that a ‘Dr. Watson’ (a terrible name for us Microsofties) would have to be carried on a smartphone to be used from a smartphone. Rather, consider that the smartphone is a communications device, and that there is already immense computing power available, relatively cheaply, in the cloud.

    An ordinary computer would take months to crack an SHA-1 password by brute force. The Amazon cloud does it pretty much in real time, for a couple of dollars.

    And 20 minutes to crack a WiFi password.

    I wouldn’t have a clue how much work it’d take to build a medical diagnosis ‘Dr Watson’ in the cloud, but I suspect that is going to be the dominant factor in determining when such software becomes available, rather than the computing power required.

  3. Draal says:

    I think Jennings made a valid point in an interview on NPR that the limiting factor for the human champions was reaction time. Watson had a slight edge in being able to buzz in faster than the meat bags. If the game was replayed with all contestants writing down their answers, Jennings predicted that the meat bags would have fared much better.

  4. superdave says:

    Any hypochondriac knows the danger in Watson being used to diagnose disease. How many diseases could Watson rattle off given the input of fatigue, joint pain, dizziness, nausea, or any number of vague, generally benign symptoms that can also be associated with serious diseases?

  5. tmac57 says:

    I am going to guess that just the exercise of trying to create an increasingly accurate expert system will pay off in new insights into how our brains work.

  6. SARA says:

    “The human element is complex and chaotic, and requires the highly sophisticated social skills of another human.”

    To me this is the biggest hurdle to replacing a human physician. Humans have a hard time reading each other at times. There are so many cues in reading someone’s communication that we don’t even fully understand ourselves. A huge number of them are visual.

    Humans would have to understand those patterns of communication, and all their variations across culture and age and experience, in order to program them into a computer.

    I don’t think we have a deep enough grasp on it to program it.

  7. lowbatteries says:

    I think you overestimate with the prediction of 20 years. I think in terms of raw computing power, you are correct, but there are compounding advances that will combine to make such decision-making devices come about much quicker.

    One of the big advances is the algorithms invented to do this sort of work. As those algorithms improve or new ones are invented, less raw processing power is needed to do the same task.

    Even so, the idea that within 15-20 years desktop systems could be asking “Dave, are you sure you don’t want to check your client’s X levels to check for disease Y?” is just awesome.

  8. jesse.huebsch says:

    Where this would be an advantage is in giving doctors a ranked list of possible conditions that the doctor would then have to evaluate. The same would go for treatment options, and with electronic medical histories it could even use the statistics of that person’s conditions to help rank the possibilities. In terms of treatments, it should be able to include all known treatment interactions, like drug interactions, side effects that might aggravate other conditions, etc. Again, a doctor would have to evaluate the results.

    It would help prevent missing relatively rare things.

    Long term, that would help drive medical records, and in an appropriate aggregate-statistics way data mine those records, so everyone who interacts with the medical system would end up contributing to clinical results, especially for the statistics of the relative frequency of conditions vs. demographics.

    The current Watson computer could support 400 doctors at a time if it takes 3 seconds to get an answer and a doctor asks once every 20 minutes. It does not have to live in the same geographic location as the user.

  9. eean says:

    Well, really Watson is sort of an alternative to Google. Google already tries to give straight-up answers to some questions. So what’s interesting isn’t when we’ll have Watson on our desk, but when we will have it in the datacenter for free public access. What will that mean for Wikipedia and research in general?

    It also shows how important the book digitization projects are for Google: even if they were never allowed to give direct access, they will serve as the raw data for Google’s “ask anything” system.

    (And throw in copyright and we’ll never have Watson on our desk no matter the computational power.)

    @draal Jennings knows more than anyone that the IBM researcher was accurate in saying this is a fair advantage, though. I played quiz bowl; buzzing in is half the game.

  10. eean says:

    Though in high school quiz bowl you can buzz in as soon as you want. I bet Watson would’ve lost then.

  11. That is a good point about decentralized computing. Such a program could be in the cloud. I could also see a large entity – Yale Medical School, for example – buying one such computer that is then on the local network and can be accessed by everyone in the system.

  12. taustin says:

    I agree that the best use of an expert system is as an aid to a human doctor, not a replacement. So many people seem to believe the two are mutually exclusive.

    I also think that even if we get to where computers are comparable (or better) at pattern recognition, that we’ll be a long way from replacing human doctors with computers. There’s more to medical care than just diagnosing the patient. There’s also followup care – making sure it was the correct diagnosis. If a mistake is made, there’s also appropriate reaction to it. Some medical mistakes can kill very, very quickly, and doctors often have instincts that can react quickly enough even if they’re not entirely certain why they need to.

    One of the breakthroughs in Watson’s development was the realization that it needed to “hear” the other contestants’ answers, correct and incorrect, as metadata for categories. The category names are often puns of some sort that Watson didn’t understand, and only hearing correct answers from other contestants could help.

  13. Min says:

    Anyone who isn’t completely blown away by Watson’s performance simply doesn’t understand the engineering and programming that went into making it.

  14. SimonW says:

    I think you mischaracterize AI strengths by saying pattern recognition is not good. Machines are excellent at the kind of pattern recognition needed for diagnosis, as has been demonstrated in the AI world previously.

    The MYCIN expert system was the classic and one of the earliest examples, and it outperformed its trainers.

    The main problem is that the research isn’t typically presented in the manner that a machine would need, and the problem domain is immense. MYCIN resolved this by tackling a very specific sub-domain, using a very structured knowledge base.

    The approach I’ve seen used in other domains is to approach data collection systematically, and extract maximal value from a limited number of diagnostic tests. I.e., give the same small set of tests to a large population, tabulate the results against the conditions they have, and use that set of tests on everyone to make a diagnosis.

    What Watson demonstrates is that machines might be able to approach the problem in a more “human-like” fashion by, say, examining the medical literature in its current form.

    The skills the machine will bring include assimilation and examination of a lot of data, if we can arrange to make that data available. I’m thinking of things like genetic code, and/or familial medical records. As a doctor you probably couldn’t check the medical records of all the closest living relatives of your patients for conditions which might present with the symptoms you are seeing (not conditions that did present the same way, but that could); for a system like Watson that is a trivial task for the first few thousand relatives. It can then screen them all, check if the condition is thought to be hereditary, or perhaps due to common environmental exposure, and in a couple of seconds answer the “family history” question more thoroughly than you or the patient ever could (we all know how amazingly vague people can be about their relatives’ conditions, and in one recent case a relative had her condition “misdiagnosed” and I wasn’t aware of the “update” when speaking to my own doctor).

    Of course that would require changes to privacy laws, or changes to practice (“Can we use your records for statistical aid in treating your relatives?”), and probably also data from strangers to set baseline probabilities for certain symptoms or disorders. Realistically, getting that organized isn’t going to happen (except perhaps in Sweden and similar places), and the assumption was that the “masses of data” the machine will analyse will probably be exome or genomic data sets and publicly available sets of related data.

    Watson’s performance suggests it could also tabulate and use information from “softer” sources, like people with the same surname posting at online web forums.

    I’ve long thought one of the big issues with medicine is failure to capitalize, probably because it has largely been protected from the great capitalist experiment by strict regulation. When this inevitable step takes place medicine will be revolutionized, and doctors will start scoring better than their 55% hit rate at first diagnosis.

    I suspect more of this type of knowledge will be embedded in search engines as the main way forward immediately. Google by and large already has a large proportion of that. Out of frustration I once carefully constructed a Google search term describing the annoying symptoms I was experiencing, and the top search result hit the nail on the head, suggesting they were all due to an existing condition.

    And I agree with Min, the performance was stunning. I also think the speed issue is largely down to software; these kinds of difficult problems often permit huge optimisations once you’ve learnt how to do them the hard way. Failing that, progress in hardware will probably close the gap, and it isn’t as though the big search engines and clouds are short on hardware. Indeed, the curious thing about Watson is that it actually stored all the data inside itself rather than going to the Internet. Imagine if you could process PubMed at 10 papers a second: you could process the most relevant papers when needed, and during a 10-minute consultation you could assimilate the 6,000 most relevant papers for the condition presented, following up all those citations to see if something better is being done somewhere else, or if the result in one paper was never replicated.

  15. azinyk says:

    If a doctor sees a new patient every 5 minutes (which is improbably fast, even in an emergency department), and Watson can do its thing in 3 seconds, then a computer like Watson would be capable of supporting 100 doctors. More probably, they’d build an even bigger computer and then let people use it from all over the world.

    I bet a system like Watson will be adopted faster outside of medicine, though. The stakes are smaller, there are fewer bureaucratic obstacles and insurance risks, and practices change faster.

  16. Winter says:

    I take issue with the idea that expert systems in healthcare are anything new. My family doctor when I was growing up has been using an expert system (http://pkc.com/) since the late eighties.

    I am discouraged that they aren’t more widespread now. But just getting computerized medical records has been a struggle. The idea that a computer should be involved in the patient interview is apparently a hard thing for some people to accept.
