Editor’s Note: Henry Wei, MD, is a board-certified internist and a
Clinical Instructor in Medicine at Weill-Cornell Medical College. He is
currently a Senior Medical Director at Aetna, where he leads Clinical
Research & Development for ActiveHealth Management. (The views and opinions
expressed here do not necessarily reflect those of Dr. Wei’s employers.)

By now you’ve probably heard plenty about Siri. Or perhaps Siri has heard
plenty about you. Siri, the speech recognition software Apple’s built into the
iPhone 4S, has been billed as an agile virtual assistant, able to perform
tasks ranging from taking dictation and sending the outcome as an email to
finding facts and scheduling appointments. But here’s something that hasn’t
gotten quite as much popular press: Siri is really good at medical terminology.
And swear words (more on that later).

I unboxed my own iPhone 4S after a recent clinic week. With a backlog of
patient notes, I became curious about dictating these cases–leaving out patient
names for reasons I’ll get into later. After watching some Will Ferrell
(who doesn’t?), I tested out the Siri feature using some very precise
medical terms–language I wouldn’t expect anyone but a physician to be using in
real life. But Siri got every word right off the bat, without any ambiguity–a
task that can be challenging for most humans, let alone a computerized

Impressed, I began testing Siri from the other perspective: the patient’s. What
if I say I have chest pain, or I’m depressed? Parsing through Siri’s replies,
which mostly had to do with finding the nearest hospital, I noticed that
these answers were coming from something more than just speech
recognition–actual meaning was being recognized. That was exciting to me; it
showed a layer of design and consideration beyond the literal. You see, my
day job involves developing clinical decision support and innovating
healthcare IT. When I see a system that’s strong enough with semantic meaning
to, say, take an order at the drive-up window of a fast food joint, I get

Siri’s technology isn’t unique to Apple, which apparently has partnered with
Nuance, the company with the near-monopoly on speech recognition. Elsewhere,
Google Voice and others have made great strides with similar services.

Regardless, this feels magical. That is, Steve Jobs seems to have taken Arthur
Clarke’s law to heart: “Any sufficiently advanced technology is
indistinguishable from magic.” Siri is a consumer-level tool that starts
working right away–no instruction manual needed–and though most of us have no
idea how, it can match speech and meaning with appropriate tasks. When it works
well, it enables us to skip an entire step in the process of getting things
done. We can move from having an idea to being ready to put it in motion
without needing to search or transcribe.

Imagine, for example, that Siri or a similar service evolves to the point where
you, as a doctor, can just say out loud, “please order a CBC with differential,
chemistry panel and a lipid panel,” and the program is smart enough–and has
enough data on you and the patient–to interpret your meaning and remind you that
you’d recently ordered a CBC for this case, and that you forgot to order a liver function
test. Having that kind of rapid work-check would be an incredibly valuable
safety and efficiency feature.
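
As a rough illustration of that kind of work-check, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `check_orders` function, the `COMPANION_TESTS` table, and the seven-day lookback window are hypothetical, and the pairing of a lipid panel with a liver function test is a stand-in rule, not clinical guidance.

```python
from datetime import date, timedelta

# Hypothetical table of tests that tend to be ordered together.
# Illustrative only; not clinical guidance.
COMPANION_TESTS = {"lipid panel": {"liver function test"}}


def check_orders(new_orders, recent_orders, today, lookback_days=7):
    """Return reminders about duplicate and possibly missing orders.

    new_orders: list of test names the doctor just dictated.
    recent_orders: dict mapping lowercase test name -> date last ordered.
    """
    reminders = []
    cutoff = today - timedelta(days=lookback_days)
    requested = {name.lower() for name in new_orders}

    # Flag anything already ordered inside the lookback window.
    for test in sorted(requested):
        ordered_on = recent_orders.get(test)
        if ordered_on is not None and ordered_on >= cutoff:
            reminders.append(
                f"duplicate: {test} already ordered on {ordered_on.isoformat()}"
            )

    # Flag companion tests that were not part of this request.
    for test in sorted(requested):
        for companion in sorted(COMPANION_TESTS.get(test, set()) - requested):
            reminders.append(f"missing: consider {companion} alongside {test}")

    return reminders


dictated = ["CBC with differential", "chemistry panel", "lipid panel"]
history = {"cbc with differential": date(2011, 10, 30)}
for reminder in check_orders(dictated, history, today=date(2011, 11, 1)):
    print(reminder)
```

The hard part in practice would be everything this sketch skips: turning spoken words into reliable test names, and pulling the patient’s order history from another system.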

What if, even further down the line, Siri (or a device like it) were able to
sit as a fly on the wall, listening to the doctor-patient conversation to a
point where it could act as a second expert opinion for diagnoses? Today,
physicians practice primarily in a vacuum, with little oversight. It’s too
expensive and outright embarrassing to have someone observe and coach you, as a
physician. Atul Gawande wrote recently on coaching and the experience of
being coached as a surgeon, noting that “we treat guidance for professionals as
a luxury.” But imagine a technology that could watch what
physicians do in real time with the goal of offering real coaching. For
instance, a continuous speech recognition system could notice specific syntax
and nuances of the patient’s wording that might indicate a higher risk of
medication non-adherence, and offer an appropriate strategy to overcome that
risk. This type of computerized guidance is–paradoxically–somewhat offensive to
me as a physician, but incredibly exciting to me as a patient.

Clearly, with further development, something like Siri can be game-changing for
medicine. But here, a caveat. Only if doctors really want it. Change always
seems to come with two elements. One is a profound need, but the other–and this
is the one we in healthcare frequently miss–is an actual desire to change. That
said, the more expert and more successful the workers in any given field
become, the less they’ll actually want any change.

More broadly, we should think of Daniel Pink, who speaks to the keys behind
motivation as Autonomy, Mastery, and Purpose. Doctors have for generations
performed medicine in a very autonomous manner; it’s one of the things we’re
good at. How good are we at relinquishing control, or even accepting external

Still, something like Siri has a leg up because it’s intuitive. Our own brains
spend a good deal of time decoding words and sounds, so speech-based
communication is more intuitive to us than a keyboard or even a touchscreen.
This type of user interface should bring a sense of satisfaction when we notice
the job is being done right. This intuitive component is something we just
don’t get from a lot of healthcare technology today, which suffers from the
“clicky-clicky problem,” a term that my wife (also a physician) coined when
describing the irritating way that most EMRs force doctors into drop-down
menus, radio buttons, and dozens of text entry boxes–death by clicks.

While I don’t foresee a conversion to Siri-like decision support
services among the majority of physicians anytime soon–our desire for
autonomy is too strong–I do believe that its ability to slip into iPhone users’
everyday patterns will translate to longer-term adoption. I’d see its course
going something like this: We begin by using it in our personal lives.
Insidiously, it works its way into the administrative sphere of offices and
hospitals–not the doctors at first. From there, it makes its way into the
operating room, where, in an environment not dissimilar from driving a car,
we’d prefer not to have to type or touch anything to express our ideas,
queries, and commands. Finally it becomes truly pervasive not just among
early adopters, but also those older doctors–particularly those used to
dictating already–who are so adept at what they’re doing that the effort of
change seems unnecessary.

One of our rate-limiting steps may be interoperability. Nobody wants
a technology that functions in a vacuum; if you take a dictation, you’re going
to need it to be stored somewhere, and its semantic meaning to be interoperable
with other systems. We’ll need apps, and those take time to get from a concept in
someone’s head to actual living, breathing code and user interfaces. That said,
the SMART platform championed by Zak Kohane and Ken Mandl at Harvard, and
the iNexx platform from Medicity on the private side, are both very
promising in establishing an easier way to build interoperable “apps” in the
healthcare environment.

Some people, like the good Dr. Alexander Blau at Doximity, have raised
important concerns about security and potential HIPAA violations with Siri.
It’s my sense that the patent for the technology suggests that the data are
either encrypted or have the option of being encrypted. Either way, I suspect
that Apple or Nuance will be asked to make this part a little more transparent
by healthcare IT companies interested in leveraging their technology–particularly
if they’re using any data as training data for the system. For the time being,
though, the possibility of HIPAA insecurity is well worth keeping in mind.
Longer-term, we’ll have to see if the FDA will start playing a larger role
in regulating medical technologies that employ speech recognition.

Oh, the part about swear words? Well, believe it or not, back in medical
school, my roommates and I invented a game where we would call and swear at
toll-free automated voice response systems (e.g. MovieFone and airlines) to see
what the system would do. The smart systems would actually direct you to a live
human operator right away. Well, Siri actually recognizes a lot of obscenities,
and even has some pretty humorous responses. And at the end of the day, if
speech recognition technology in medicine starts to be able to pick up on the
emotional intent of users–not just the semantic meaning of, say, medical terms
or swear words–then my bet is that we’ll have entered a truly magical era of
Healthcare IT.
