Artificial intelligence assistants like Siri and Alexa hear more than you think
Source: Will Oremus, Slate
It was a weeknight, after dinner, and the baby was in bed. My wife and I were alone — we thought — discussing the sorts of things you might discuss with your spouse and no one else. (Specifically, we were critiquing a friend's taste in romantic partners.) I was mid-sentence when, without warning, another woman's voice piped in from the next room. We froze.
"I HELD THE DOOR OPEN FOR A CLOWN THE OTHER DAY," the woman said in a loud, slow monotone. It took us a moment to realize that her voice was emanating from the black speaker on the kitchen table. We stared slack-jawed as she — it — continued: "I THOUGHT IT WAS A NICE JESTER."
"What. The. Heck. Was. That?" I said after a moment of stunned silence. Alexa, the voice assistant whose digital spirit animates the Amazon Echo, did not reply. She — it — responds only when called by name. Or so we had believed.
We pieced together what must have transpired. Somehow, Alexa's speech recognition software had mistakenly picked the word Alexa out of something we said, then chosen a phrase like "tell me a joke" as its best approximation of whatever words immediately followed. Through some confluence of human programming and algorithmic randomization, it chose a lame jester/gesture pun as its response.
In retrospect, the disruption was more humorous than sinister. But it was also a slightly unsettling reminder that Amazon's hit device works by listening to everything you say, all the time. And that, for all Alexa's human trappings — the name, the voice, the conversational interface — it's no more sentient than any other app or website.
But the Echo's inadvertent intrusion into an intimate conversation is also a harbinger of a more fundamental shift in the relationship between human and machine. Alexa — and Siri and Cortana and all of the other virtual assistants that now populate our computers, phones and living rooms — are just beginning to insinuate themselves, sometimes stealthily, sometimes overtly, and sometimes a tad creepily, into the rhythms of our daily lives.
As they grow smarter and more capable, they will routinely surprise us by making our lives easier, and we'll steadily become more reliant on them. Even as many of us continue to treat these bots as toys and novelties, they are on their way to becoming our primary gateways to all sorts of goods, services and information, both public and personal.
When that happens, the Echo won't just be a cylinder in your kitchen that sometimes tells bad jokes. Alexa and virtual agents like it will be the prisms through which we interact with the online world. It's a job to which they will necessarily bring a set of biases and priorities, some subtler than others. Some of those biases and priorities will reflect our own. Others, almost certainly, will not. Those vested interests might help to explain why they seem so eager to become our friends.
Learning to talk
In the beginning, computers spoke only computer language, and a human seeking to interact with one was compelled to do the same. First came punch cards, then typed commands such as run, print and dir. The 1980s brought the mouse click and the graphical user interface to the masses; the 2000s, touch screens; the 2010s, gesture control and voice. It has all been leading, gradually and imperceptibly, to a world in which we no longer have to speak computer language, because computers will speak human language — not perfectly, but well enough to get by.
We aren't there yet. But we're closer than most people realize. And the implications — many of them exciting, some of them ominous — will be tremendous.
Like card catalogs and AOL-style portals before it, Web search will begin to fade, and with it the dominance of browsers and search engines. Mobile apps as we know them — icons on a home screen that you tap to open — will start to do the same. In their place will rise an array of virtual assistants, bots and software agents that act more and more like people: not only answering our queries, but acting as our proxies, accomplishing tasks for us and asking questions of us in return.
This is already beginning to happen. As of this month, all five of the world's dominant technology companies — Apple, Amazon, Google, Microsoft and Facebook — are vying to be the Google of the conversation age. Whoever wins has a chance to get to know us more intimately than any company or machine has before — and to exert even more influence over our choices, purchases and reading habits than they already do.
And once we perceive a virtual assistant as human, or at least humanoid, it becomes an entity with which we can establish humanlike relations. We can like it, banter with it, even turn to it for companionship when we're lonely. When it errs or betrays us, we can get angry with it and, ultimately, forgive it. What's most important, from the perspective of the companies behind this technology, is that we trust it. Should we?
Technology and us
If a revolution in technology has made intelligent virtual assistants possible, what has made them inevitable is a revolution in our relationship to technology. Computers began as tools of business and research, designed to automate tasks such as math and information retrieval. Today they're tools of personal communication, connecting us to information but also to one another.
Amazon's Alexa may not be as versatile as Apple's Siri yet, but it has a clearer sense of purpose and of its own limitations. Whereas Apple implicitly invites iPhone users to ask Siri anything, Amazon ships the Echo with a cheat sheet of basic queries it knows how to respond to: "Alexa, what's the weather?" "Alexa, set a timer for 45 minutes." "Alexa, what's in the news?"
The cheat sheet's effect is to lower expectations to a level that even a relatively simplistic artificial intelligence can plausibly meet on a regular basis. At launch, the Echo had just 12 core capabilities. That list has grown steadily as the company has augmented Alexa's intelligence and added integrations with new services.
As delightful as it can seem, the Echo's magic comes with some unusual downsides. In order to respond every time you say "Alexa," it has to be listening for the word at all times. Amazon says it only stores the commands that you say after you've said the word Alexa and discards the rest. Even if you trust Amazon to rigorously protect and delete all of your personal conversations from its servers — as it promises it will if you ask it to — Alexa's anthropomorphic characteristics make it hard to shake the occasional sense that it's eavesdropping on you, Big Brother-style.
I was alone in my kitchen one day, unabashedly belting out the Fats Domino song Blueberry Hill as I did the dishes, when it struck me that I wasn't alone after all. Alexa was listening — not judging, surely, but listening all the same. Sheepishly, I stopped singing.
Whom do we trust?
The notion that the Echo is "creepy" or "spying on us" might be the most common criticism of the device. But there's a more fundamental problem. It's one that is likely to haunt voice assistants, and those who rely on them, as the technology evolves and bores its way more deeply into our lives.
The problem is that conversational interfaces don't lend themselves to the sort of open flow of information we've become accustomed to in the Google era. By necessity they limit our choices — because their function is to make choices on our behalf.
For example, a search for "news" on the Web will turn up a diverse and virtually endless array of possible sources, from Fox News to Yahoo News to CNN to Google News, which is itself a compendium of stories from other outlets. But ask the Echo, "What's in the news?" and by default it responds by serving up a clip of NPR News' latest hourly update, which it pulls from the streaming radio service TuneIn. Which is great — unless you don't happen to like NPR's approach to the news, or you prefer a streaming radio service other than TuneIn. You can change those defaults somewhere in the bowels of the Alexa app, but Alexa never volunteers that information. Amazon has made the choice for you.
Imagine for a moment what it would sound like to read a whole Google search results page aloud, and you'll understand why no one builds a voice interface that way. That's why voice assistants tend to answer your question by drawing from a single source of their own choosing (in Alexa's case, it's often Wikipedia, which it doesn't always name out loud). The sin here is not merely academic. By not consistently citing the sources of its answers, Alexa makes it difficult to evaluate their credibility. It also implicitly turns Alexa into an information source in its own right, rather than a guide to information sources, because the only entity in which we can place our trust or distrust is Alexa itself. That's a problem if its information source turns out to be wrong.
Just say 'hello'
If there's a consolation for those concerned that intelligent assistants are going to take over the world, it's this: They really aren't all that intelligent. Not yet, anyway. Nonetheless, problems of transparency, privacy, objectivity and trust are resurfacing in fresh and urgent forms. A world of conversational machines is one in which we treat software like humans, letting it deeper into our lives. It's one in which the world's largest corporations know more about us, hold greater influence over our choices, and make more decisions for us than ever before. And it all starts with a friendly "Hello."