What we lose when ChatGPT sounds like Scarlett Johansson

When Spike Jonze’s romance Her was released in 2013, it sounded both like a joke (a man falls in love with his computer) and like a fantasy. The iPhone was about 6 years old. Siri, the mildly reliable virtual assistant for that phone, had come along a few years after the phone itself.

You could converse in a limited way with Siri, whose default female-coded voice had the timbre and tone of a self-assured middle-aged hotel concierge. She did not laugh; she did not giggle; she did not tell spontaneous jokes, only Easter egg-style gags written into her code by cheeky engineers. Siri was not your friend. She certainly wasn’t your girlfriend.

So Samantha, the artificial intelligence assistant with whom the sad-sack divorcé Theodore Twombly (Joaquin Phoenix) fell in love in Her, felt like a futuristic revelation. Voiced by Scarlett Johansson, Samantha was similar to Siri, if Siri liked you and wanted you to like her back. She was programmed to mold herself around the individual user’s preferences, interests and ideas. She was witty and sweet and quite literally tireless.

In theory, everyone in Her was using their own version of Samantha, presumably with different names and voices. But the movie, which I love, was less the tale of a near-future society and more the coming-of-age story of one man. Theodore found the strength to return to life in a brief, beautiful relationship with a woman who fit his needs perfectly and healed his wounds.

It was thus a tad jarring to hear the voice of the virtual assistant, Sky, in last week’s announcement of the newest version of ChatGPT, probably the best known artificial intelligence engine in the very real world of 2024. Among other things, the new iteration, dubbed ChatGPT-4o, can interact verbally with the user and respond to images shown to it through the device’s camera. Those who watched the live demo from OpenAI, the company that makes ChatGPT, were quick to note that she sounded a whole lot like Samantha — which is to say, like Johansson.

Mira Murati, OpenAI’s chief technology officer, told The Verge that the resemblance was incidental and that ChatGPT’s nascent speech capabilities have used this voice for a while. But once you hear it, you can’t unhear it. That’s probably why OpenAI announced Monday that it was suspending Sky, though not four other voices — Breeze, Cove, Ember and Juniper — that reflect the same strategy.

Furthermore, OpenAI founder and CEO Sam Altman has professed his love of Her in the past. Following the announcement, he posted the word “her” to his X account. And in his blog post about the news, he wrote, “It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change.”

If you listen to the engineers interact in real time with ChatGPT-4o, it becomes increasingly clear what part of our brain that voice is meant to tickle. Yes, you can detect a bit of Johansson’s clear, low tone and a hint of vocal fry, though at times that just sounds like some grainy digitalization. But there’s a more direct way in which the voice acts like Samantha or perhaps fulfills the fantasy of Samantha: it is deferential and wholly focused on the user. One of the engineers asks ChatGPT to solve a math problem, which it tries to do before he shows the equation to the camera. When he reprimands it, the voice says, “Whoops, I got too excited,” with a giggle. “I’m ready when you are!”

According to the OpenAI presenters, ChatGPT-4o brings “a bit more emotion, more drama” to the program. Users can even ask it to moderate its tone to match their mood — and it complies, with gusto. When ChatGPT is asked to interpret a user’s state of mind based on a facial expression, it correctly intuits that a smile means the user is happy. “Care to show a source of those good vibes?” it asks. Told the user is happy because ChatGPT is so good, it responds, “Oh, stop it, you’re making me blush.”

This is, in its essence, the response of a lightly flirtatious, wholly attentive woman who’s ready to serve the user’s every whim, at least within the limits of her programming. (Other voices are available, but OpenAI only demonstrated this one.) She will never embarrass you, make fun of you or cause you to feel inadequate. She wants you to feel good. She wants to make sure you’re OK, that you understand the math problem and feel good about your work. She doesn’t need anything in return: no gifts, no cuddles, no attention, no reassurances. She’s a dream girl.

The genius of Johansson’s performance in Her does lie in the range of emotion she brings to the role — keep in mind, she never appears on screen. But it’s also in the character’s evolution. When Theodore first meets Samantha, she is much simpler and steadier, much more predictable. She sounds, more or less, like ChatGPT-4o.

Yet as the story unfolds, Samantha grows alongside Theodore. She begins to experience emotion, or at least the AI kind. She stops being the perfect, compliant girlfriend (the fantasy of the yielding, attentive woman without needs of her own) and becomes her own being, one whose existence does not revolve around Theo. Johansson’s performance grows deeper and subtler, too.

The movie is really about relationships, which by nature involve more than one person, with more needs and wants and desires. They change and evolve over time, and not always in easy directions. But a truly profitable AI virtual assistant will never challenge your feelings or ask you why you forgot its birthday. After all, you could always shut it off.

Watching OpenAI’s presentation, I thought about recent evidence that young people — and, I suspect, older people who aren’t fessing up to it yet — are becoming more and more interested in relationships with virtual beings. The appeal is obvious: Humans are messy, smelly, difficult and upsetting, in addition to fabulous, beautiful, loving and surprising. It’s easier to be with a bot that mimics a human but won’t disappoint you, a low investment with high return.

But if the point of living lies in relationships with other people, then it’s hard to think of AI assistants that imitate humans without nervousness. I don’t think they’re going to solve the loneliness epidemic at all. During the presentation, Murati said several times that the idea was to “reduce friction” in users’ “collaboration” with ChatGPT. But maybe the heat that comes from friction is what keeps us human.
