Opinion | Hank Azaria’s ‘Simpsons’ Voices Won’t Be Fully Replicated by A.I.


There’s a human behind our favorite animated characters. But as A.I. comes to Hollywood, that could change.

I imagine that soon enough, artificial intelligence will be able to recreate the sounds of the more than 100 voices I created for characters on “The Simpsons” over almost four decades. It makes me sad to think about it. Not to mention, it seems just plain wrong to steal my likeness or sound — or anyone else’s.

In my case, A.I. could have access to 36 years of Moe, the permanently disgruntled bartender. He’s appeared in just about every episode of “The Simpsons.” He’s been terrified, in love, hit in the head and, most often, in a state of bitter hatred. I’ve laughed as Moe in dozens of ways by now. I’ve probably sighed as Moe 100 times. In terms of training A.I., that’s a lot to work with.

But a voice is not just a sound. And I’d like to think that no matter how much an A.I. version of Moe or Snake or Chief Wiggum will sound like my voice, something will still be missing — the humanness. There’s so much of who I am that goes into creating a voice. How can the computer conjure all that?

A misconception about voice acting is that it takes only a voice. But our bodies and souls are involved to get the proper believability. When I first watched Dan Castellaneta, who plays Homer, and Harry Shearer, who plays Mr. Burns and many other characters, doing vocal recordings, I was almost embarrassed by how silly they looked. They were jumping around and giving a full performance to no one — just a microphone. I was 23. It took me a while to get up the courage to do that, too.

It can be as simple as running in place if your character’s running. If your character is crying, you work up real tears, real emotion. A lot of my characters have thrown punches or been punched in the face. If your character’s talking while he throws a punch, it’s hard to fake unless you actually throw a punch. Sometimes we’ll pick up a prop if it helps us get into the reality of the scene. I played a character who was cigar-chomping, so I stuck a highlighter in my mouth while I talked.

It has always been interesting over the years to watch major movie stars and wonderful actors who had not done much voice work come in to record with us. They wouldn’t know at first that they couldn’t do it just from the neck up. Once they realized that, they were brilliant. I remember Mandy Patinkin and Anne Bancroft coming in and figuring it out. Mick Jagger’s not shy onstage, but he had to take that journey. He eventually got that you have to fully commit, as you would to any performance.

Another thing we do on “The Simpsons” is improvise. When you play around with the dialogue, there are interruptions and a natural back-and-forth — you’re not just reciting a line-by-line thing. It’s hard to imagine a computer being able to mimic that rhythm.

Over the years, I’ve created the voices of Comic Book Guy, Professor Frink, Cletus the Slack-Jawed Yokel, the Sea Captain and Superintendent Chalmers, to name a few. They’ve been created in all kinds of ways — imitations of celebrities, of friends, of family members.

When I got to audition for “The Simpsons,” I went in and did a young Al Pacino impression. At the time I was playing a drug dealer in a play and talking like a young Al Pacino in the role. When I did it for my “Simpsons” audition, I was told, “We like that voice, but we want you to make it gravelly.” You take my version of a young Al Pacino and you add gravel to it, and you get Moe the bartender.

Chief Wiggum is really just an imitation of Mel Blanc doing his kind of exaggerated impression of Edward G. Robinson. I grew up listening to that. One of the most gratifying things about being a Simpson is it seems to mean as much to the kids who grew up with the show as Mel Blanc and Bugs Bunny did to me, providing a similar kind of comfort and humor in their childhood that stays with them. Can A.I. do that for people?

Anyone who’s a mimic or does vocal impressions is already sort of a weird version of A.I. — you store these voices, have deep recall of them and can recreate them. But for Chief Wiggum, I’m not doing a straight imitation of Edward G. Robinson, which the computer could well be capable of. I’m doing a weird impersonation of an impersonation.

For Chief Wiggum, I take Mel Blanc imitating Edward G. Robinson in an old Warner Bros. cartoon, then I make it even whinier.

The voice I created when I played Agador, the shoe-challenged butler in the 1996 film “The Birdcage,” came from my memories from when I was a kid. I had two voices I was deciding between for the character. One was tougher, like the Puerto Ricans I grew up listening to in my neighborhood in Queens. The other one sounded like my maternal grandmother.

In my family, we were Sephardic Jews in a Spanish-English bilingual household. My grandmother spoke five languages, and she had a Hispanic accent when she spoke English. She was also very loving and sweet and feminine, which is what I ended up basing both my voice and character on. I’m not the most macho guy in the world, but my character was very mothering to the other characters in the film. I didn’t relate to that, so I started imagining what my grandmother would do, and it all clicked for me. So it wasn’t just sounding like her; it was her mentality and her affection that went into creating Agador’s voice.

If A.I. tries to recreate one of my voices, what will the lack of humanness sound like? How big will the difference be? I honestly don’t know, but I think it will be enough, at least in the near term, that we’ll notice something is off, in the same way that we notice something’s amiss in a subpar film or TV show. When the exposition is clunky or there’s a bad bit of dialogue or a character says something that’s out of character — why would he say that if he was afraid? Why did she just announce her back story like that? Et cetera.

It adds up to a sense that what we’re watching isn’t real and that we don’t need to pay attention to it. Believability is earned through craftsmanship, with good storytelling and good performances, good cinematography and good directing and a good script and good music.

For Snake, I take Sean Penn’s Jeff Spicoli voice from “Fast Times at Ridgemont High” and make it a little deeper.

An A.I.-generated voice has enough little things askew to make you think there’s something missing. It just isn’t compelling or funny, in the same way that A.I.-generated faces in video seem to be missing elements that would make them believable and human-seeming — too often micro-expressions and gestures are not quite right.

Or it might depend on the episode. Great writers don’t hit it out of the park every time. They give you great scripts, medium scripts, not-OK scripts. Maybe that will be the case for A.I., too. I also recognize that in our distracted era, it’s possible that people might not catch on to the difference.

There may be some aspects of a performance that A.I. can enhance. When I know that a certain line needs a laugh, but I’m not sure how to get one, I’ll try different things. I’ll make a list of eight or nine ways to try it. I’ll do a mad take, a glad take, a sad take, a deadpan take, one that’s aggressive, one that’s really in my feelings. It’s hard to tell which one’s going to work, but you can always tell in editing.

The A.I. model may not know what’s funny or what timing is, but it could do a million different takes. And it could be told to do them as I would — and it might be pretty convincing.

So, if I’m being honest, I am a little worried. This is my job. This is what I love to do, and I don’t want to have to stop doing it. The conventional wisdom in Hollywood is that the technology for making faces seem fully human is five years away. I fear that the voice equivalent is also coming.

If A.I. takes over, maybe there could be some upside. I dearly miss Mel Blanc’s old Bugs Bunny performances. We’ll never get them again. But maybe with A.I., we can have more of them. Maybe it would work especially well if someone like me, who is intimately familiar with the subtleties of the character, could help recreate what Bugs Bunny was doing by essentially directing A.I.

I think we’ll still need someone who in his mind and heart and soul knows what needs to be done. A.I. can make the sound, but it will still need people to make the performance. Will the computer ever understand emotion on its own, what’s moving and what’s funny? Now we’re getting into science fiction, because for that, I think, the A.I. would have to be alive.

“A Tale of Two Cities,” read by Hank Azaria as Moe the bartender, Chief Wiggum, Snake Jailbird, Cletus the Slack-Jawed Yokel, Professor Frink, Superintendent Chalmers, Comic Book Guy, the Sea Captain and Duffman.

Hank Azaria has won multiple Emmy Awards for his work on “The Simpsons.” He recently appeared in the HBO show “The Idol.”

Interactive credits

Produced by Jonah M. Kessel, Susannah Meadows, Derek Arthur, Frank Augugliaro, Shannon Lin, Sam Whitney and James Robinson. Cinematography by Elliot deBruyn, Jan Kobal and Jonah M. Kessel. Video editing by Jonah M. Kessel and Emily Holzknecht.


