A podcast from Embark Studios, creators of the upcoming FPS "The Finals," suggested that the game will use AI voice lines for the foreseeable future. However, this explanation puzzled some voice actors.
On the podcast, Carl Strandberg and Andreas Almström, audio designers for The Finals, were asked how they made the game's voice lines work (spoiler: their answer is not an opinion shared by everyone). They replied: "With a few exceptions, we use AI, so all of the contestant voices, including the barks and the commentators, are AI speech synthesis. Other vocalizations, such as growls, pain sounds, and the sounds of vaulting over objects, are done in-house."
"We chose this route because AI voice synthesis has finally become very powerful, and we wanted to make sure that our game designers were able to use it in the best possible way."
"It gives us a lot more flexibility in the way we do voice synthesis. When a game designer comes up with a new idea for a game mode, they can have narration representing it in just a few hours, instead of the many months it would otherwise take."
However, that description does not match the experience of the voice actors who actually do this work. One such actor is Gianni Matragrano. He may be best known for his role as Gabriel in "Ultrakill," but he has worked on a variety of games, including "Phantom of the Phantom," "Gloomwood," and "Evil West."
Matragrano wrote in a Twitter thread: "We're always knocking out rush-order sessions within a day or two... If you need more, you can book another session. It's very easy." He further revealed that he had his suspicions when he played the beta but waited for confirmation: "I had my suspicions, but I didn't want to say anything in case I was wrong, or in case it was at least just a placeholder. But in a massive open beta with 150,000 concurrent players, this is definitely just their vision."
The video above is an example posted by Matragrano himself... and yeah, it's not great; there is too much uncanny valley here to buy the "very powerful" technology that Strandberg and Almström were touting. The pair did add the caveat that "even if the sound is a little strange, it blends well aesthetically with the fantasy of a virtual game show." Whether these voice lines break your immersion is up to you.
Another voice actor, Zane Schacht, wrote: "Why do AI voice people act like hiring a voice actor is some arcane ritual... I once knocked out an entire game's worth of voices in a two-hour session. It's not deep."
Meanwhile, Pax Helgesen, himself a senior sound designer and voice actor, commented: "I would again encourage developers to reconsider the use of voice in games as just another 'asset' in the agile development pipeline." He goes on to say that, yes, AI can play an important role in game development, but that an actor is a collaborator who brings their own skill and experience to the tools, creating something greater than what the developer imagined.
I would agree here. Acting and sound design are, in some ways, very different disciplines. It's a similar situation to "AI artists" getting publicly shot down when they share the results of their prompts.
While it is true that you can ask an algorithm to produce something, art involves dozens of purposeful choices that a machine cannot currently reproduce. Acting is no different; I wonder whether Strandberg and Almström simply don't know enough about voice actors to understand how jarring their ElevenLabs-generated dialogue is to players who don't care about development turnaround times.
What makes this even stranger is that there are already interesting and thoughtful uses of this technology in games. Not long ago, it was revealed that a dubbed version of Cyberpunk 2077 used AI to provide new dialogue for the game's expansion, Phantom Liberty, after one of its voice actors died. CD Projekt took due care: it hired another actor to record the new dialogue (which was then altered with Respeecher), obtained the consent of the late actor's estate, and did it in a way that preserves the original, non-AI performance.
As for The Finals, I am struggling to find the creative intent. Sure, AI may make for quicker development (even if the traditional route isn't as slow as the developers claim), but the result is a lack of personality. A multiplayer shooter certainly doesn't need a deep narrative, but you'll be listening to these barks for hours on end, and the stiff, stilted delivery quickly becomes irritating.
When I reached out to Embark Studios for comment, the studio told me via email that it "uses both recorded and TTS (text-to-speech) tool-generated audio in the game depending on the context. Using TTS allows us to do tailor-made [voice acting] that would not otherwise be possible, for reasons such as implementation speed."
"When we use TTS in finals, it is always based on real voices. The point here is that most AI voice programs are based on real voices, just as AI art is based on real art. In the Open Beta, it's based on a mix of professional voice actors and temporary voices from Embark employees. Creating a game without actors is not an end goal for Embark.
Embark Studios did not address the "hours versus months" question, but its statement seems close to the interview above: TTS is part of the vision for The Finals. Unless public opinion changes Embark Studios' mind, the game will likely continue to use a mixture of human voice work and AI after the beta is over.