- Most people trust what they watch — but that won’t always be the case.
- Tech is being developed that will make it easy to create fake video footage of public figures or audio of their voice.
- The developments aren’t perfect yet, but they threaten to turbo-charge “fake news” and boost hoaxes online.
- In years to come, people will need to be far more skeptical about the media they see.
LONDON — Late last year, some WikiLeaks supporters were growing concerned: What had happened to Julian Assange?
The then-45-year-old founder of the anti-secrecy publisher was no stranger to controversy. Since 2012, he has sheltered in the Ecuadorian Embassy in Knightsbridge, London, following allegations of sexual assault. (He denies them, and argues the case against him is politically motivated.) But the publication of leaked emails from Democratic Party officials in the run-up to the US presidential election saw Assange wield unprecedented influence while at the centre of a global media firestorm.
After the election, though, suspicions were growing that something had happened to him. Worried supporters highlighted his lack of public appearances since October, and produced exhaustive timelines detailing his activities and apparent “disappearance.” They pooled their efforts to solve the mystery on the Reddit community r/WhereIsAssange.
Video interviews and photos of Assange were closely scrutinised amid speculation that they might have been modified with computer-generated imagery (CGI) — or faked entirely, as at least one YouTube analysis alleged.
“We need to look at the many glitches in that interview, and there were many for sure. Either terrible editing went on or CGI or whatever was just not fluid enough to make the grade. We need to understand why Assange’s head looked like a cut and paste to his suit,” one amateur sleuth wrote on Reddit.
Another investigator took an alternative approach: “I plan on watching the interview totally sober, and then vaping a whole bunch of weed and re-watching. I find that I can spot CGI or irregularities incredibly easily when I am really high.”
This is not normal behaviour. When watching a newsreel, or a clip of an interview on Facebook, most people don’t give much thought to whether the footage is real. They don’t closely scrutinise it for evidence of elaborate CGI forgery.
But these concerns may not be confined to the paranoid fringes of the internet forever.
CGI and artificial intelligence (AI) are developing at a rapid pace, and in the coming years it will become ever easier for hoaxers and propagandists to create fake audio and video, raising the potential for unprecedented doubt over the authenticity of visual media.
“The output we see from these models … are still crude and easily identified as forgeries, but it seems to be only a matter of refinement for them to become harder to discern as such,” Francis Tseng, co-publisher of The New Inquiry and a curator of a project tracking how technology can distort reality, told Business Insider.
“So we’ll see the quality go up, and like with other technologies, the costs will go down and the technology will become accessible to more people.”
Early tech demos are a sign of what is to come
We’re already living in an era of “fake news.” President Trump frequently lashes out online at the “phony” news media. Hoax outlets are created by Macedonian teenagers to make a quick buck from ad revenue, and their stories go massively viral on platforms like Facebook. Public trust in the media has fallen to an all-time low.
But a string of tech demos and apps highlights how this problem seems likely to get much worse.
Earlier in July, University of Washington researchers made headlines when they used AI to produce a fake video of President Obama speaking, built by analysing tens of hours of footage of his past speeches. In this demo, called “Synthesizing Obama,” the fake Obama’s lips were synched to audio from another of his speeches — but it could have come from anywhere.
In a similar demo from 2016, “Face2face,” researchers were able to take existing video footage of high-profile political figures including George W. Bush, Vladimir Putin, and Donald Trump and make their facial expressions mimic those of a human actor, all in real time.
Even your voice isn’t safe. Lyrebird is voice-mimicking software that can take audio of someone speaking and use it to synthesise a digital version of their voice — something it showed off to disconcerting effect with demos of Hillary Clinton, Obama, and Trump promoting it. It’s currently in development, and Adobe — the company behind Photoshop — is also developing similar tools, under the name “Project Voco.”
And once you start to combine these technologies, things get really interesting, or worrying. Someone could synthesise a speech from President Trump using Lyrebird, then make a fake version of him generated with “Synthesizing Obama”-style software deliver it.
You can quite literally put words into the mouth of any public figure.
It could undermine trust in everything you watch
Developers are alert to the dangerous possibilities of this technology. “Making these kinds of video manipulation tools widely available will have strong social implications. That is also the reason why we do not make our software or source code publicly available,” Justus Thies, who helped to develop Face2face, told Business Insider.
“[Imagine] kids having access to such a software — they would lift cyberbullying to a whole new level. You can also assume that the number of fake news will increase.”
Supasorn Suwajanakorn, a researcher on “Synthesizing Obama,” agreed that it could be used to produce fraudulent material, but argued it could also lead to more skepticism among ordinary people: “It could potentially be used to create fake videos when combined with technology that can generate a person-specific voice. On the other hand, if such tools are widespread and well-known, people can be more cautious about treating video as a strong evidence. People know Photoshop exists, and no one simply believes photos. This could happen with videos.”
This was echoed by Yaroslav Goncharov, CEO of photo-editing app FaceApp. People will just have to learn to stop taking videos at face value, he argued: “If ordinary people can create such content themselves, I hope it will make people pay more attention to verifying any information they consume. Right now, a lot of heavily modified/fake content is produced and it goes under the radar.”
He added: “Before printers were available, people could assign much higher credibility to printed materials than to handwritten ones. Now when most people have a printer at home, they won’t believe in something just because it is printed.”
There’s a flipside to the fact that it will become easy to make photo-realistic fraudulent video: It will also cast doubt on even legitimate footage. If a politician or celebrity is caught saying or doing something untoward, there will be an increasing chance they decide to argue the entire video is fabricated “fake news.”
In October 2016, President Trump’s presidential campaign was rocked by the “Access Hollywood” tape — audio of him discussing groping women, including the now-infamous line: “Grab them by the pussy.” What if he could have semi-credibly claimed the entire thing was just an AI-powered forgery?
It’s not all bad, however: Just think of the entertainment!
So should conscientious developers swear off this technology altogether? Not so fast — there are also numerous positive use-cases, from entertainment to video gaming.
The Face2face researchers suggested their techniques could be used in post-production in the film industry, or for creating realistic avatars for gaming. The announcement of “Synthesizing Obama” suggested the technique could be used to reduce bandwidth during video chats and teleconferencing. (Don’t bother streaming video: just send audio and synthesise the visuals instead!) Products like Lyrebird and Project Voco, meanwhile, could help people with speech disorders synthesise fluent and realistic speech on demand.
And Tseng also posits the tech could be used to “foster a wide culture of DIY entertainment: people editing clips from movies but replacing the dialogue or other elements in scenes or entirely synthesizing new clips by emulating actors and actresses.”
But, he warns, developers still have a responsibility to take political issues into account. “Software development as a profession has grown so rapidly through so many informal channels that there is not much of a professional culture of ethics to speak of. Other engineering professions have developed pretty robust ethical standards, and those hold up because engineers trained in those professions go through a limited number of formal channels which expose them to those ethics. The boon of programming education is its decentralization and wide accessibility, but this also means people often pick up the skills without the necessary ethical frameworks to accompany them.”
He added: “Anyone involved in the development of technology, directly or indirectly, has a responsibility to consider these issues, outright refuse to implement problematic technologies, or subvert them in some way.”
The entertainment industry, of course, has long used CGI, and it is acutely aware of what further developments could herald. In December 2016, “Rogue One: A Star Wars Story” came out, featuring a surprise appearance from actor Peter Cushing.
It was a particularly surprising appearance because Cushing had been dead for 22 years. His image was reconstructed using CGI overlaid on a real actor.
It wasn’t a perfect recreation, but the stunt grabbed headlines, and spooked some other celebrities. Reuters reported at the time that its release led to actors “scrambling to exert control over how their characters and images are portrayed in the hereafter,” negotiating contracts on how their image may or may not be used even after they die.
In January 2017, Lucasfilm even had to deny that it was planning to incorporate a CGI Carrie Fisher into the upcoming movie “Star Wars: The Last Jedi” after rumours that the studio was planning to get around the actress’ death in December 2016 by making a digital version of her.
It’s time to start getting ready
It’s undeniable that developments in the coming years will heighten the challenges people face in finding and responsibly sharing media. In trying to solve these new challenges, journalists, developers, tech platforms, and ordinary people may all have a role to play.
Technology already exists to cryptographically sign footage captured by a camera, so it can be verified when required. News outlets and organisations could perhaps one day “sign” their footage, so anyone can check its authenticity. No matter how convincing the fake, if it doesn’t carry a valid cryptographic signature from its supposed source, viewers would have grounds to suspect something was wrong.
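To make the idea concrete, here is a minimal sketch of signing and verifying footage. It is an illustration under stated assumptions, not a description of what any camera or news outlet actually does: a real system would almost certainly use a standard scheme such as Ed25519 or ECDSA from a vetted library. A Lamport one-time signature is swapped in here only because it can be written with nothing but Python’s standard library. The function names (`generate_keypair`, `sign`, `verify`) are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def _bits(digest: bytes):
    """Yield the 256 bits of a SHA-256 digest, most significant bit first."""
    for byte in digest:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def generate_keypair():
    """Create a Lamport key pair (illustrative only; use each key ONCE).

    The private key is 256 pairs of random secrets. The public key is the
    hash of every secret, and can be published so anyone can verify.
    """
    private = [(secrets.token_bytes(32), secrets.token_bytes(32))
               for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(footage: bytes, private):
    """Reveal one secret per bit of the footage's hash."""
    return [private[i][bit] for i, bit in enumerate(_bits(_h(footage)))]


def verify(footage: bytes, signature, public) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(signature) != 256:
        return False
    return all(
        _h(secret) == public[i][bit]
        for i, (secret, bit) in enumerate(zip(signature, _bits(_h(footage))))
    )
```

In this sketch an outlet would publish `public` in advance, call `sign` on the raw video bytes, and distribute the signature alongside the clip. A viewer who runs `verify` on an unaltered copy gets `True`; changing even a single byte of the footage makes it return `False`. A Lamport key must never sign more than one file, which is one reason production systems use other schemes.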
Face2face suggests its findings could be built upon to help “detect inconsistencies” in media and help identify fraudulent imagery.
Thies argued that big tech platforms like Facebook will have a duty to proactively police for fraudulent media: “Social media companies as well as the classical media companies have the responsibility to develop and set up fraud detection systems to prevent spreading / sharing of misinformation.”
And as Goncharov and others suggested, it may force ordinary people to be more skeptical and stop taking video and audio at face value, much as they already treat photos and screenshots today.
In January 2017, Julian Assange read out a hash from the bitcoin blockchain (essentially a high-tech version of holding up today’s newspaper) on a public livestream in a bid to prove he was still alive, and that the video hadn’t been pre-recorded.
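The logic behind that proof can be sketched in a few lines. A block’s hash cannot be known before the block is mined, so footage in which someone reads it aloud cannot have been recorded any earlier than that moment. The example below is an assumption-laden illustration: the block hashes and timestamps are made up, and the names `HYPOTHETICAL_BLOCKS`, `recorded_no_earlier_than`, and `was_recorded_after` are hypothetical. A real check would look the hash up on the actual blockchain.

```python
from datetime import datetime, timezone

# Made-up block hashes mapped to made-up mining times, standing in for a
# lookup against the real blockchain.
HYPOTHETICAL_BLOCKS = {
    "00000000aaaa": datetime(2017, 1, 10, 14, 0, tzinfo=timezone.utc),
    "00000000bbbb": datetime(2016, 10, 1, 9, 30, tzinfo=timezone.utc),
}


def recorded_no_earlier_than(spoken_hash: str, ledger: dict):
    """Return the time the quoted block was mined, or None if it is unknown.

    The footage cannot predate this time, because nobody could have known
    the hash before the block existed.
    """
    return ledger.get(spoken_hash.strip().lower())


def was_recorded_after(spoken_hash: str, cutoff: datetime, ledger: dict) -> bool:
    """True only if the quoted block is known and was mined at or after cutoff."""
    mined_at = recorded_no_earlier_than(spoken_hash, ledger)
    return mined_at is not None and mined_at >= cutoff
```

Note that this only gives a lower bound on when the footage was made. It says nothing about whether the person on screen is real, which is why the check is weakened once convincing imagery can be generated in real time.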
A decade from now, if recreating real-time imagery of public figures (or anyone else!) becomes trivial, such authentication may no longer be enough.