Deepfakes: AI systems reliably expose manipulated audio and video

Artificial Intelligence (AI) brings a wealth of new opportunities, such as improved healthcare, more efficient energy consumption and products with a longer service life. However, it also entails new risks, one of which is the creation of “deepfakes”. Whereas “fake news” involves the intentional creation of untrue written reports in social networks to skew public opinion, “deepfakes” are deceptively real but manipulated video and audio recordings that can be created only through the use of AI. The risks and challenges associated with deepfakes are considerable — not only for the media landscape but also for companies and individuals. Luckily, AI also offers a way to reliably expose deepfakes.

AI-based systems can learn to imitate a voice or a person’s body language.
Neural networks can also be trained to identify manipulated media content as being fake.

In the past, it was almost impossible to produce high-quality fakes of video or audio material. The dynamic content presents the challenge of consistently faking at least 16,000 data points every second. However, the use of AI methods now makes this process almost child’s play. Open-source versions of the software required to produce convincing fakes automatically are freely available online.
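To give a feel for the scale involved, the following minimal sketch counts the data points in a short clip. It assumes that the figure of 16,000 data points per second corresponds to mono audio sampled at 16 kHz; the function name and the clip lengths are purely illustrative.

```python
# Assumption: "16,000 data points every second" is read as 16 kHz mono audio.
SAMPLE_RATE_HZ = 16_000


def samples_needed(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples that a fake of the given length must get right."""
    return int(round(duration_seconds * sample_rate_hz))


ten_second_clip = samples_needed(10)
one_minute_clip = samples_needed(60)

print(ten_second_clip)  # 160000
print(one_minute_clip)  # 960000
```

Even a ten-second voice message therefore consists of 160,000 values that all have to fit together plausibly, which is why manual forgery was impractical.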

How exactly is it done? As with other comparable machine learning models, deepfake systems are trained using data acquired online. Architectures like Tacotron and Wav2Lip (Wang, Yuxuan, et al.; Shen, Jonathan, et al.; Prajwal, K. R., et al.) enable the construction of neural networks that combine any sentence spoken by a target person with a corresponding facial expression and that person’s typical intonation. It is precisely these neural networks to which the “deep” in “deepfakes” refers. Around 30 minutes of suitable audio or video material is all that is required.

Deepfakes entail new risks 

The risks associated with deepfakes are considerable. In theory, any one of us runs the risk of transactions or contracts being concluded in our name online using a faked voice or videos — as long as sufficient audio or video material is available. Companies can also suffer damage if employees are tricked into fraudulent behavior using faked audio messages. Precisely this happened to an energy company based in Great Britain, when its CEO transferred a six-figure sum of money, seemingly at the bidding of the chairperson of its German parent company — but in reality using a voice cloned by a machine (Forbes).

For the media landscape, the ability to manipulate statements made by politicians and influential decision-makers presents a particular challenge. There is often a wealth of audio and video content available for public figures like this, which provides sufficient AI training material for the creation of deepfakes. As a result, virtually any statement can be placed in the mouths of high-ranking politicians, using footage that both looks and sounds authentic.

Beating deepfakes at their own game

Although AI makes deepfakes possible in the first place, it can also be a key tool in exposing manipulated audio and video materials. This is where the Fraunhofer Institute for Applied and Integrated Security AISEC comes into play. IT experts in the Cognitive Security Technologies (CST) research department are hard at work creating systems for the reliable, automated recognition of deepfakes. They are also investigating methods to improve the robustness of systems that evaluate video and audio material.

To ensure that the researchers at Fraunhofer AISEC fully understand the technology behind the fakes, can identify any potential vulnerabilities and are able to develop protective measures, they first use simulations to step into the shoes of the fakers. In these simulations, they generate convincing faked audio and video data, based on which they can subsequently develop algorithms for identifying actual deepfakes. The use of AI is essential. In the same way that neural networks can learn how to create media content, they can also be trained on how to detect faked materials. This is done by feeding them a series of genuine and manipulated recordings, based on which the networks learn how to identify minute discrepancies that are not evident to the human eye or ear. The AI algorithms are then able to make automated decisions about whether an audio or video file is genuine or fake.
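The principle of learning from labeled genuine and manipulated recordings can be illustrated with a deliberately tiny sketch. It is not the method used at Fraunhofer AISEC: instead of a neural network on real audio, it trains a simple logistic-regression detector on synthetic two-number "recordings", and every name in it (make_recording, train_detector and so on) is hypothetical. The gap between the two classes is also far larger here than the minute discrepancies found in real deepfakes.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_recording(is_fake: bool, rng: random.Random) -> list[float]:
    """Toy stand-in for a recording: two features, shifted for fakes (assumption)."""
    centre = 1.5 if is_fake else -1.5
    return [rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)]


def make_dataset(n: int, rng: random.Random):
    """Alternate genuine (label 0) and fake (label 1) toy recordings."""
    labels = [i % 2 for i in range(n)]
    samples = [make_recording(bool(y), rng) for y in labels]
    return samples, labels


def fake_probability(features, weights, bias) -> float:
    """Detector output: estimated probability that the recording is fake."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_detector(samples, labels, epochs: int = 50, learning_rate: float = 0.1):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = fake_probability(features, weights, bias) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def accuracy(samples, labels, weights, bias) -> float:
    """Share of recordings whose thresholded prediction matches the label."""
    hits = sum(
        int(fake_probability(x, weights, bias) >= 0.5) == y
        for x, y in zip(samples, labels)
    )
    return hits / len(labels)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_x, train_y = make_dataset(200, rng)
test_x, test_y = make_dataset(200, rng)

weights, bias = train_detector(train_x, train_y)
test_accuracy = accuracy(test_x, test_y, weights, bias)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The detector is evaluated on recordings it has not seen during training, since the point of such a system is to judge new, unknown files.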

Although users can improve at identifying deepfakes with practice, they cannot achieve the success rate of AI security systems.

The cybersecurity experts at Fraunhofer AISEC also perform detailed security checks on the AI systems used in areas like facial recognition and speech processing. Using penetration tests, they analyze vulnerabilities and develop “hardened” security solutions that are able to withstand attempts at deception using deepfakes. By using approaches such as “robust learning” and “adversarial learning,” Fraunhofer AISEC provides the AI algorithms with an extra layer of armor that increases their resilience — through more complex program design, for example.
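One widely used ingredient of adversarial learning is to generate inputs that are deliberately nudged to fool a model and then to include them in training. The sketch below shows only the first step, using a perturbation in the style of the fast gradient sign method on a hand-set logistic detector. The weights, the sample and the step size epsilon are invented for illustration, and the article does not state which techniques Fraunhofer AISEC actually applies.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def detector_score(features, weights, bias) -> float:
    """Estimated probability that the input is fake (toy linear detector)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def sign(value: float) -> int:
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(features, label, weights, bias, epsilon):
    """Shift each feature by epsilon in the direction that increases the loss.

    For logistic loss, the gradient with respect to the input is
    (score - label) * weights.
    """
    score = detector_score(features, weights, bias)
    gradient = [(score - label) * w for w in weights]
    return [x + epsilon * sign(g) for x, g in zip(features, gradient)]


# Hypothetical detector parameters and a hypothetical fake sample (label 1).
weights = [2.0, -1.0]
bias = 0.0
fake_sample = [1.0, 0.5]

adversarial_sample = fgsm_perturb(fake_sample, 1, weights, bias, epsilon=0.3)

original_score = detector_score(fake_sample, weights, bias)
perturbed_score = detector_score(adversarial_sample, weights, bias)
print(f"score before: {original_score:.3f}, after: {perturbed_score:.3f}")
```

The perturbed fake receives a lower fake score than the original. Training on such perturbed examples with their correct labels is one way of making a detector harder to deceive.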

Use case in the insurance industry: Deepfakes deceive voice ID system

Banks, insurance companies and mobile operators are increasingly enabling callers to identify themselves using their voice. This makes a caller’s voice print as important as their password. Although voice-based identification may be more convenient than conventional authentication methods like a PIN or password, the voice ID system needs to be both robust and secure in order to be used as a trustworthy and reliable alternative.

However, as the latest use case from the scientists at Fraunhofer AISEC shows, there is still work to be done in terms of security: As part of a penetration test, they successfully breached the voice ID system of a major German insurance company. Using training material in the form of a ten-minute recording of a public speech given by the target person, the team at Fraunhofer AISEC was able to create a high-quality audio deepfake that deceived the security system and provided access to the target’s personal account.
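Why a cloned voice can pass such a check becomes clearer with a simplified model of voice-based identification. The sketch assumes that the system reduces each utterance to a numeric voice print and accepts the caller if it is similar enough to the enrolled one. The vectors, the threshold and the function names are all hypothetical and do not describe the insurer's actual system.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two voice-print vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_caller(enrolled_print, candidate_print, threshold: float = 0.8) -> bool:
    """Accept the caller if the voice print is close enough to the enrolled one."""
    return cosine_similarity(enrolled_print, candidate_print) >= threshold


# Hypothetical three-dimensional voice prints (real systems use far more dimensions).
enrolled = [0.9, 0.1, 0.4]
genuine_call = [0.85, 0.15, 0.42]   # the account holder on another day
other_speaker = [0.1, 0.9, 0.2]     # an unrelated person
cloned_voice = [0.88, 0.12, 0.38]   # a deepfake that mimics the account holder

print(verify_caller(enrolled, genuine_call))   # True
print(verify_caller(enrolled, other_speaker))  # False
print(verify_caller(enrolled, cloned_voice))   # True
```

A check of this kind only measures how close a voice is to the enrolled one, not whether it was produced by a human, which is why additional deepfake detection is needed.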

Building expertise and awareness for detecting and defending against deepfakes

In the future, deepfakes will become ever easier to produce. It is therefore important to build up expertise, create tools and take measures to detect the use of AI in data creation and thus enable fakes to be recognized as such.

The experts at Fraunhofer AISEC already have methods that are significantly more accurate than humans in detecting manipulated media. It is also recommended to label deepfakes as such and regulate their use. The creation of corresponding legal frameworks that impose penalties for the concealed use of AI could be helpful in this regard. In parallel, the traceability of unmodified information should also be strengthened. Alongside exposing deepfakes, this traceability would enable unmodified original recordings to be reliably labeled as such.

Security systems can also be tested for their susceptibility to this type of deception and protected accordingly. The deepfakes created by researchers at Fraunhofer AISEC are used to perform comprehensive penetration testing of voice ID and other security systems to identify vulnerabilities ahead of any potential attacker.

Finally, given the importance of AI-based systems, it is necessary to raise risk awareness. Users must learn to scrutinize media materials more closely and check them for authenticity, whether these are speeches given by politicians or unexpected telephone calls from a manager asking them to transfer money to an unknown account. Studies conducted by security researchers indicate that users become better at recognizing deepfakes the more aware they are of this type of manipulation. The deepfakes created at Fraunhofer AISEC are therefore also used for training and educational purposes.

Deepfakes in practice

The solutions developed at Fraunhofer AISEC support companies and public sector institutions in the detection of video and audio deepfakes. Security checks performed using internally developed deepfakes can also identify and eliminate security vulnerabilities in systems at an early stage. The demonstrator deepfakes are also used to raise user awareness of this topic, provide knowledge of how to handle deepfakes and train users to evaluate whether media content is genuine.


Can you spot the Audio Deepfake?

You versus the machine: Who is better at detecting manipulated audio?


Deepfake Total

Deepfake Total uses AI to detect audio deepfakes. Files and YouTube videos can be verified for authenticity using different audio spoof and deepfake detection models.


Read publications by our scientists to gain an overview of current research results in the area of deepfakes.



Harder or Different? Understanding Generalization of Audio Deepfake Detection

by Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl and Konstantin Böttinger



MLAAD: The Multi-Language Audio Anti-Spoofing Dataset

by Nicolas M. Müller, Piotr Kawa, Wei Herng Choong, Edresson Casanova, Eren Gölge, Thorsten Müller, Piotr Syga, Philip Sperl and Konstantin Böttinger



Human Perception of Audio Deepfakes

by Nicolas M. Müller, Karla Markert and Konstantin Böttinger



Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

by Nicolas M. Müller, Franziska Dieckmann, Pavel Czempin, Roman U. Canals, Konstantin Böttinger and Jennifer Williams



Does Audio Deepfake Detection Generalize?

by Nicolas M. Müller, Pavel Czempin, Franziska Dieckmann, Adam Froghyar and Konstantin Böttinger



Attacker Attribution of Audio Deepfakes

by Nicolas M. Müller, Franziska Dieckmann and Jennifer Williams