When it comes to deepfake videos, computer scientists can now detect manipulated facial expressions with greater accuracy than ever before. The achievement heralds a new era in the development of automated tools designed to detect manipulated videos.
Significance of the work
In experiments on two challenging data sets, the new recognition technology accurately pinpointed 99% of manipulated expressions in video clips.
Over the past few years, developments in the deepfake world have made it relatively easy to swap one talking head for another, or to swap one person's genuine facial expressions for fake ones. But until now, few methods existed for detecting the latter. For this reason, this new technological development by University of California, Riverside researchers is considered notable.
Identity swaps vs. facial expressions
Prior to this point, researchers had created tools capable of detecting deepfake identity swaps with relative accuracy. For example, tools could generally determine the authenticity of a video that featured your organization’s chief executive (yes, that is the real chief executive talking or no, that is not the real chief executive talking). But tools had a tougher time discerning whether a genuine video of your organization’s chief executive (or whomever) had been manipulated to show inaccurate facial expressions.
While the detection of inaccurate facial expressions may seem trivial, consider the power of facial expressions in person-to-person communications. Facial expressions communicate emotions, intentions, and even requests for action. A smile versus a scowl can suggest entirely different preferred business deliverables or desired outcomes.
UC Riverside method
How does the UC Riverside method of deepfake detection work? It divides the task into two components within a deep neural network. The first component observes facial expressions and sends information about the regions that contain the expression (such as the eyes, nose, or mouth) into a second component of the system, known as the encoder-decoder. The encoder-decoder architecture is responsible for manipulation detection and localization.
The aforementioned framework, known as Expression Manipulation Detection (EMD), can both detect and localize the specific areas within an image that have been manipulated. In other words, it can create 'heat maps' of the specific areas of the face that were subjected to video manipulation.
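To make the detect-and-localize idea concrete, here is a toy Python sketch (not the researchers' code) of how a per-pixel manipulation 'heat map', like the one an encoder-decoder network might output, can be turned into both a frame-level detection flag and a localization mask. The heat-map values, the 0.5 threshold, and the helper name are all hypothetical:

```python
def localize_manipulation(heatmap, threshold=0.5):
    """Toy post-processing for a per-pixel manipulation heat map.

    heatmap: 2-D list of scores in [0, 1], standing in for the output
    of an encoder-decoder network (hypothetical values here).
    Returns (detected, mask): a frame-level flag and a binary mask
    marking which pixels look manipulated.
    """
    # Localization: mark every pixel whose score crosses the threshold.
    mask = [[score >= threshold for score in row] for row in heatmap]
    # Detection: flag the frame if any region crosses the threshold.
    detected = any(any(row) for row in mask)
    return detected, mask

# Toy 4x4 heat map with high scores around a (hypothetically)
# manipulated mouth region in the lower-middle of the face.
heatmap = [
    [0.05, 0.10, 0.08, 0.04],
    [0.07, 0.12, 0.11, 0.06],
    [0.09, 0.88, 0.91, 0.10],
    [0.06, 0.85, 0.93, 0.08],
]
flag, mask = localize_manipulation(heatmap)
```

Here `flag` is `True` and `mask` highlights the four high-scoring pixels, mirroring how the EMD framework reports both that a video is manipulated and where the manipulation occurred.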
Experimental analyses reveal that the Expression Manipulation Detection methodology outperforms other tools, on average, in the detection of both facial expression manipulations and identity swaps. According to UC Riverside researchers, EMD accurately detected 99% of manipulated videos, indicating a significant breakthrough in the detection of manipulated content.
The detection of genuine or falsified emotional expressions is useful in a variety of disciplines, including image processing, cyber security, robotics, psychological studies and virtual reality development.
Learn more about deepfake detection
For more about the researchers' work, see the paper entitled, "Detection and Localization of Facial Expression Manipulations," which was presented at the 2022 Winter Conference on Applications of Computer Vision. Or learn more about deepfakes in CyberTalk.org's interview with the CEO of Cyabra, Dan Brahmy.
Lastly, to receive more cutting-edge cyber security news, best practices and analyses, please sign up for the CyberTalk.org newsletter.