False Identity! Can We Trust Video Evidence of Human Testimony To B...

Some time ago a Stanford University project demonstrated software that accurately maps the facial movements of one person onto another in video. An existing video of a targeted person is combined with a live camera feed of an actor, and the computer transfers the actor's facial movements onto the face and head of the target. The result is a realistic video that appears to show the target saying whatever the actor is saying! The implications of this are HUGE, and yet few people seem to know anything about it.


Voice synthesis

Voice synthesis has been a large and successful research area for a long time. Given that, it is reasonable to assume that technology exists somewhere (possibly within the massively funded Secret Intelligence Industrial Complex) that uses computers to accurately model the voice of any person, given enough samples of their voice for the software to analyse.

By now we have probably all heard computer-generated voices that sound almost completely realistic. It is therefore not a huge leap to imagine a system that creates equally realistic copies of the voices of real people who have not consented to being copied.

Creating authentic fake video!

The point is that whoever has these systems can sit and create video that appears to show anyone in the world saying anything at all, and most of us would never know that the video is fake. This means that video testimony in court should be entirely nullified. It also means that every video we see online or on TV of someone saying something controversial should be watched with the possibility in mind that the real person never said the words we are hearing!

Does this sound too paranoid to you? Watch the video demonstration below!

Stanford Video Demo

We present a novel approach for real-time facial re-enactment of a monocular target video sequence (e.g., Youtube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Re-enactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where Youtube videos are re-enacted in real time.
