
Unlocking the Power of Visual Storytelling: Transforming Videos through Text Editing


The Future of Video Editing: Rewrite Videos By Editing Text

Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. The last few years have been an amazing ride when it comes to research works on facial reenactment of real characters. Beyond just transferring our gestures to video footage of an existing talking head, controlling their gestures like video game characters and full-body movement transfer are also possible.

With WaveNet and its many variants, we can also learn someone’s way of speaking, write a piece of text, and generate an audio waveform that impersonates them using their own voice.

So, what else is there to do in this domain? Are we done? No-no, not at all! Hold on to your papers, because with this amazing new technique, what we can do is look at the transcript of a talking head video and remove parts of it or add to it, just as we would edit any piece of text, and the technique produces both the audio and a matching video of this person uttering these words. Check this out.

How It Works

It works by searching through the video and collecting small sounds that can be pieced together into the new word we’ve added to the transcript. The authors demonstrate this by adding the word “fox” to the transcript. It can be assembled from the “v” in the word “viper” (“v” and “f” are formed with a very similar mouth shape) and the “ox” that is part of another word found in the footage. As a result, one can make the character say “fox” even though she never utters this word in the footage. We then look up not only the audio occurrences of these sounds but also the video footage of how they are being said, and the paper proposes a technique to blend these video assets together.
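To make the "piecing together" idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: letters stand in for phonemes, the `VISEME_CLASS` table is a made-up grouping of look-alike sounds, the function `find_segments` is a hypothetical helper, and "box" is an assumed footage word that contains "ox". The sketch only shows how a new word could be covered with the fewest snippets taken from words already present in the footage.

```python
from __future__ import annotations

# Toy assumption: letters stand in for phonemes, and this table is an
# illustrative grouping of look-alike sounds, not the paper's actual one.
VISEME_CLASS = {"f": "FV", "v": "FV", "p": "PBM", "b": "PBM", "m": "PBM"}


def viseme(unit: str) -> str:
    """Map a unit to its (hypothetical) viseme class; default to itself."""
    return VISEME_CLASS.get(unit, unit)


def find_segments(target: str, footage_words: list[str]) -> list[tuple[str, int, int]]:
    """Cover `target` with the fewest contiguous snippets from the footage.

    Returns (source_word, start, end) triples with `end` exclusive.
    Raises ValueError if the footage cannot cover the target.
    """
    goal = [viseme(c) for c in target]
    n = len(goal)
    # best[i] holds the shortest snippet list covering goal[:i], or None.
    best: list = [None] * (n + 1)
    best[0] = []
    for i in range(n):
        if best[i] is None:
            continue  # goal[:i] cannot be covered, so nothing to extend
        for word in footage_words:
            units = [viseme(c) for c in word]
            for start in range(len(units)):
                length = 0
                # Extend the match one unit at a time, recording each prefix.
                while (start + length < len(units)
                       and i + length < n
                       and units[start + length] == goal[i + length]):
                    length += 1
                    candidate = best[i] + [(word, start, start + length)]
                    j = i + length
                    if best[j] is None or len(candidate) < len(best[j]):
                        best[j] = candidate
    if best[n] is None:
        raise ValueError(f"cannot assemble {target!r} from the footage")
    return best[n]


if __name__ == "__main__":
    print(find_segments("fox", ["viper", "box"]))
    # prints [('viper', 0, 1), ('box', 1, 3)]
```

In this toy setting, "fox" is covered by the first unit of "viper" plus the last two units of "box". A real system would work on time-aligned phonemes and would also have to score how well neighbouring snippets blend, which this sketch ignores.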

User Study Results

Finally, we can provide all this information to a neural renderer that synthesizes a smooth video of this talking head. A user study showed that the edited videos were often mistaken for real ones, which indicates how convincing the results are. The ability to edit a video through its transcript opens up new possibilities for digital storytelling.

The bar is getting lower: these kinds of videos are becoming easier to produce and harder to distinguish from real footage. Ethical considerations therefore matter when using these techniques.

Thank you for watching and stay tuned for more exciting updates in the world of AI-powered video editing. The future of video manipulation is here!



  1. Nice, Humanity created internet, video recordings and voice recordings, to make information transmission easier.
    In a few years information transmission won't be reliable thanks to something Humanity invented 😂

  2. Does this mean that we're going to stop getting youtube videos where it randomly cuts to the dude wearing different clothing in a different room going "hey guys so while I was editing the video…"
