How to Create an AI Generated Video with ChatGPT, Synthesia, and Descript

Learn how we created an AI generated video with a ChatGPT script, a Synthesia avatar and voice, and stock footage from Descript.

Chris Lavigne

Creative

There is a lot of buzz around new and exciting artificial intelligence (AI) and machine learning (ML) tools for video production and video creation. So, I wanted to see first-hand how some of these tools perform! As an experiment, I set out to create a high-quality video using generative AI in less than 15 minutes.

So, what exactly is generative AI? Generative artificial intelligence refers to new types of machine learning algorithms that use existing content like text, audio, video, and images to create completely original content. In this case, we used AI to make a video that looks as though it was filmed, without writing a script or picking up a camera.

In particular, I was curious whether, as a video producer, I would be out of a job! Let’s find out…

1. Write a script with ChatGPT

Like most of our videos at Wistia, this one needed a script to get started. So, I first opened ChatGPT, the language model from OpenAI. ChatGPT (GPT stands for generative pre-trained transformer) is an AI chatbot system that can provide information and answer questions through a conversation.

For this experiment, I used the following prompt and typed it into their system:

Write a script in the style of a YouTube video about how to make an apple pie. The video should be under 60 seconds long. The script should feel friendly in nature.

Within seconds, the system generated a pretty decent video script. It blew me away! I’ve been writing scripts for over 20 years, and I’ve found that the script is often the bottleneck in turning an idea into a video. What ChatGPT created (in mere seconds) was an incredible starting point for a video script.

For this experiment, I copied and pasted the exact copy that ChatGPT generated and moved on to the next phase of this AI video production process.
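Since the prompt asked for a video under 60 seconds, it can be worth checking the generated script's length before handing it to the next tool. Here is a minimal Python sketch of that check. The 150 words-per-minute speaking pace and the function names are assumptions for illustration, not anything from ChatGPT or Synthesia, and an AI voice may read faster or slower.

```python
# Assumed average speaking pace in words per minute (an illustrative
# default, not a figure from the article or from any of the tools).
ASSUMED_WORDS_PER_MINUTE = 150


def estimate_seconds(script: str, words_per_minute: int = ASSUMED_WORDS_PER_MINUTE) -> float:
    """Roughly estimate how long a script takes to read aloud, in seconds."""
    word_count = len(script.split())  # whitespace-separated words
    return word_count / words_per_minute * 60


def fits_time_limit(script: str, limit_seconds: float = 60) -> bool:
    """Return True if the estimated read time is within the limit."""
    return estimate_seconds(script) <= limit_seconds


if __name__ == "__main__":
    sample = "Hey everyone! Today we're making a classic apple pie."
    print(f"Estimated length: {estimate_seconds(sample):.1f} seconds")
    print("Fits in 60 seconds:", fits_time_limit(sample))
```

If the estimate comes in long, you could ask ChatGPT to tighten the script before moving on.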

2. Generate your avatar video with Synthesia

Synthesia is AI software that will animate a human-like avatar automatically. In essence, it will convert text to a talking head video, complete with an AI voice and all.

Create your AI avatar and voice

The process of creating a custom AI avatar with Synthesia was pretty straightforward. I recorded a video of myself reading a few pre-scripted lines to the camera. I sent that footage to Synthesia, and within a few days, they created my avatar and made it ready to use in the system. To create a voiceprint, I uploaded about 20 minutes of audio of me talking. I didn’t need to read from a script. The audio just needed to be relatively clean. With that in place, the system can read out any text input in my voice.

Turn text into video

I pasted my script from ChatGPT into Synthesia, uploaded a custom background to use behind my AI talking head avatar, and adjusted the size and placement. Once that was looking good, I clicked “Generate,” and within a few minutes, Synthesia automatically animated the avatar, created human-sounding audio, and rendered a talking head video.

Here’s what that video looked like:

Now, I know what you’re thinking. Creepy, right? While this tech is impressive, it’s far from believable. For now, that is. Does this cross the uncanny valley? No, of course not. But will this technology improve with time? Most definitely.

For me, watching an AI avatar version of myself was quite disturbing. It looks like me and kind of sounds like me, but…isn’t me. And I never said those things.

After seeing the output from Synthesia, I thought the video could be improved with some visuals and b-roll. So I downloaded the talking head video from Synthesia and moved into the next phase of production.

3. Add visuals using Descript

Descript is an audio and video transcription tool that also has a suite of audio and video editing tools. It uses AI to automatically transcribe your video into text. But the feature we’re using here is its built-in stock footage library.

I first uploaded my talking head video and let Descript convert it into text. From there, I highlighted moments from the script that I wanted to visualize, and searched their library of stock footage for clips to use.

I did this for a few other lines in the video…and then exported the final version. Here it is!

Final thoughts on AI video generation

So? Has the video producer just become extinct? Am I out of a job? The answer, for now, is…no.

But these and the many other AI tools that exist can be used right now to help bring your vision to life. Like with storyboarding a video. Or making a quick proof-of-concept video. Or testing out a script to make sure it’ll communicate what you’re trying to say! As the video creator, you need to know how far you can push the technology, and when to do it the old-fashioned way.

So? What do you think? Is this the future of content creation? Or will it be a dystopian reality the likes of which we’ve never seen? Let me know what you think.
