Six things you must check BEFORE you upload your AI-generated video to LipDub AI.
- Are your files supported?
- Lip Diversity
- Video Length
- Audio Length
- Number of faces visible in the video
- Understand the difference between Single-Actor Projects versus Multi-Actor Projects.
1. Are your files supported by LipDub AI?
If you have your own dub audio, whether you recorded it yourself or created it with an audio platform such as ElevenLabs, make sure your audio files are supported by clicking below:
If you want to create your own audio on LipDub AI using text-to-speech, you can! Note that it is only available for Single-Actor Projects. (click below)
2. Does your video show the actor talking?
No, my actor has a closed mouth for the entire video.
- Next Steps - Please regenerate your AI video, but include something like “make the actor talk” in your prompt so their lips move naturally. It doesn’t matter whether you use Kling, Runway, or another video generation tool: the actor on screen should have their mouth moving to ensure good results.
- Why? - LipDub studies what the actor’s lips look like in each viseme (the mouth shape that goes with a speech sound). LipDub AI then creates a fine-tuned model for that actor. This fine-tuning process allows LipDub to create seamless, world-class quality. If the video only shows the actor with a closed mouth, the result will have poor articulation. (see below)
This user uploaded a video in which the actor does not move their lips. The result turned out poor because the original video had no lip movement to learn from.
Yes, my actor on screen is moving their mouth.
- Next Steps: Please continue reading.
3. Is your video at least 30 seconds long?
No, my video is only 5 seconds long.
- Next Steps: Please loop the video so it is at least 30 seconds long.
- Why? - LipDub performs better when there is at least 30 seconds to 1 minute of footage of the actor moving their mouth and making different visemes. Since you have an AI-generated video and most results are limited to 10 seconds, one workaround is to loop the video to extend it.
- There are diminishing returns when fine-tuning the AI model, so there is no need to provide more than 5 minutes of footage.
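If you would rather script the looping step than do it in a video editor, here is a minimal sketch. It assumes you have ffmpeg installed locally (ffmpeg is not part of LipDub AI), and the function names `plays_needed` and `ffmpeg_loop_command` are made up for this example. It works out how many times a short clip must play to reach 30 seconds and builds the matching ffmpeg command.

```python
import math
from typing import List

MIN_SECONDS = 30.0  # minimum footage length suggested in this guide


def plays_needed(clip_seconds: float, target_seconds: float = MIN_SECONDS) -> int:
    """Return how many times the clip must play back-to-back to reach the target."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return max(1, math.ceil(target_seconds / clip_seconds))


def ffmpeg_loop_command(src: str, dst: str, plays: int) -> List[str]:
    """Build an ffmpeg command that repeats src `plays` times without re-encoding."""
    if plays < 1:
        raise ValueError("plays must be at least 1")
    # -stream_loop counts EXTRA repetitions, so 6 plays means 5 loops.
    return ["ffmpeg", "-stream_loop", str(plays - 1), "-i", src, "-c", "copy", dst]


# A 5-second clip needs 6 plays to reach 30 seconds.
print(plays_needed(5.0))
print(" ".join(ffmpeg_loop_command("clip.mp4", "looped.mp4", plays_needed(5.0))))
```

Stream copy (`-c copy`) is fast, but if the looped file plays back with glitches at the seams, re-encoding instead of copying may be needed.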
Yes, my video is at least 30 seconds long.
- Next Steps: Please continue reading.
4. Is the audio that you’d like to use for lip-sync longer than your video?
Yes, my dub audio is longer than the length of my original video.
- Next Steps: Please loop your AI video so that it is at least as long as your dub audio.
- Why? - LipDub only modifies the lip area. It does not create new video frames. So if your dub audio is longer than your video, the audio will be cut short to fit and the output will be the same length as your video.
- Example: If your dub audio is 2 minutes long and your original video is 30 seconds long, LipDub will only output a 30-second video.
Here’s a rough idea of where the mask is that LipDub AI pastes back on top of your original video.
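To make the arithmetic for this check concrete, here is a small sketch based only on the rule described above (the audio is cut to the video length because no new frames are created). The function name `audio_overrun` is hypothetical and is not a LipDub AI feature; you need to supply the two durations yourself.

```python
import math
from typing import Tuple


def audio_overrun(video_seconds: float, audio_seconds: float) -> Tuple[float, int]:
    """Return (seconds of dub audio that would be cut, plays needed to cover the audio)."""
    if video_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("durations must be positive")
    # Audio beyond the end of the video has no frames to go with, so it is lost.
    cut_seconds = max(0.0, audio_seconds - video_seconds)
    # Looping the video this many times makes it at least as long as the audio.
    plays = max(1, math.ceil(audio_seconds / video_seconds))
    return cut_seconds, plays


# The example from this section: 2 min of audio, 30 s of video.
cut, plays = audio_overrun(30.0, 120.0)
print(cut, plays)
```

For the 2-minute audio and 30-second video, 90 seconds of audio would be cut unless the video is played 4 times in a row.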
No, my dub audio is shorter than my video, or I don’t know the length of my audio because I want to create it using the Text-to-Speech feature on LipDub AI.
- Next Steps: Please continue reading.
5. How many faces are visible in your video?
Only 1 face is visible for my entire video.
- Next Steps: Please use Single-Actor Project type. (click below)
This is an example of a single-actor video. No other faces are visible in frame.
This video is lip-syncing the girl in the background. The user selected the Single-Actor project type, but because other faces are visible, a Single-Actor project does not know which actor to lip-sync, so it may lip-sync the wrong person. This video is considered a multi-actor video. (See next steps below)
Multiple faces are visible in my video
- Next Steps: Please use Multi-Actor Project type. (click below)
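Checks 2 through 5 can be summarized as a short checklist. The sketch below is only an illustration of the decision steps in this guide: the `Clip` class and `next_steps` function are hypothetical names, not part of LipDub AI, and you fill in the values yourself after looking at your video.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    mouth_moves: bool                      # does the actor visibly talk?
    video_seconds: float                   # length of the uploaded video
    faces_visible: int                     # faces visible anywhere in the video
    audio_seconds: Optional[float] = None  # None if you plan to use text-to-speech


def next_steps(clip: Clip) -> List[str]:
    """Return the next steps this guide suggests for the given clip."""
    if clip.faces_visible < 1:
        raise ValueError("the guide assumes at least one visible face")
    steps = []
    if not clip.mouth_moves:
        steps.append("Regenerate the video with a prompt that makes the actor talk.")
    if clip.video_seconds < 30:
        steps.append("Loop the video so it is at least 30 seconds long.")
    if clip.audio_seconds is not None and clip.audio_seconds > clip.video_seconds:
        steps.append("Loop the video so it is at least as long as the dub audio.")
    project = "Single-Actor" if clip.faces_visible == 1 else "Multi-Actor"
    steps.append(f"Use the {project} project type.")
    if project == "Multi-Actor" and clip.audio_seconds is None:
        # Text-to-speech is only available for Single-Actor projects.
        steps.append("Upload your own dub audio; text-to-speech is Single-Actor only.")
    return steps


for step in next_steps(Clip(mouth_moves=False, video_seconds=5.0, faces_visible=2)):
    print("-", step)
```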
6. What is the difference between Single-Actor versus Multi-Actor projects?
Single Actor - video guide
- Single Actor videos. (where only 1 person is visible in the whole video)
Pros | Cons |
Easy to use - Everything is automatic. Simply upload your video and click generate | Lack of control - because everything is automatic, the network may detect a face in the background and it may lip-sync the incorrect person. |
Audio features - automatic translation, text-to-speech, SRT upload. | Can only upload ONE video per project - If you want to lip-sync a 2nd video with the same actor in the same lighting, you must create a new Single-Actor Project and train an AI model again for that 2nd video. |
Can generate multiple times using the same trained model - this means you will not be charged for training a model for each subsequent result of the same video. | |
Match shorter audio files - If you have a 1-minute video and upload only a 10-second audio file for lip-sync, the output video will be 10 seconds long. (LipDub assumes you don't need the remaining 50 seconds of silence) | |
Multi-Actor - video guide
- Multi Actor videos. (where multiple people’s faces are visible)
Pros | Cons |
More control - users can select each face detection that they want to lip-sync. | No audio features - automatic translation, text-to-speech, and SRT upload are not yet available. User must upload their own dub audio that they want to use for lip-sync. |
Can upload MULTIPLE videos - If you want to lip-sync five videos with the same actor in the same lighting, you can simply upload all videos to one Advanced flow project. | More prone to user error - because it is more in-depth and requires more user input, the chance of user error goes up. |
Can generate multiple times using the same trained model - this means you will not be charged for training a model for each subsequent result of the same video. | Will not match shorter audio files - If you have a 1-minute video and upload only a 10-second audio file for lip-sync, the output video will still be 1 minute long. Why? - This is common in multi-speaker videos: one person speaks for the first 10 seconds and is then silent for the rest of the video, and another person starts to speak later. So LipDub does not assume that it can cut the remaining 50 seconds, unlike the Single-Actor flow. |
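The output-length rules for the two project types can be put side by side in a few lines. This is a sketch of the behavior as described in this guide, not LipDub AI's actual implementation; the function name `expected_output_seconds` and the `"single"` / `"multi"` labels are assumptions made for the example.

```python
def expected_output_seconds(video_seconds: float, audio_seconds: float, project_type: str) -> float:
    """Estimate output length from the rules in this guide.

    project_type is "single" (Single-Actor) or "multi" (Multi-Actor).
    """
    if project_type not in ("single", "multi"):
        raise ValueError("project_type must be 'single' or 'multi'")
    if audio_seconds >= video_seconds:
        # No new frames are created, so the output cannot exceed the video.
        return video_seconds
    if project_type == "single":
        # Single-Actor trims the output to match the shorter audio.
        return audio_seconds
    # Multi-Actor keeps the full video, since other actors may speak later.
    return video_seconds


# 1-minute video with a 10-second dub audio file:
print(expected_output_seconds(60.0, 10.0, "single"))
print(expected_output_seconds(60.0, 10.0, "multi"))
```

With the 1-minute video and 10-second audio used in the tables, the Single-Actor estimate is 10 seconds and the Multi-Actor estimate is 60 seconds.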