Intro - What to Avoid
If your video contains any of the cases below, be cautious about the end result. LipDub will still produce a result even if your footage contains everything listed here.
If you have any questions, join our community Discord! Check out our linked article to learn how to join. If you prefer email, we're happy to help, so don't hesitate to reach out at support@lipdub.ai.

Object Interference
Object Interference is when an object (hand, microphone, etc.) comes into the LipDub mask area.
LipDub can generate a result despite interference in front of the face, but the end result may contain strange visual artifacts.
Full Interference vs Partial Interference

Full Interference

Partial Interference
Example of LipDub result with Interference
Interference where the object is stable & consistent:
Interference where the object is unstable & inconsistent:
Side Profile
Side profile shots are more difficult for LipDub AI to lip-sync.
However, we have recently made significant improvements to results for side poses!
Example:
Graphics on Screen
This only applies to graphics that fall within the face mask region.
LipDub will have difficulty re-generating the text perfectly. We recommend adding any graphics to the video after LipDub has been applied.

Visual FX & Transitions (blurs, fades, zoom-ins)
LipDub will dub the face even when there are transition effects. This can create strange-looking results, as LipDub will not match the effect perfectly.
Beards
LipDub is quite good at handling small beards! Note, however, that high-frequency details on the face are always harder to handle, especially in extreme close-up camera positions.
If possible, beards should be avoided.

Extreme Camera Angles
If possible, try to avoid camera angles like the ones shown below.
These camera angles are fairly uncommon, and as a result LipDub may have a more difficult time perfectly pasting the mask area of the mouth back into the frame than it would with a straight-to-camera position.
Camera position: Bottom up

Camera position: Top down

Higher than 8-bit depth videos
For example: 10-, 12-, 16-, or 32-bit depth.
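If you want to check a clip before uploading, the sketch below shows one way to flag footage above 8-bit depth. It assumes you have already read the clip's pixel format name with a tool such as ffprobe. The lookup table and the `exceeds_8_bit` helper are illustrative names, not part of LipDub, and the table covers only a few common FFmpeg pixel formats.

```python
# Illustrative, non-exhaustive map from FFmpeg pixel format names
# to bits per channel. Extend it for the formats you actually use.
KNOWN_BIT_DEPTHS = {
    "yuv420p": 8,
    "yuv422p": 8,
    "yuv420p10le": 10,
    "yuv422p10le": 10,
    "yuv420p12le": 12,
    "yuv444p16le": 16,
    "gbrpf32le": 32,
}


def exceeds_8_bit(pix_fmt):
    """Return True if the format is above 8-bit, False if it is 8-bit,
    and None if the format is not in the table (hypothetical helper)."""
    depth = KNOWN_BIT_DEPTHS.get(pix_fmt)
    if depth is None:
        return None  # unknown format: check it manually
    return depth > 8


if __name__ == "__main__":
    for fmt in ("yuv420p", "yuv420p10le", "some_other_format"):
        print(fmt, exceeds_8_bit(fmt))
```

A clip flagged `True` here falls into the higher-than-8-bit category described above.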
Extreme Close-ups
When the face is so close to the camera that only the mouth is visible, LipDub will be unable to detect a face and therefore will not lip-sync this actor.

Different Colored Footage
E.g. the additional footage for training is color graded with a blue tint, but the clip to lip-sync (at the very top of this screenshot) is ungraded.
Example of LipDub result when color varies in the data:
Low-light Footage
When there is very little light in a scene, it is challenging for LipDub to identify the face in the dark. This can make it difficult, or in some cases impossible, to lip-sync the face.
Faces that are too large in frame
LipDub performs its work on a 1024px box.
When a face is larger than this 1024px box, LipDub has to down-sample it and then up-sample the result back to the original, larger pixel size.
This may cause artifacts to appear in the final render, so we recommend keeping the actor's face within this 1024px box where possible.
Example: This is a 4K video with an extreme close-up of a face.
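The arithmetic behind this round trip can be sketched as follows. This is a minimal illustration, not LipDub's actual implementation: it assumes the box is square and that the longest side of the face region decides the scale. `resample_plan` and its parameters are hypothetical names.

```python
MAX_BOX_PX = 1024  # working box size stated in this article


def resample_plan(face_width_px, face_height_px, box_px=MAX_BOX_PX):
    """Estimate the down-sample and up-sample factors for a face region.

    Assumption: the longest side of the face region must fit in the box.
    """
    longest = max(face_width_px, face_height_px)
    if longest <= box_px:
        # The face already fits, so no resampling is needed.
        return {"needs_resample": False, "down_scale": 1.0, "up_scale": 1.0}
    return {
        "needs_resample": True,
        "down_scale": box_px / longest,  # shrink to fit the box
        "up_scale": longest / box_px,    # enlarge back to the source size
    }


if __name__ == "__main__":
    # A face region 1500px wide and 2048px tall, e.g. a close-up in 4K footage.
    print(resample_plan(1500, 2048))
    # A face region that already fits inside the box.
    print(resample_plan(600, 800))
```

In the first case the face would be halved to fit and then doubled on the way back, and that round trip is where detail can be lost. The larger the up-scale factor, the more likely artifacts become.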
STEP 1: Identify face on screen
STEP 2: Down-sample
STEP 3: Up-sample