Flow-Based Unconstrained Lip to Speech Generation

Authors: Jinzheng He, Zhou Zhao, Yi Ren, Jinglin Liu, Baoxing Huai, Nicholas Yuan

AAAI 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We demonstrate the superiority of our proposed method through objective and subjective evaluation on the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets. |
| Researcher Affiliation | Collaboration | Zhejiang University, China; Huawei Cloud |
| Pseudocode | No | The paper describes the model architecture and components but does not provide pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to a demo page at https://glowlts.github.io/, but no explicit statement or link for open-source code for the methodology. |
| Open Datasets | Yes | In this paper, we focus on more challenging unconstrained, real-world settings and conduct experiments on the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets proposed in Prajwal et al. (2020), which are currently the largest datasets for unconstrained settings. |
| Dataset Splits | No | The paper mentions using the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets but does not explicitly provide training, validation, and test split percentages or sample counts. |
| Hardware Specification | Yes | All measurements are conducted with 1 NVIDIA 2080Ti GPU. |
| Software Dependencies | No | The paper states "Our implementation is based on PyTorch" but does not provide version numbers for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We use 4 feed-forward Transformer blocks with 2 attention heads and a dropout of 0.1 in our condition module. For our flow-based decoder, we use 12 flow blocks in the training and inference process. Each flow block includes 1 actnorm layer, 1 invertible 1×1 conv layer, and 4 affine coupling layers. We optimize our model using the Adam (Kingma and Ba 2014) optimizer with an initial learning rate of 2 × 10⁻⁴ and weight decay of 1 × 10⁻⁶ in both stages. It takes about 200k steps for the first stage of training and about 100k steps for the second stage. |
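Since the authors do not release code, the decoder configuration quoted above can only be approximated. Below is a minimal PyTorch sketch, under stated assumptions, of one Glow-style flow block (1 actnorm layer, 1 invertible 1×1 convolution, 4 affine couplings) stacked 12 times, with the quoted Adam settings. All module internals, the 80 mel channels, and the hidden size are assumptions for illustration, and the log-determinant terms required for actual maximum-likelihood flow training are omitted for brevity.

```python
import torch
import torch.nn as nn

class ActNorm(nn.Module):
    """Per-channel affine normalization, as in Glow (hypothetical sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, channels, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1))

    def forward(self, x):  # x: (batch, channels, time)
        return self.scale * x + self.bias

class InvConv1x1(nn.Module):
    """Invertible 1x1 convolution: a learned channel-mixing matrix."""
    def __init__(self, channels):
        super().__init__()
        # Random orthogonal initialization keeps the matrix invertible at the start.
        q, _ = torch.linalg.qr(torch.randn(channels, channels))
        self.weight = nn.Parameter(q)

    def forward(self, x):
        return torch.einsum("ij,bjt->bit", self.weight, x)

class AffineCoupling(nn.Module):
    """Affine coupling: one half of the channels predicts a scale and shift
    for the other half, so the transform stays cheaply invertible."""
    def __init__(self, channels, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels // 2, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        xa, xb = x.chunk(2, dim=1)
        log_s, t = self.net(xa).chunk(2, dim=1)
        return torch.cat([xa, xb * torch.exp(log_s) + t], dim=1)

class FlowBlock(nn.Module):
    """One of the 12 blocks: actnorm -> invertible 1x1 conv -> 4 couplings."""
    def __init__(self, channels):
        super().__init__()
        self.actnorm = ActNorm(channels)
        self.invconv = InvConv1x1(channels)
        self.couplings = nn.ModuleList(AffineCoupling(channels) for _ in range(4))

    def forward(self, x):
        x = self.invconv(self.actnorm(x))
        for coupling in self.couplings:
            x = coupling(x)
        return x

# 12 flow blocks; 80 mel channels is an assumption, not stated in the quote.
decoder = nn.Sequential(*[FlowBlock(80) for _ in range(12)])

# Optimizer settings as quoted in the paper.
optimizer = torch.optim.Adam(decoder.parameters(), lr=2e-4, weight_decay=1e-6)

# Smoke test: a batch of 2 sequences, 80 channels, 100 frames.
y = decoder(torch.randn(2, 80, 100))
print(y.shape)  # torch.Size([2, 80, 100])
```

Note that even channel counts matter here: the affine coupling splits the channel dimension in half, which is one reason mel-spectrogram dimensionalities such as 80 are convenient for this kind of decoder.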