Flow-Based Unconstrained Lip to Speech Generation
Authors: Jinzheng He, Zhou Zhao, Yi Ren, Jinglin Liu, Baoxing Huai, Nicholas Yuan
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the superiority of our proposed method through objective and subjective evaluation on the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets. |
| Researcher Affiliation | Collaboration | 1Zhejiang University, China 2Huawei Cloud |
| Pseudocode | No | The paper describes the model architecture and components but does not provide pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to a demo video at https://glowlts.github.io/, but no explicit statement or link for open-source code for the methodology. |
| Open Datasets | Yes | In this paper, we focus on more challenging unconstrained, real-world settings and conduct experiments on Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets proposed in Prajwal et al. (2020), which are the currently largest datasets for unconstrained settings. |
| Dataset Splits | No | The paper mentions using Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets but does not explicitly provide training, validation, and test split percentages or sample counts. |
| Hardware Specification | Yes | All measurements are conducted with 1 NVIDIA 2080Ti GPU. |
| Software Dependencies | No | The paper states "Our implementation is based on PyTorch" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use 4 feed-forward Transformer blocks with 2 attention heads and a dropout of 0.1 in our condition module. For our flow-based decoder, we use 12 flow blocks in the training and inference process. Each flow block includes 1 actnorm layer, 1 invertible 1×1 conv layer, and 4 affine coupling layers. We optimize our model using the Adam (Kingma and Ba 2014) optimizer with an initial learning rate of 2 × 10^-4 and weight decay of 1 × 10^-6 in both stages. It takes about 200k steps for the first stage of training and about 100k steps for the second stage. |
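For orientation, the sketch below shows how the hyperparameters quoted in the Experiment Setup row could be wired up in PyTorch. Only the counts and optimizer settings (4 Transformer blocks, 2 heads, dropout 0.1, 12 flow blocks, Adam with lr 2e-4 and weight decay 1e-6, 200k + 100k steps) come from the paper; the hidden size, module names such as `FlowBlock`, and the flow-block internals are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the reported training configuration. Module names,
# feature sizes, and the flow-block internals are assumptions; only the
# block counts and optimizer settings come from the paper's setup.
import torch
import torch.nn as nn

HIDDEN = 256  # assumed hidden size (not stated in the excerpt)

# Condition module: 4 feed-forward Transformer blocks, 2 heads, dropout 0.1.
condition_module = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=HIDDEN, nhead=2, dim_feedforward=4 * HIDDEN,
        dropout=0.1, batch_first=True,
    ),
    num_layers=4,
)

class FlowBlock(nn.Module):
    """Placeholder for one flow block: 1 actnorm layer, 1 invertible 1x1
    convolution, and 4 affine coupling layers (internals omitted here)."""
    def __init__(self, channels: int):
        super().__init__()
        self.actnorm_scale = nn.Parameter(torch.ones(channels))
        self.actnorm_bias = nn.Parameter(torch.zeros(channels))
        # Orthogonal init stands in for the invertible 1x1 convolution weight.
        self.inv_conv_weight = nn.Parameter(
            torch.linalg.qr(torch.randn(channels, channels))[0]
        )
        # Crude stand-ins for the 4 affine coupling layers.
        self.couplings = nn.ModuleList(
            nn.Linear(channels // 2, channels) for _ in range(4)
        )

# Flow-based decoder: 12 flow blocks, as reported.
decoder = nn.ModuleList(FlowBlock(HIDDEN) for _ in range(12))

# Adam with lr 2e-4 and weight decay 1e-6, used in both training stages.
params = list(condition_module.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=2e-4, weight_decay=1e-6)

# Reported schedule: ~200k steps for stage one, ~100k steps for stage two.
STAGE1_STEPS, STAGE2_STEPS = 200_000, 100_000
```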