GAIA: Zero-shot Talking Avatar Generation
Authors: Tianyu He, Junliang Guo, Runyi Yu, Yuchi Wang, Jialiang Zhu, Kaikai An, Leyi Li, Xu Tan, Chunyu Wang, Han Hu, HsiangTao Wu, Sheng Zhao, Jiang Bian
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): Benefitting from the disentanglement between motion and appearance, GAIA enables two common scenarios: the video-driven generation which aims to generate results with the appearance from a reference image and the motion from a driving video, and the speech-driven generation where the motion is predicted from a speech clip. The video-driven generation evaluates the VAE, while the speech-driven one evaluates the whole GAIA system. We compare GAIA with state-of-the-art methods for the two scenarios in Sec. 5.2, and further make detailed analyses in Sec. 5.3 to understand the model better. (See the pipeline sketch below the table.) |
| Researcher Affiliation | Industry | Microsoft {tianyuhe,junliangguo,v-runyiyu,v-yuchiwang,xuta}@microsoft.com |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper provides a URL (https://microsoft.github.io/GAIA) which is a project page, not a direct link to a source-code repository. It does not explicitly state that source code is available at this link or elsewhere. |
| Open Datasets | Yes | For high-quality public datasets, we collect High Definition Talking Face Dataset (HDTF) (Zhang et al., 2021) and Casual Conversation datasets v1&v2 (CC v1&v2) (Hazirbas et al., 2021; Porgali et al., 2023) |
| Dataset Splits | Yes | We train our model on the union of the datasets described in Sec. 3, and we randomly sample 100 videos from them as the validation set. (See the split sketch below the table.) |
| Hardware Specification | Yes | For both the VAE and the diffusion model, we adopt Adam (Kingma & Ba, 2015) optimizer and train our models on 16 V100 GPUs. |
| Software Dependencies | No | The paper mentions several tools and models (e.g., 'wav2vec 2.0', 'Adam optimizer', 'Conformer', 'CLIP', '3DDFA', 'dlib') but does not provide specific version numbers for any software dependencies or libraries needed for reproducibility. |
| Experiment Setup | Yes | The learning rate is set to 4.5e-6 and keeps constant during training. [...] The learning rate starts from 1.0e-4 and follows the inverse square root schedule. [...] We use the resolution of 256×256 for all the settings. (See the schedule sketch below the table.) |
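
The Research Type row quotes the paper's two evaluation scenarios: video-driven generation (appearance from a reference image, motion from a driving video) and speech-driven generation (motion predicted from speech). Below is a minimal sketch of how the two could be wired together; the `encode_appearance`, `encode_motion`, `sample`, and `decode` names are illustrative assumptions, not the authors' actual interfaces.

```python
# Minimal sketch of GAIA's two inference scenarios (hypothetical interfaces).
# encode_appearance / encode_motion / decode stand in for the VAE stage;
# diffusion.sample stands in for the speech-conditioned motion predictor.

def video_driven(vae, reference_image, driving_video):
    """Appearance from a reference image, motion copied from a driving video."""
    appearance = vae.encode_appearance(reference_image)
    frames = []
    for frame in driving_video:
        motion = vae.encode_motion(frame)            # per-frame motion latent
        frames.append(vae.decode(appearance, motion))
    return frames

def speech_driven(vae, diffusion, reference_image, speech_clip):
    """Appearance from a reference image, motion predicted from a speech clip."""
    appearance = vae.encode_appearance(reference_image)
    motion_sequence = diffusion.sample(speech_clip, appearance)  # predicted motion latents
    return [vae.decode(appearance, motion) for motion in motion_sequence]
```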
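
The Dataset Splits row notes that 100 videos are randomly sampled from the training union as validation. A minimal sketch of such a hold-out split follows, assuming a flat list of video paths; the seed and exact sampling procedure are not specified in the paper.

```python
import random

def split_validation(video_paths, num_val=100, seed=0):
    """Hold out `num_val` videos for validation; the seed is an assumption."""
    rng = random.Random(seed)
    val = set(rng.sample(video_paths, num_val))
    train = [p for p in video_paths if p not in val]
    return train, sorted(val)

# Toy usage with placeholder paths; the real union spans HDTF, CC v1&v2, and other collected data.
train_videos, val_videos = split_validation([f"video_{i:05d}.mp4" for i in range(16000)])
```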
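
The Experiment Setup row quotes a constant 4.5e-6 learning rate for the VAE and an inverse square root schedule starting from 1.0e-4 for the diffusion model. Below is a minimal sketch of an inverse square root schedule using PyTorch's `LambdaLR`; the warmup length is an assumption, since the paper does not state one.

```python
import torch

def inverse_sqrt_lambda(warmup_steps=1000):
    """Multiplicative LR factor: linear warmup, then decay proportional to 1/sqrt(step).

    The warmup length is an assumption; the paper only gives the starting LR
    (1.0e-4) and says it follows the inverse square root schedule.
    """
    def factor(step):
        step = max(step, 1)
        if step < warmup_steps:
            return step / warmup_steps
        return (warmup_steps / step) ** 0.5
    return factor

model = torch.nn.Linear(8, 8)  # placeholder module standing in for the diffusion model
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-4)  # peak LR from the paper
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inverse_sqrt_lambda())

for step in range(5):  # toy loop: step the optimizer, then the schedule
    optimizer.step()
    scheduler.step()
```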