Latent Image Animator: Learning to Animate Images via Latent Space Navigation

Authors: Yaohui Wang, Di Yang, François Brémond, Antitza Dantcheva

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive quantitative and qualitative analysis suggests that our model systematically and significantly outperforms state-of-art methods on VoxCeleb, Taichi and TED-talk datasets w.r.t. generated quality."
Researcher Affiliation | Academia | "Inria, Université Côte d'Azur. {yaohui.wang,di.yang,francois.bremond,antitza.dantcheva}@inria.fr"
Pseudocode | No | The paper includes architectural diagrams but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Source code and pre-trained models are publicly available." (https://wyhsirius.github.io/LIA-project/)
Open Datasets | Yes | "Our model is trained on the datasets VoxCeleb, TaichiHD and TED-talk. We follow the pre-processing method in (Siarohin et al., 2019) to crop frames into 256×256 resolution for quantitative evaluation. ... VoxCeleb (Nagrani et al., 2019)... TaichiHD (Siarohin et al., 2019)... TED-talk is a new dataset proposed in MRAA (Siarohin et al., 2021)." (A preprocessing sketch follows the table.)
Dataset Splits | No | The paper specifies training and test sets (e.g., "VoxCeleb contains a training set of 17928 videos and a test set of 495 videos.") but does not explicitly describe a separate validation split or its size/percentage.
Hardware Specification | Yes | "All models are trained on four 16G NVIDIA V100 GPUs."
Software Dependencies | No | The paper states "Our model is implemented in PyTorch (Paszke et al., 2019)" but does not provide version numbers for PyTorch or any other software dependencies, such as CUDA or other libraries.
Experiment Setup | Yes | "The total batch size is 32 with 8 images per GPU. We use a learning rate of 0.002 to train our model with the Adam optimizer (Kingma & Ba, 2014). The dimension of all latent codes, as well as directions in Dm, is set to be 512. In our loss function, we use λ = 10 in order to penalize more on the perceptual loss."
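
The Open Datasets row reports that frames are cropped to 256×256 following the pre-processing of Siarohin et al. (2019). Below is a minimal sketch of a generic square-crop-and-resize step in Python; the function name `preprocess_frame` is our own, and the actual pipeline uses the bounding-box-based cropping from Siarohin et al. (2019), which is not reproduced here.

```python
from PIL import Image

def preprocess_frame(path: str, size: int = 256) -> Image.Image:
    """Center-crop a frame to a square, then resize to size x size.

    Generic stand-in only: the paper follows the cropping procedure of
    Siarohin et al. (2019) rather than a plain center crop.
    """
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.BILINEAR)
```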
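
For reference, the hyperparameters quoted in the Experiment Setup row could be wired up in PyTorch roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: `Generator` and `total_loss` are hypothetical placeholders, and only the reported values (latent dimension 512, learning rate 0.002, Adam, batch 8 per GPU over 4 GPUs, λ = 10 on the perceptual loss) come from the paper.

```python
import torch
from torch import nn, optim

# Values reported in the paper's experiment setup.
LATENT_DIM = 512       # dimension of latent codes and of directions in Dm
LEARNING_RATE = 2e-3   # Adam learning rate
BATCH_PER_GPU = 8      # 8 images per GPU x 4 GPUs = total batch size 32
LAMBDA_PERC = 10.0     # lambda weighting the perceptual loss term

class Generator(nn.Module):
    """Hypothetical stand-in; the paper's generator navigates a learned
    latent space and is far larger than this placeholder."""
    def __init__(self, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.net = nn.Linear(latent_dim, latent_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

model = Generator()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

def total_loss(reconstruction: torch.Tensor, perceptual: torch.Tensor) -> torch.Tensor:
    # lambda = 10 penalizes the perceptual term more heavily, as stated
    # in the paper's loss description; the exact loss terms are assumed.
    return reconstruction + LAMBDA_PERC * perceptual
```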