Latent Image Animator: Learning to Animate Images via Latent Space Navigation
Authors: Yaohui Wang, Di Yang, François Brémond, Antitza Dantcheva
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive quantitative and qualitative analysis suggests that our model systematically and significantly outperforms state-of-the-art methods on VoxCeleb, TaiChi and TED-talk datasets w.r.t. generated quality. |
| Researcher Affiliation | Academia | Inria, Université Côte d'Azur {yaohui.wang,di.yang,francois.bremond,antitza.dantcheva}@inria.fr |
| Pseudocode | No | The paper includes architectural diagrams but does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code and pre-trained models are publicly available (https://wyhsirius.github.io/LIA-project/). |
| Open Datasets | Yes | Our model is trained on the datasets VoxCeleb, TaiChi-HD and TED-talk. We follow the pre-processing method in (Siarohin et al., 2019) to crop frames into 256×256 resolution for quantitative evaluation. ... VoxCeleb (Nagrani et al., 2019)... TaiChi-HD (Siarohin et al., 2019)... TED-talk is a new dataset proposed in MRAA (Siarohin et al., 2021). |
| Dataset Splits | No | The paper specifies training and test sets for datasets (e.g., 'VoxCeleb contains a training set of 17928 videos and a test set of 495 videos.') but does not explicitly provide details about a separate validation set split or its size/percentage. |
| Hardware Specification | Yes | All models are trained on four 16 GB NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions 'Our model is implemented in PyTorch (Paszke et al., 2019)' but does not provide specific version numbers for PyTorch or any other software dependencies, such as CUDA or other libraries. |
| Experiment Setup | Yes | The total batch size is 32 with 8 images per GPU. We use a learning rate of 0.002 to train our model with the Adam optimizer (Kingma & Ba, 2014). The dimension of all latent codes, as well as directions in D_m, is set to be 512. In our loss function, we use λ = 10 in order to penalize more on the perceptual loss. |
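
To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of the reported hyperparameters: Adam with learning rate 0.002, 512-dimensional latent codes, a per-GPU batch of 8 (32 total across four GPUs), and λ = 10 weighting the perceptual loss. The generator module and the perceptual-loss helper are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of the reported training configuration (assumed setup,
# not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 512       # dimension of latent codes and of directions in D_m
BATCH_PER_GPU = 8      # 8 images per GPU; 4 GPUs -> total batch size 32
LAMBDA = 10.0          # λ weight penalizing the perceptual loss term

class PlaceholderGenerator(nn.Module):
    """Stand-in for the LIA generator (hypothetical, for illustration)."""
    def __init__(self, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.encode = nn.Conv2d(3, latent_dim, kernel_size=4, stride=4)
        self.decode = nn.ConvTranspose2d(latent_dim, 3, kernel_size=4, stride=4)

    def forward(self, source: torch.Tensor, driving: torch.Tensor) -> torch.Tensor:
        # The real LIA navigates a learned latent space via motion directions;
        # this placeholder simply mixes the two encodings and decodes.
        return self.decode(self.encode(source) + self.encode(driving))

def perceptual_loss(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Assumed VGG-style feature loss; plain L1 is used here so the sketch
    # stays self-contained and runnable.
    return F.l1_loss(a, b)

model = PlaceholderGenerator()
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)  # lr from the paper

# Dummy batch of 256x256 cropped frames, per the pre-processing row.
source = torch.randn(BATCH_PER_GPU, 3, 256, 256)
driving = torch.randn(BATCH_PER_GPU, 3, 256, 256)

optimizer.zero_grad()
generated = model(source, driving)
# Reconstruction loss plus λ-weighted perceptual term, per the loss description.
loss = F.l1_loss(generated, driving) + LAMBDA * perceptual_loss(generated, driving)
loss.backward()
optimizer.step()
```

In the authors' setup this step would run under distributed data parallelism across the four V100s to reach the total batch size of 32, and a VGG-based feature extractor would replace the L1 stand-in used above.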