DINO: A Conditional Energy-Based GAN for Domain Translation

Authors: Konstantinos Vougioukas, Stavros Petridis, Maja Pantic

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the DINO framework on image-to-image translation, since this is the most typical application for domain-translation systems. Additionally, we tackle the problem of video-driven speech reconstruction, which involves synthesising intelligible speech from silent video. In all of the experiments, focus is placed not only on evaluating the quality of the generated samples but also on verifying that the semantics are preserved after translation.
Researcher Affiliation | Academia | Konstantinos Vougioukas, Stavros Petridis & Maja Pantic, Department of Computing, Imperial College London, UK
Pseudocode | No | The paper describes the framework and equations, but it does not include a clearly labelled pseudocode block or algorithm listing.
Open Source Code | Yes | Source code: https://github.com/DinoMan/DINO
Open Datasets | Yes | We evaluate the DINO framework on image-to-image translation... on the CelebAMask-HQ (Lee et al., 2020) and the Cityscapes (Cordts et al., 2016) datasets... Experiments are performed on the GRID dataset (Cooke et al., 2006).
Dataset Splits | Yes | We evaluate the DINO framework on image-to-image translation... using their recommended training-test splits. The data is split according to Vougioukas et al. (2019) so that the test set contains unseen speakers and phrases.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions tools and optimizers such as the 'Adam optimizer (Kingma & Ba, 2015)' and 'Weights and Biases', but does not specify version numbers for programming languages, libraries (e.g., PyTorch, TensorFlow), or other key software dependencies required for replication.
Experiment Setup | Yes | The balance parameter γ is set to 0.8 for the image-to-image translation experiments. We train using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0002 and momentum parameters β1 = 0.5, β2 = 0.999. For video-driven speech reconstruction, an Adam optimiser is used with a learning rate of 0.0001 for the video-to-audio network and a learning rate of 0.001 for the audio-to-video network, and the balancing parameter γ is set to 0.5.
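
A minimal sketch of the optimiser configuration reported in the Experiment Setup row, assuming PyTorch (the paper does not name its deep-learning framework); the network modules below are placeholders rather than the actual DINO architecture, and only the learning rates, momentum terms, and γ values are taken from the paper.

import torch
import torch.nn as nn

# Placeholder modules; the real DINO generator and energy-based discriminator are not reproduced here.
generator = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1))

# Image-to-image translation settings: Adam with lr = 0.0002, betas = (0.5, 0.999),
# and balance parameter gamma = 0.8.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
gamma = 0.8

# Video-driven speech reconstruction instead uses lr = 1e-4 for the video-to-audio
# network, lr = 1e-3 for the audio-to-video network, and gamma = 0.5.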