Image-to-image translation for cross-domain disentanglement

Authors: Abel Gonzalez-Garcia, Joost van de Weijer, Yoshua Bengio

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets. We demonstrate the disentanglement properties of our method on variations on the MNIST dataset [26], and apply it to bidirectional multi-modal image translation in more complex datasets [3, 38], achieving better results than state-of-the-art methods [23, 50] due to the finer control and generality granted by our disentangled representation. Additionally, we outperform [50] in cross-domain retrieval on realistic datasets [23, 45]."
Researcher Affiliation | Academia | Abel Gonzalez-Garcia, Computer Vision Center, agonzalez@cvc.uab.es; Joost van de Weijer, Computer Vision Center, Universitat Autònoma de Barcelona; Yoshua Bengio, MILA, Université de Montréal
Pseudocode | No | The paper describes the model architecture and training process in text and diagrams (Figure 2) but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code and models are publicly available at https://github.com/agonzgarc/cross-domain-disen."
Open Datasets | Yes | "We demonstrate the disentanglement properties of our method on variations on the MNIST dataset [26], and apply it to bidirectional multi-modal image translation in more complex datasets [3, 38]... cross-domain retrieval on realistic datasets [23, 45]."
Dataset Splits | No | "We use the standard splits for train (50K images) and test (10K images)." (MNIST) "We set 5 random cars for test and train with the remaining 796 images." (3D car models) "We arrange X and Y as before, and train the model with 5,372 images from 1,343 chairs, leaving 50 chairs for test." (3D chair models) The paper gives explicit training and testing splits but does not mention a distinct validation split (a sketch of such instance-level splits follows the table).
Hardware Specification | No | The paper describes the experimental evaluation but does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the implementation, such as Python, PyTorch, or TensorFlow.
Experiment Setup | No | "We detail the architectures and hyperparameters used for all experiments in the supplementary material. Concretely, we use an 8-dimensional noise vector z sampled from N(0, I). We have found out that adding small noise (N(0, 0.1)) to the output of the encoder as in [46] prevents this from happening and leads to better results. We train our model jointly in an end-to-end manner, minimizing the following total loss: L = w_GAN (L^X_GAN + L^Y_GAN) + w_Ex (L^{G^X_d}_GAN + L^{F^Y_d}_GAN) + w_L1 (L_S + L^X_auto + L^Y_auto + L^X_recon + L^Y_recon)." While some settings are mentioned, the paper defers detailed architectures and hyperparameters to the supplementary material, so a complete setup is not given in the main text (a sketch of the loss combination follows the table).
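
Since the paper publishes only counts and not the actual split lists, reproducing the car/chair splits requires choosing the held-out instances oneself. A minimal sketch under that assumption (`car_ids` and `chair_ids` are hypothetical lists of model identifiers, and the seed is arbitrary, as neither appears in the paper):

```python
import random

random.seed(0)  # assumed seed; the paper does not specify one

def holdout_instances(instance_ids, num_test):
    """Hold out whole object instances (cars/chairs) for testing,
    so that no test instance appears in training."""
    ids = list(instance_ids)
    random.shuffle(ids)
    return ids[num_test:], ids[:num_test]  # (train_ids, test_ids)

# MNIST: the standard train/test split shipped with the dataset is used as-is.

# 3D car models: 5 random cars held out for test; the images of the
# remaining cars (796 images) form the training set, e.g.:
#   train_cars, test_cars = holdout_instances(car_ids, num_test=5)

# 3D chair models: 50 chairs held out for test; 5,372 images from the
# remaining 1,343 chairs form the training set, e.g.:
#   train_chairs, test_chairs = holdout_instances(chair_ids, num_test=50)
```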
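
For the quoted training objective, the following is a minimal sketch of the weighted loss combination, the 8-dimensional noise vector z, and the small encoder-output noise. The weight values, batch size, and loss-dictionary keys are assumptions for illustration; the released implementation is TensorFlow-based, whereas this sketch uses plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 16  # assumed; not stated in the quoted text

# 8-dimensional noise vector z sampled from N(0, I), one per batch element.
z = rng.standard_normal((batch_size, 8))

def perturb_encoding(enc, std=0.1):
    """Add small N(0, 0.1) noise to the encoder output, as the paper reports."""
    return enc + rng.normal(0.0, std, size=enc.shape)

# Assumed loss weights; the actual values are in the paper's supplementary material.
w_GAN, w_Ex, w_L1 = 1.0, 1.0, 10.0

def total_loss(l):
    """Weighted combination matching the quoted objective:
    L = w_GAN (L^X_GAN + L^Y_GAN)
      + w_Ex (L^{G^X_d}_GAN + L^{F^Y_d}_GAN)
      + w_L1 (L_S + L^X_auto + L^Y_auto + L^X_recon + L^Y_recon)
    """
    return (w_GAN * (l["gan_x"] + l["gan_y"])
            + w_Ex * (l["gan_gxd"] + l["gan_fyd"])
            + w_L1 * (l["shared"] + l["auto_x"] + l["auto_y"]
                      + l["recon_x"] + l["recon_y"]))
```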