Image-to-image translation for cross-domain disentanglement
Authors: Abel Gonzalez-Garcia, Joost van de Weijer, Yoshua Bengio
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets. We demonstrate the disentanglement properties of our method on variations on the MNIST dataset [26], and apply it to bidirectional multi-modal image translation in more complex datasets [3, 38], achieving better results than state-of-the-art methods [23, 50] due to the finer control and generality granted by our disentangled representation. Additionally, we outperform [50] in cross-domain retrieval on realistic datasets [23, 45]. |
| Researcher Affiliation | Academia | Abel Gonzalez-Garcia Computer Vision Center agonzalez@cvc.uab.es Joost van de Weijer Computer Vision Center Universitat Autònoma de Barcelona Yoshua Bengio MILA Université de Montréal |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams (Figure 2) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and models are publicly available at https://github.com/agonzgarc/cross-domain-disen. |
| Open Datasets | Yes | We demonstrate the disentanglement properties of our method on variations on the MNIST dataset [26], and apply it to bidirectional multi-modal image translation in more complex datasets [3, 38]... cross-domain retrieval on realistic datasets [23, 45]. |
| Dataset Splits | No | We use the standard splits for train (50K images) and test (10K images). (MNIST) We set 5 random cars for test and train with the remaining 796 images. (3D car models) We arrange X and Y as before, and train the model with 5,372 images from 1,343 chairs, leaving 50 chairs for test. (3D chair models) The paper explicitly states train and test splits but does not mention a distinct validation split. |
| Hardware Specification | No | The paper describes the experimental evaluation but does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation, such as Python, PyTorch, or TensorFlow versions. |
| Experiment Setup | No | We detail the architectures and hyperparameters used for all experiments in the supplementary material. Concretely, we use an 8-dimensional noise vector z sampled from N(0, I). We have found out that adding small noise (N(0, 0.1)) to the output of the encoder as in [46] prevents this from happening and leads to better results. We train our model jointly in an end-to-end manner, minimizing the following total loss: $L = w_{GAN}(L^X_{GAN} + L^Y_{GAN}) + w_{Ex}(L^{G^X_d}_{GAN} + L^{F^Y_d}_{GAN}) + w_{L1}(L_S + L^X_{auto} + L^Y_{auto} + L^X_{recon} + L^Y_{recon})$. Some settings are quoted above, but the paper defers the detailed architectures and hyperparameters to the supplementary material, so a comprehensive experiment setup is not provided in the main text. |
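The quoted setup describes two concrete mechanics: a weighted sum of component losses, and small Gaussian noise N(0, 0.1) added to the encoder output. A minimal sketch of both is below; the function names, the dictionary keys, and the weight values are assumptions for illustration (the paper's actual weights are in its supplementary material), only the structure of the total loss follows the equation quoted above.

```python
import random

def total_loss(losses, w_gan=1.0, w_ex=1.0, w_l1=100.0):
    """Weighted sum matching the paper's total loss (weight values are assumed):
    L = w_GAN*(L^X_GAN + L^Y_GAN)
      + w_Ex*(L^{Gxd}_GAN + L^{Fyd}_GAN)
      + w_L1*(L_S + L^X_auto + L^Y_auto + L^X_recon + L^Y_recon)
    """
    gan = losses["gan_X"] + losses["gan_Y"]            # image-translation GANs
    ex = losses["gan_Gxd"] + losses["gan_Fyd"]         # exclusive-representation GANs
    l1 = (losses["S"] + losses["auto_X"] + losses["auto_Y"]
          + losses["recon_X"] + losses["recon_Y"])     # L1-style reconstruction terms
    return w_gan * gan + w_ex * ex + w_l1 * l1

def add_encoder_noise(features, sigma=0.1):
    # Per-element Gaussian noise on the encoder output; the paper uses N(0, 0.1)
    return [f + random.gauss(0.0, sigma) for f in features]
```

Keeping the weights as keyword arguments makes it easy to reproduce ablations over the three loss groups without touching the training loop.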