Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision

Authors: José Lezama

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the applicability of this method in both unsupervised and supervised scenarios for learning disentangled representations. In a facial attribute manipulation task, we obtain high quality image generation while smoothly controlling dozens of attributes with a single model. This is an order of magnitude more disentangled factors than state-of-the-art methods, while obtaining visually similar or superior results, and avoiding adversarial training. Table 1: Quantitative comparison of the disentanglement and reconstruction performance of the unsupervised method on MNIST digits.
Researcher Affiliation | Academia | José Lezama, Universidad de la República, Uruguay. jlezama@fing.edu.uy
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code available at https://github.com/jlezama/disentangling-jacobian.
Open Datasets | Yes | For this example, we trained a 3-layer multi-layer perceptron (MLP) on MNIST digits, using only the L2 reconstruction loss. We train and evaluate our method on the standard CelebA dataset (Liu et al., 2015), which contains 200,000 aligned faces of celebrities with 40 annotated attributes. We applied the procedure described in Section 3 for progressive unsupervised learning of disentangled representations to the Street View House Numbers (SVHN) dataset (Netzer et al., 2011). (See the MNIST autoencoder sketch after this table.)
Dataset Splits | Yes | From CelebA, we use 162,770 images of size 256x256 for training and the rest for validation. All the result figures in this paper show images from the validation set and were obtained using the same single model. (See the split sketch after this table.)
Hardware Specification | No | The paper mentions 'Experiments were partially run on Cluster UY, National Center for Supercomputing, Uruguay.' but does not provide specific hardware details such as GPU/CPU models, memory, or exact machine configurations.
Software Dependencies | No | The paper mentions optimizers like 'Adam' and network components but does not provide specific version numbers for software libraries (e.g., TensorFlow, PyTorch) or other key dependencies.
Experiment Setup | Yes | We used Adam (Kingma & Ba, 2014) with a learning rate of 3e-4, a batch size of 128 and a weight decay coefficient of 1e-6. For the experiments in Figure 1 we used λy = 0.25, λdiff = 0.1. We perform grid search to find the values of the weights in (14) by training for 10 epochs and evaluating on a hold-out validation set. The values we used in the experiments in this paper are λ1 = 10^2, λ2 = 10^-1, λ3 = 10^-1, λ4 = 10^-4, λ5 = 10^-5. We trained all networks using Adam, with a learning rate of 0.002, β1 = 0.5 and β2 = 0.999. We use a batch size of 128. (See the optimizer settings sketch after this table.)
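
As a rough illustration of the unsupervised baseline quoted in the Open Datasets row (a 3-layer MLP trained on MNIST with only an L2 reconstruction loss), here is a minimal PyTorch-style sketch. The layer widths, latent size, and training loop are assumptions made for illustration and are not taken from the paper or its repository.

```python
# Minimal sketch (not the authors' code): a 3-layer MLP autoencoder on MNIST
# trained with only an L2 reconstruction loss. Layer widths are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

mlp = nn.Sequential(                 # encoder + decoder as one 3-layer MLP
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 16),  nn.ReLU(),  # low-dimensional latent code
    nn.Linear(16, 784),  nn.Sigmoid(),
)

loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

opt = torch.optim.Adam(mlp.parameters(), lr=3e-4, weight_decay=1e-6)

for x, _ in loader:
    x = x.view(x.size(0), -1)          # flatten 28x28 images to 784-d vectors
    loss = ((mlp(x) - x) ** 2).mean()  # plain L2 reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```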
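
The Dataset Splits row reduces to a single index-based split. A minimal sketch, assuming the standard CelebA ordering and its full count of 202,599 aligned images (the quoted text rounds this to 200,000):

```python
# Sketch of the reported CelebA split: the first 162,770 images for training,
# the remaining images for validation. The total of 202,599 is the standard
# CelebA count (an assumption here; the row above only says "200,000").
NUM_IMAGES = 202599
NUM_TRAIN = 162770

train_indices = range(NUM_TRAIN)              # images 0 .. 162,769
valid_indices = range(NUM_TRAIN, NUM_IMAGES)  # images 162,770 .. 202,598

assert len(train_indices) == 162770
assert len(valid_indices) == NUM_IMAGES - NUM_TRAIN  # 39,829 validation images
```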
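
The hyperparameters in the Experiment Setup row translate directly into two optimizer configurations plus a weighted loss. A minimal PyTorch-style sketch follows; the model and loss terms are placeholders, and only the numeric values come from the row above (the signs of the λ exponents are reconstructed from the extracted text).

```python
# Sketch of the reported optimizer settings, assuming PyTorch. The network and
# loss terms are dummies; only the hyperparameter values are taken from the
# Experiment Setup row (lr, weight decay, batch size of 128, betas, weights).
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # placeholder network, not the paper's architecture

# Unsupervised MNIST setup (Figure 1): Adam, lr 3e-4, weight decay 1e-6
opt_mnist = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-6)
lambda_y, lambda_diff = 0.25, 0.1

# CelebA setup: weights lambda_1..lambda_5 of the five terms in Eq. (14),
# reported as chosen by a 10-epoch grid search on a hold-out validation set
lambdas = [1e2, 1e-1, 1e-1, 1e-4, 1e-5]
loss_terms = [torch.tensor(0.0, requires_grad=True) for _ in lambdas]  # dummies
total_loss = sum(w * t for w, t in zip(lambdas, loss_terms))

# Adam with lr 0.002, beta1 = 0.5, beta2 = 0.999
opt_celeba = torch.optim.Adam(model.parameters(), lr=0.002, betas=(0.5, 0.999))
```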