Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision

Authors: José Lezama

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the applicability of this method in both unsupervised and supervised scenarios for learning disentangled representations. In a facial attribute manipulation task, we obtain high quality image generation while smoothly controlling dozens of attributes with a single model. This is an order of magnitude more disentangled factors than state-of-the-art methods, while obtaining visually similar or superior results, and avoiding adversarial training. Table 1: Quantitative comparison of the disentanglement and reconstruction performance of the unsupervised method on MNIST digits.
Researcher Affiliation | Academia | José Lezama, Universidad de la República, Uruguay. jlezama@fing.edu.uy
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code available at https://github.com/jlezama/disentangling-jacobian.
Open Datasets | Yes | For this example, we trained a 3-layer multi-layer perceptron (MLP) on MNIST digits, using only the L2 reconstruction loss. We train and evaluate our method on the standard CelebA dataset (Liu et al., 2015), which contains 200,000 aligned faces of celebrities with 40 annotated attributes. We applied the procedure described in Section 3 for progressive unsupervised learning of disentangled representations to the Street View House Numbers (SVHN) dataset (Netzer et al., 2011). (See the MNIST autoencoder sketch after this table.)
Dataset Splits | Yes | From CelebA, we use 162,770 images of size 256x256 for training and the rest for validation. All the result figures in this paper show images from the validation set and were obtained using the same single model. (See the split sketch after this table.)
Hardware Specification | No | The paper mentions 'Experiments were partially run on Cluster UY, National Center for Supercomputing, Uruguay.' but does not provide specific hardware details such as GPU/CPU models, memory, or exact machine configurations.
Software Dependencies | No | The paper mentions optimizers like 'Adam' and network components but does not provide specific version numbers for software libraries (e.g., TensorFlow, PyTorch) or other key dependencies.
Experiment Setup | Yes | We used Adam (Kingma & Ba, 2014) with a learning rate of 3e-4, a batch size of 128 and a weight decay coefficient of 1e-6. For the experiments in Figure 1 we used λy = 0.25, λdiff = 0.1. We perform grid search to find the values of the weights in (14) by training for 10 epochs and evaluating on a hold-out validation set. The values we used in the experiments in this paper are λ1 = 10^2, λ2 = 10^-1, λ3 = 10^-1, λ4 = 10^-4, λ5 = 10^-5. We trained all networks using Adam, with a learning rate of 0.002, β1 = 0.5 and β2 = 0.999. We use a batch size of 128. (See the optimizer settings sketch after this table.)
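
As a rough illustration of the unsupervised baseline quoted in the Open Datasets row (a 3-layer MLP trained on MNIST with only an L2 reconstruction loss), here is a minimal PyTorch-style sketch. The layer widths, latent size, and training loop are assumptions made for illustration and are not taken from the paper or its repository.

```python
# Minimal sketch (not the authors' code): a 3-layer MLP autoencoder on MNIST
# trained with only an L2 reconstruction loss. Layer widths are illustrative.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

mlp = nn.Sequential(                 # encoder + decoder as one 3-layer MLP
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 16),  nn.ReLU(),  # low-dimensional latent code
    nn.Linear(16, 784),  nn.Sigmoid(),
)

loader = DataLoader(
    datasets.MNIST("data", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

opt = torch.optim.Adam(mlp.parameters(), lr=3e-4, weight_decay=1e-6)

for x, _ in loader:
    x = x.view(x.size(0), -1)          # flatten 28x28 images to 784-d vectors
    loss = ((mlp(x) - x) ** 2).mean()  # plain L2 reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```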
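
The Dataset Splits row reduces to a single index-based split. A minimal sketch, assuming the standard CelebA ordering and its full count of 202,599 aligned images (the quoted text rounds this to 200,000):

```python
# Sketch of the reported CelebA split: the first 162,770 images for training,
# the remaining images for validation. The total of 202,599 is the standard
# CelebA count (an assumption here; the row above only says "200,000").
NUM_IMAGES = 202599
NUM_TRAIN = 162770

train_indices = range(NUM_TRAIN)              # images 0 .. 162,769
valid_indices = range(NUM_TRAIN, NUM_IMAGES)  # images 162,770 .. 202,598

assert len(train_indices) == 162770
assert len(valid_indices) == NUM_IMAGES - NUM_TRAIN  # 39,829 validation images
```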
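
The hyperparameters in the Experiment Setup row translate directly into two optimizer configurations plus a weighted loss. A minimal PyTorch-style sketch follows; the model and loss terms are placeholders, and only the numeric values come from the row above (the signs of the λ exponents are reconstructed from the extracted text).

```python
# Sketch of the reported optimizer settings, assuming PyTorch. The network and
# loss terms are dummies; only the hyperparameter values are taken from the
# Experiment Setup row (lr, weight decay, batch size of 128, betas, weights).
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # placeholder network, not the paper's architecture

# Unsupervised MNIST setup (Figure 1): Adam, lr 3e-4, weight decay 1e-6
opt_mnist = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-6)
lambda_y, lambda_diff = 0.25, 0.1

# CelebA setup: weights lambda_1..lambda_5 of the five terms in Eq. (14),
# reported as chosen by a 10-epoch grid search on a hold-out validation set
lambdas = [1e2, 1e-1, 1e-1, 1e-4, 1e-5]
loss_terms = [torch.tensor(0.0, requires_grad=True) for _ in lambdas]  # dummies
total_loss = sum(w * t for w, t in zip(lambdas, loss_terms))

# Adam with lr 0.002, beta1 = 0.5, beta2 = 0.999
opt_celeba = torch.optim.Adam(model.parameters(), lr=0.002, betas=(0.5, 0.999))
```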