Relative gradient optimization of the Jacobian term in unsupervised deep learning
Authors: Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvärinen
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify empirically the computational speedup our method provides in section 5. |
| Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 3 Université Paris-Saclay, Inria, Inria Saclay-Île-de-France, 91120, Palaiseau, France; 4 Université Paris-Saclay, CNRS, Laboratoire de recherche en informatique, 91405, Orsay, France; 5 Dept of Computer Science, University of Helsinki, Finland |
| Pseudocode | No | The paper describes procedures and mathematical derivations in text and equations, but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | The code used for our experiments can be found at https://github.com/fissoreg/relative-gradient-jacobian. |
| Open Datasets | Yes | unconditional density estimation on four different UCI datasets [16] and a dataset of natural image patches (BSDS300) [41], as well as on MNIST [37]. |
| Dataset Splits | Yes | We trained for 100 epochs, and picked the best performing model on the validation set. |
| Hardware Specification | Yes | The main comparison is run on a Tesla P100 Nvidia GPU. |
| Software Dependencies | No | The paper mentions using the "JAX package [10]" for automatic differentiation in a comparison experiment, but does not provide specific version numbers for JAX or other software libraries/dependencies used for their own method's implementation. |
| Experiment Setup | Yes | The results in Table 1 correspond to networks with 3 fully connected hidden layers with 1024 units each, using a smooth version of leaky-ReLU activation functions. We performed an initial grid search on the learning rate in the range [10^-3, 10^-5], and used an Adam optimizer [38] with β1 = 0.9, β2 = 0.999. We trained for 100 epochs, and picked the best performing model on the validation set. We did not use any batch normalization, dropout, or learning rate scheduling. |
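
For readers wanting to reproduce the reported setup, below is a minimal JAX sketch of the configuration quoted in the Experiment Setup row: three fully connected hidden layers of 1024 units, a smooth leaky-ReLU activation, Adam with β1 = 0.9, β2 = 0.999, and a grid search over learning rates in the reported range. The exact smooth leaky-ReLU form, the use of `optax` as the optimizer library, and all function names here are assumptions for illustration only; the authors' actual implementation is in the repository linked above.

```python
# Hedged sketch of the reported training configuration (not the authors' code).
import jax
import jax.numpy as jnp
import optax  # assumed optimizer library; the paper does not name one

def smooth_leaky_relu(x, alpha=0.1):
    # Smooth surrogate of leaky-ReLU (assumed form): alpha*x + (1-alpha)*softplus(x).
    return alpha * x + (1.0 - alpha) * jax.nn.softplus(x)

def init_mlp(key, dim_in, hidden=1024, n_hidden=3):
    # Three fully connected hidden layers with 1024 units each, as quoted from Table 1.
    # Output dimension equals input dimension (the model maps data to a
    # same-dimensional latent space).
    sizes = [dim_in] + [hidden] * n_hidden + [dim_in]
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def forward(params, x):
    # Fully connected layers with the smooth activation on the hidden layers.
    for W, b in params[:-1]:
        x = smooth_leaky_relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def make_optimizer(lr):
    # Adam with the reported moment decay rates.
    return optax.adam(lr, b1=0.9, b2=0.999)

# Initial grid search over learning rates in the reported range [1e-5, 1e-3].
for lr in (1e-3, 1e-4, 1e-5):
    params = init_mlp(jax.random.PRNGKey(0), dim_in=784)  # e.g. flattened MNIST
    opt = make_optimizer(lr)
    opt_state = opt.init(params)
    # ... train for 100 epochs, track validation loss, keep the best model ...
```

As reported, no batch normalization, dropout, or learning-rate scheduling would be added on top of this configuration.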