Relative gradient optimization of the Jacobian term in unsupervised deep learning

Authors: Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvärinen

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify empirically the computational speedup our method provides in section 5.
Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 3 Université Paris-Saclay, Inria, Inria Saclay-Île-de-France, 91120, Palaiseau, France; 4 Université Paris-Saclay, CNRS, Laboratoire de recherche en informatique, 91405, Orsay, France; 5 Dept. of Computer Science, University of Helsinki, Finland
Pseudocode | No | The paper describes procedures and mathematical derivations in text and equations, but it does not include a clearly labeled pseudocode block or algorithm. (A hedged sketch of the core update is given below the table.)
Open Source Code | Yes | The code used for our experiments can be found at https://github.com/fissoreg/relative-gradient-jacobian.
Open Datasets | Yes | unconditional density estimation on four different UCI datasets [16] and a dataset of natural image patches (BSDS300) [41], as well as on MNIST [37].
Dataset Splits | Yes | We trained for 100 epochs, and picked the best performing model on the validation set.
Hardware Specification | Yes | The main comparison is run on a Tesla P100 Nvidia GPU.
Software Dependencies | No | The paper mentions using the JAX package [10] for automatic differentiation in a comparison experiment, but does not provide specific version numbers for JAX or other software libraries/dependencies used for their own method's implementation. (An illustrative autodiff baseline is sketched below the table.)
Experiment Setup | Yes | The results in Table 1 correspond to networks with 3 fully connected hidden layers with 1024 units each, using a smooth version of leaky-ReLU activation functions. We performed an initial grid search on the learning rate in the range [10^-3, 10^-5], and used an Adam optimizer [38] with β1 = 0.9, β2 = 0.999. We trained for 100 epochs, and picked the best performing model on the validation set. We did not use any batch normalization, dropout, or learning rate scheduling. (A configuration sketch follows below the table.)
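Since the paper contains no labeled pseudocode, the following minimal JAX sketch (not the authors' code) illustrates the identity the title refers to for a single fully connected layer: the Euclidean gradient of the Jacobian term log|det W| is W^{-T}, and right-multiplying the gradient by WᵀW (the relative gradient) turns that term into W, so no matrix inverse or determinant is needed during training.

```python
import jax.numpy as jnp
from jax import grad, random

def log_abs_det(W):
    # Jacobian (volume-change) term of a linear map z = W x: log |det W|
    return jnp.linalg.slogdet(W)[1]

key = random.PRNGKey(0)
W = random.normal(key, (4, 4))

# Ordinary (Euclidean) gradient: d/dW log|det W| = W^{-T}, i.e. it requires a matrix inverse.
g = grad(log_abs_det)(W)
print(jnp.allclose(g, jnp.linalg.inv(W).T, atol=1e-3))   # True

# Relative gradient: right-multiply the Euclidean gradient by W^T W.
# For the Jacobian term this collapses W^{-T} W^T W to W, so in practice
# the inverse on the line above never has to be formed at all.
print(jnp.allclose(g @ W.T @ W, W, atol=1e-3))           # True
```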
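Regarding the Software Dependencies row: the paper's timing comparison uses JAX for automatic differentiation. The snippet below is a hypothetical illustration of that kind of baseline, namely obtaining the Jacobian log-determinant by materializing the full Jacobian with autodiff; the function name and structure are assumptions, not code from the paper or its repository.

```python
import jax.numpy as jnp
from jax import jacfwd

def naive_log_abs_det_jacobian(f, x):
    # f: any invertible network mapping R^D -> R^D; x: a single data point.
    J = jacfwd(f)(x)                 # materialize the full D x D Jacobian
    return jnp.linalg.slogdet(J)[1]  # log |det J_f(x)|, O(D^3) per sample
```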
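For the Experiment Setup row, a configuration sketch under stated assumptions: it reuses only the hyperparameters quoted above (3 fully connected hidden layers of 1024 units, a smooth leaky-ReLU, Adam with β1 = 0.9, β2 = 0.999, a learning rate from a grid between 10^-5 and 10^-3, no batch normalization, dropout, or scheduling). The layer layout, the exact smooth leaky-ReLU form, the initialization scale, and the use of optax are assumptions, not details reported in the paper.

```python
import jax.numpy as jnp
from jax import random
import optax  # assumption: any Adam implementation would do; optax is not cited in the paper

def smooth_leaky_relu(x, alpha=0.1):
    # One common smooth variant of leaky-ReLU (softplus interpolation);
    # the exact form and alpha used in the paper may differ.
    return alpha * x + (1.0 - alpha) * jnp.logaddexp(x, 0.0)

def forward(params, x):
    # 3 fully connected hidden layers of width 1024, nonlinearity between layers,
    # no batch normalization or dropout (as reported); linear output layer assumed.
    for W in params[:-1]:
        x = smooth_leaky_relu(W @ x)
    return params[-1] @ x

def init_params(key, sizes):
    keys = random.split(key, len(sizes) - 1)
    return [0.01 * random.normal(k, (n_out, n_in))
            for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:])]

dim = 784  # e.g. flattened MNIST; dataset-dependent
params = init_params(random.PRNGKey(0), [dim, 1024, 1024, 1024, dim])
optimizer = optax.adam(learning_rate=1e-4, b1=0.9, b2=0.999)  # lr picked from a grid in [1e-5, 1e-3]
opt_state = optimizer.init(params)
```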