Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Relative gradient optimization of the Jacobian term in unsupervised deep learning
Authors: Luigi Gresele, Giancarlo Fissore, Adrián Javaloy, Bernhard Schölkopf, Aapo Hyvärinen
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify empirically the computational speedup our method provides in section 5. |
| Researcher Affiliation | Academia | (1) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (2) Max Planck Institute for Biological Cybernetics, Tübingen, Germany; (3) Université Paris-Saclay, Inria, Inria Saclay-Île-de-France, 91120, Palaiseau, France; (4) Université Paris-Saclay, CNRS, Laboratoire de recherche en informatique, 91405, Orsay, France; (5) Dept of Computer Science, University of Helsinki, Finland |
| Pseudocode | No | The paper describes procedures and mathematical derivations in text and equations, but it does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | The code used for our experiments can be found at https://github.com/fissoreg/relative-gradient-jacobian. |
| Open Datasets | Yes | unconditional density estimation on four different UCI datasets [16] and a dataset of natural image patches (BSDS300) [41], as well as on MNIST [37]. |
| Dataset Splits | Yes | We trained for 100 epochs, and picked the best performing model on the validation set. |
| Hardware Specification | Yes | The main comparison is run on a Tesla P100 Nvidia GPU. |
| Software Dependencies | No | The paper mentions using the "JAX package [10]" for automatic differentiation in a comparison experiment, but does not provide specific version numbers for JAX or other software libraries/dependencies used for their own method's implementation. |
| Experiment Setup | Yes | The results in Table 1 correspond to networks with 3 fully connected hidden layers with 1024 units each, using a smooth version of leaky-ReLU activation functions. We performed an initial grid search on the learning rate in the range [10^-3, 10^-5], and used an Adam optimizer [38] with β1 = 0.9, β2 = 0.999. We trained for 100 epochs, and picked the best performing model on the validation set. We did not use any batch normalization, dropout, or learning rate scheduling. |
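
The Experiment Setup row can be read as a small configuration plus a model-selection rule. The sketch below is illustrative only and is not taken from the authors' repository. The names `ExperimentSetup`, `learning_rate_grid`, and `select_best_epoch` are hypothetical. The number of grid points and the convention that a higher validation score is better are assumptions, since the quoted text gives only the endpoints of the learning-rate range and does not name the validation metric.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters quoted in the Experiment Setup row (hypothetical container)."""
    hidden_layers: int = 3
    hidden_units: int = 1024
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    epochs: int = 100
    batch_norm: bool = False      # the row states none was used
    dropout: bool = False         # the row states none was used
    lr_scheduling: bool = False   # the row states none was used


def learning_rate_grid(high=1e-3, low=1e-5, num_points=3):
    """Log-spaced learning rates from `high` down to `low`.

    Only the endpoints come from the row; `num_points` is an assumption.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    start, stop = math.log10(high), math.log10(low)
    step = (stop - start) / (num_points - 1)
    return [10 ** (start + i * step) for i in range(num_points)]


def select_best_epoch(validation_scores):
    """Return (epoch_index, score) for the best validation score.

    Assumes higher is better (for example a log-likelihood); flip the sign
    of the scores if the metric is a loss.
    """
    if not validation_scores:
        raise ValueError("validation_scores must not be empty")
    best = max(range(len(validation_scores)), key=validation_scores.__getitem__)
    return best, validation_scores[best]
```

With these helpers, a search would loop over `learning_rate_grid()`, train for `ExperimentSetup().epochs` epochs per value, and keep the checkpoint returned by `select_best_epoch` on the per-epoch validation scores.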