Gradients of Functions of Large Matrices

Authors: Nicholas Krämer, Pablo Moreno-Muñoz, Hrittik Roy, Søren Hauberg

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Next, we put this code to the test on three challenging machine-learning problems centred around functions of matrices to see how it fares against state-of-the-art differentiable implementations of exact Gaussian processes (Section 5), differential equation solvers (Section 6), and Bayesian neural networks (Section 7).
Researcher Affiliation | Academia | Nicholas Krämer, Pablo Moreno-Muñoz, Hrittik Roy, Søren Hauberg, Technical University of Denmark, Kongens Lyngby, Denmark, {pekra, pabmo, hroy, sohau}@dtu.dk
Pseudocode | Yes | Algorithm E.1 (Arnoldi's forward pass; paraphrased) ... Algorithm E.2 (Arnoldi's adjoint pass; paraphrased) ... Algorithm E.3 (Forward pass) ... Algorithm E.4 (Backward pass). (A generic Arnoldi forward-pass sketch follows the table.)
Open Source Code | Yes | Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree.
Open Datasets | Yes | Data: For the experiments we use the Protein, KEGG (undirected), KEGG (directed), Elevators, and Kin40k datasets (Table 7, adapted from Bartels et al. [95]). All are part of the UCI data repository and accessible through there.
Dataset Splits | No | The paper mentions an "80/20 train/test split" in the Table 3 caption, but does not explicitly describe a validation split or how one was used for hyperparameter tuning.
Hardware Specification | Yes | The Gaussian process and differential equation case studies run on a V100 GPU, the Bayesian neural network one on a P100 GPU.
Software Dependencies | No | The paper mentions software such as JAX, Diffrax, GPyTorch, PyTorch, and KeOps, but it does not provide specific version numbers for these key components or for its self-contained solvers, which reproducibility requires.
Experiment Setup | Yes | We calibrate a Matérn prior with smoothness ν = 1.5, using 10 matrix-vector products per Lanczos iteration, a conjugate-gradients tolerance of ϵ = 1, a rank-15 pivoted Cholesky preconditioner, and 10 Rademacher samples... All parameters are initialised randomly. We use the Adam optimiser with learning rate 0.05 for 75 epochs. All experiments are repeated for three different seeds. (A sketch of the Rademacher trace estimator follows the table.)
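
The Pseudocode row references Algorithms E.1–E.4, which paraphrase Arnoldi's forward and adjoint passes. For orientation, here is a minimal sketch of the textbook Arnoldi forward iteration written with JAX. It is a generic illustration only, not the paper's Algorithm E.1 and not the matfree API; the function and variable names are our own, and breakdown (a zero residual norm) is not handled.

```python
# Generic Arnoldi iteration: builds an orthonormal basis Q of the Krylov space
# and an upper Hessenberg matrix H such that A @ Q[:, :k] ≈ Q @ H.
import jax
import jax.numpy as jnp


def arnoldi(matvec, v0, num_iters):
    """Return Q (n x (k+1)) with orthonormal columns and Hessenberg H ((k+1) x k)."""
    n = v0.shape[0]
    Q = jnp.zeros((n, num_iters + 1)).at[:, 0].set(v0 / jnp.linalg.norm(v0))
    H = jnp.zeros((num_iters + 1, num_iters))
    for k in range(num_iters):
        w = matvec(Q[:, k])
        # Modified Gram-Schmidt orthogonalisation against the previous columns.
        for j in range(k + 1):
            h = Q[:, j] @ w
            H = H.at[j, k].set(h)
            w = w - h * Q[:, j]
        beta = jnp.linalg.norm(w)
        H = H.at[k + 1, k].set(beta)
        Q = Q.at[:, k + 1].set(w / beta)
    return Q, H


# Tiny usage example with a symmetric test matrix; for symmetric inputs the
# Hessenberg matrix H is (numerically) tridiagonal, recovering Lanczos.
key = jax.random.PRNGKey(0)
A = jax.random.normal(key, (50, 50))
A = A + A.T
Q, H = arnoldi(lambda x: A @ x, jnp.ones(50), num_iters=10)
```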
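
The Experiment Setup row mentions conjugate gradients and 10 Rademacher samples, which point to Hutchinson-style stochastic trace estimation of the log-determinant gradient in the Gaussian-process marginal likelihood: d/dθ log det K(θ) = tr(K⁻¹ dK/dθ). The sketch below illustrates that estimator under stated assumptions. The kernel function, its single lengthscale parameter, the jitter, the CG tolerance, and the use of an unpreconditioned jax.scipy.sparse.linalg.cg are stand-ins for illustration; the paper's pipeline instead uses a rank-15 pivoted Cholesky preconditioner and the matfree library.

```python
# Hutchinson estimator with Rademacher probes:
#   tr(K^{-1} dK) ≈ mean_v  v^T K^{-1} dK v,   v_i ∈ {-1, +1}.
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg


def kernel_matrix(theta, x):
    # Hypothetical RBF-style kernel matrix with jitter, for illustration only.
    sq_dists = (x[:, None] - x[None, :]) ** 2
    return jnp.exp(-sq_dists / (2.0 * theta**2)) + 1e-3 * jnp.eye(x.shape[0])


def logdet_grad_estimate(theta, x, key, num_probes=10):
    """Estimate d/dtheta log det K(theta) = tr(K^{-1} dK/dtheta)."""
    n = x.shape[0]
    probes = jax.random.rademacher(key, (num_probes, n), dtype=jnp.float32)

    K = kernel_matrix(theta, x)
    dK = jax.jacfwd(kernel_matrix)(theta, x)  # dK/dtheta, an n x n matrix

    def single_probe(v):
        # Solve K u = v matrix-free with (unpreconditioned) conjugate gradients.
        Kinv_v, _ = cg(lambda u: K @ u, v, tol=1e-6)
        return Kinv_v @ (dK @ v)  # v^T K^{-1} dK v, since K is symmetric

    return jnp.mean(jax.vmap(single_probe)(probes))


key = jax.random.PRNGKey(0)
x = jnp.linspace(0.0, 1.0, 100)
estimate = logdet_grad_estimate(1.5, x, key)  # 10 probes by default
```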