Gradients of Functions of Large Matrices
Authors: Nicholas Krämer, Pablo Moreno-Muñoz, Hrittik Roy, Søren Hauberg
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Next, we put this code to the test on three challenging machine-learning problems centred around functions of matrices to see how it fares against state-of-the-art differentiable implementations of exact Gaussian processes (Section 5), differential equation solvers (Section 6), and Bayesian neural networks (Section 7). |
| Researcher Affiliation | Academia | Nicholas Krämer, Pablo Moreno-Muñoz, Hrittik Roy, Søren Hauberg Technical University of Denmark Kongens Lyngby, Denmark {pekra, pabmo, hroy, sohau}@dtu.dk |
| Pseudocode | Yes | Algorithm E.1 (Arnoldi's forward pass; paraphrased) ... Algorithm E.2 (Arnoldi's adjoint pass; paraphrased) ... Algorithm E.3 (Forward pass) ... Algorithm E.4 (Backward pass) |
| Open Source Code | Yes | Find the code at https://github.com/pnkraemer/experiments-lanczos-adjoints and install the library with pip install matfree. |
| Open Datasets | Yes | Data For the experiments we use the Protein, KEGG (undirected), KEGG (directed), Elevators, and Kin40k datasets (Table 7, adapted from Bartels et al. [95]). All are part of the UCI data repository, and accessible through there. |
| Dataset Splits | No | The paper mentions an "80/20 train/test split" in Table 3 caption, but does not explicitly provide information on a validation split or how it was used for hyperparameter tuning. |
| Hardware Specification | Yes | The Gaussian process and differential equation case studies run on a V100 GPU, the Bayesian neural network one on a P100 GPU. |
| Software Dependencies | No | The paper mentions software like JAX, Diffrax, GPyTorch, PyTorch, and KeOps, but it does not provide specific version numbers for multiple key software components or for self-contained solvers, which is required for reproducibility. |
| Experiment Setup | Yes | We calibrate a Matérn prior with smoothness ν = 1.5, using 10 matrix-vector products per Lanczos iteration, conjugate gradients tolerance of ϵ = 1, a rank-15 pivoted Cholesky preconditioner, and 10 Rademacher samples... All parameters are initialised randomly. We use the Adam optimiser with learning rate 0.05 for 75 epochs. All experiments are repeated for three different seeds. |
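
For reference, the hyperparameters quoted in the Experiment Setup row can be summarised as a small configuration sketch. The dictionary keys and the use of `optax.adam` below are illustrative assumptions, not taken from the authors' code; only the values come from the quoted setup.

```python
# Hypothetical summary of the quoted experiment setup.
# Key names are illustrative and do not mirror the authors' implementation.
import optax  # assumed, since the paper's code builds on the JAX ecosystem

experiment_config = {
    "kernel": "matern",            # Matérn prior
    "smoothness_nu": 1.5,          # smoothness ν = 1.5
    "lanczos_matvecs": 10,         # matrix-vector products per Lanczos iteration
    "cg_tolerance": 1.0,           # conjugate-gradients tolerance ϵ = 1
    "preconditioner_rank": 15,     # rank-15 pivoted Cholesky preconditioner
    "num_rademacher_samples": 10,  # stochastic trace-estimation samples
    "num_epochs": 75,
    "num_seeds": 3,
}

# Adam optimiser with the quoted learning rate.
optimizer = optax.adam(learning_rate=0.05)
```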