Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Memory safe computations with XLA compiler
Authors: Artem Artemev, Yuze An, Tilman Roeder, Mark van der Wilk
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that k-nearest neighbour, sparse Gaussian process regression methods and Transformers can be run on a single device at a much larger scale, where standard implementations would have failed. Our approach leads to better use of hardware resources. |
| Researcher Affiliation | Collaboration | Artem Artemev Imperial College London Secondmind EMAIL Yuze An Imperial College London EMAIL Tilman Roeder Imperial College London EMAIL Mark van der Wilk Imperial College London EMAIL |
| Pseudocode | Yes | Algorithm 1: High-level description of the depth-first search visitor-handler that splits the data-flow graph up to the reduction dot operation. |
| Open Source Code | Yes | The code is available at https://github.com/awav/tensorflow. |
| Open Datasets | Yes | We use randomly generated data, common benchmarks like MNIST and Fashion-MNIST, and Glove-50, Glove-100 and Glove-200 from the ANN-benchmark toolkit Aumüller et al. (2020). We conduct experiments on a Tesla V100 GPU with 32 GB of memory, and run on two of the largest UCI datasets that are commonly considered in Gaussian process research: 3droad and houseelectric with total number of data points 434,874 and 2,049,280 respectively. |
| Dataset Splits | No | The paper does not provide explicit percentages or counts for training/validation/test dataset splits, nor does it refer to specific predefined splits with citations for all three subsets. |
| Hardware Specification | Yes | We evaluated the expression in double precision on a Tesla V100 GPU with 32 GB of memory, and applied a range of memory limits. We conduct experiments on a Tesla V100 GPU with 32 GB of memory. We run experiments on a single Nvidia V100 GPU with 32GB memory. |
| Software Dependencies | Yes | We demonstrate the utility of eXLA by scaling the GPflow (Matthews et al., 2017, 2.3.1 release version) implementation of Sparse Gaussian process regression (SGPR, Titsias, 2009), without any modifications of the code. |
| Experiment Setup | Yes | In all benchmarks, we set the tensor size threshold for eXLA to 100MB for simplicity, even though this may not be optimal for performance. We set the tensor size threshold and the tensor split size in eXLA to 1GB. The Transformer model compiled with eXLA optimisations managed to run with sequences up to 7000 with tensor limit set to 10GB and the tensor split size set to 1GB. |
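
The tensor size threshold and tensor split size quoted in the Experiment Setup row control how large an intermediate tensor may grow before it is split. The sketch below is a rough, hand-written NumPy illustration of that general idea only; it is not the eXLA compiler pass and not code from the paper. The choice of a squared-exponential kernel matrix-vector product, the function names `rows_per_chunk`, `kernel_matvec_chunked` and `kernel_matvec_dense`, and the `limit_bytes` parameter are all assumptions made for this example.

```python
import numpy as np


def rows_per_chunk(n_cols, itemsize, limit_bytes):
    """Largest row count whose (rows x n_cols) block stays under limit_bytes.

    Hypothetical helper: a stand-in for a tensor split size, not an eXLA setting.
    """
    return max(1, limit_bytes // (n_cols * itemsize))


def kernel_matvec_chunked(x, v, limit_bytes):
    """Compute K(x, x) @ v row block by row block.

    The full N x N kernel matrix is never held in memory; only one
    (rows x N) block exists at a time.
    """
    n = x.shape[0]
    step = rows_per_chunk(n, x.dtype.itemsize, limit_bytes)
    sq_norms = np.sum(x * x, axis=1)
    out = np.empty(n, dtype=x.dtype)
    for start in range(0, n, step):
        stop = min(start + step, n)
        # Squared distances for this block only: shape (stop - start, n).
        d2 = (sq_norms[start:stop, None] + sq_norms[None, :]
              - 2.0 * x[start:stop] @ x.T)
        # Clip tiny negative values caused by floating-point round-off.
        block = np.exp(-0.5 * np.maximum(d2, 0.0))
        out[start:stop] = block @ v
    return out


def kernel_matvec_dense(x, v):
    """Reference version that materialises the whole N x N kernel matrix."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2) @ v
```

With 200 double-precision points and a 16 KiB limit, for instance, each block holds 10 rows, and the chunked result should match the dense reference up to floating-point tolerance. In this sketch the split is written by hand, whereas the paper reports applying eXLA to the GPflow SGPR implementation without any modifications of the code.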