Memory Safe Computations with XLA Compiler

Authors: Artem Artemev, Yuze An, Tilman Roeder, Mark van der Wilk

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that k-nearest neighbour, sparse Gaussian process regression methods and Transformers can be run on a single device at a much larger scale, where standard implementations would have failed. Our approach leads to better use of hardware resources.
Researcher Affiliation | Collaboration | Artem Artemev (Imperial College London, Secondmind) a.artemev20@imperial.ac.uk; Yuze An (Imperial College London) yuze.an21@imperial.ac.uk; Tilman Roeder (Imperial College London) tilman.roeder17@imperial.ac.uk; Mark van der Wilk (Imperial College London) m.vdwilk@imperial.ac.uk
Pseudocode | Yes | Algorithm 1: High-level description of the depth-first search visitor-handler that splits the data-flow graph up to the reduction dot operation. (An illustrative sketch of this traversal is given after the table.)
Open Source Code | Yes | The code is available at https://github.com/awav/tensorflow.
Open Datasets | Yes | We use randomly generated data, common benchmarks like MNIST and Fashion-MNIST, and Glove-50, Glove-100 and Glove-200 from the ANN-benchmark toolkit (Aumüller et al., 2020). We conduct experiments on a Tesla V100 GPU with 32 GB of memory, and run on two of the largest UCI datasets commonly considered in Gaussian process research: 3droad and houseelectric, with 434,874 and 2,049,280 data points respectively.
Dataset Splits | No | The paper does not provide explicit percentages or counts for training/validation/test dataset splits, nor does it refer to specific predefined splits with citations for all three subsets.
Hardware Specification | Yes | We evaluated the expression in double precision on a Tesla V100 GPU with 32 GB of memory, and applied a range of memory limits. We conduct experiments on a Tesla V100 GPU with 32 GB of memory. We run experiments on a single Nvidia V100 GPU with 32 GB of memory.
Software Dependencies | Yes | We demonstrate the utility of eXLA by scaling the GPflow (Matthews et al., 2017; release version 2.3.1) implementation of sparse Gaussian process regression (SGPR; Titsias, 2009) without any modifications to the code. (A minimal GPflow usage sketch follows the table.)
Experiment Setup | Yes | In all benchmarks, we set the tensor size threshold for eXLA to 100 MB for simplicity, even though this may not be optimal for performance. We set the tensor size threshold and the tensor split size in eXLA to 1 GB. The Transformer model compiled with eXLA optimisations managed to run with sequences up to 7000 with the tensor limit set to 10 GB and the tensor split size set to 1 GB. (A sketch of how such limits are typically passed to the compiler follows the table.)
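
The Pseudocode row above quotes Algorithm 1, a depth-first visitor over the XLA data-flow graph that identifies large reduction dot (matrix-multiply) operations to split. Below is a minimal sketch of that idea only; the Node class, field names, and splitting criterion are assumptions made here for illustration, not the authors' C++ compiler pass.

```python
# Sketch of a depth-first visitor that collects oversized "dot" nodes in a toy
# data-flow graph. Everything here (Node, out_bytes, the threshold test) is a
# simplified stand-in for the HLO pass described in the paper.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    op: str                                   # e.g. "dot", "add", "parameter"
    out_bytes: int                            # size of the node's output tensor
    operands: List["Node"] = field(default_factory=list)


def find_splittable_dots(root: Node, threshold: int) -> List[Node]:
    """Depth-first walk that returns reduction 'dot' nodes whose output exceeds
    the tensor-size threshold and would therefore be split into chunks."""
    to_split, visited, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        if node.op == "dot" and node.out_bytes > threshold:
            to_split.append(node)
        stack.extend(node.operands)
    return to_split


# Toy usage with a 100 MB threshold: one small matmul and one oversized one.
a = Node("parameter", 8 * 10**6)
b = Node("parameter", 8 * 10**6)
small = Node("dot", 50 * 2**20, [a, b])
big = Node("dot", 400 * 2**20, [small, b])
print([n.op for n in find_splittable_dots(big, threshold=100 * 2**20)])  # ['dot']
```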
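The Software Dependencies row states that the GPflow 2.3.1 implementation of SGPR is scaled without code changes. The sketch below shows what such an unmodified GPflow model looks like; the data shapes, kernel choice, and number of inducing points are illustrative placeholders, not the paper's exact configuration.

```python
# Minimal GPflow 2.x SGPR model of the kind the paper scales with eXLA.
import numpy as np
import gpflow

X = np.random.randn(10_000, 8)                        # stand-in for e.g. 3droad features
Y = np.random.randn(10_000, 1)
Z = X[np.random.choice(len(X), 512, replace=False)]   # inducing point locations

model = gpflow.models.SGPR(
    data=(X, Y),
    kernel=gpflow.kernels.SquaredExponential(),
    inducing_variable=Z,
)

# Standard GPflow training loop; under eXLA the compiled graph is rewritten so
# that large intermediates respect the configured tensor-size threshold.
opt = gpflow.optimizers.Scipy()
opt.minimize(model.training_loss, model.trainable_variables, options={"maxiter": 100})
print("ELBO:", model.elbo().numpy())
```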
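The Experiment Setup row refers to the tensor size threshold and tensor split size used by eXLA. XLA options of this kind are usually supplied through the XLA_FLAGS environment variable before TensorFlow is imported; the flag names below are assumptions for illustration only, and the exact spelling is defined by the eXLA fork at https://github.com/awav/tensorflow.

```python
import os

# Hypothetical flag names, shown only to illustrate how a tensor-size threshold
# and split size might be configured; check the eXLA fork for the exact flags.
os.environ["XLA_FLAGS"] = " ".join([
    "--xla_tensor_size_threshold=100MB",  # rewrite ops whose outputs exceed this
    "--xla_tensor_split_size=100MB",      # target chunk size after splitting
])

import tensorflow as tf  # imported after XLA_FLAGS so the flags take effect


# A memory-hungry computation of the kind the paper targets: the broadcasted
# difference materialises an (N, M, D) intermediate before the reduction.
@tf.function(jit_compile=True)
def pairwise_sq_dists(a, b):
    return tf.reduce_sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)


x = tf.random.normal([2048, 64])
d = pairwise_sq_dists(x, x)  # with a compiler that splits large tensors, the
                             # (2048, 2048, 64) intermediate need not fit at once
```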