Memory Safe Computations with XLA Compiler
Authors: Artem Artemev, Yuze An, Tilman Roeder, Mark van der Wilk
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that k-nearest neighbour, sparse Gaussian process regression methods, and Transformers can be run on a single device at a much larger scale, where standard implementations would have failed. Our approach leads to better use of hardware resources. *(A back-of-the-envelope memory check illustrating why standard implementations fail appears after the table.)* |
| Researcher Affiliation | Collaboration | Artem Artemev (Imperial College London, Secondmind), a.artemev20@imperial.ac.uk; Yuze An (Imperial College London), yuze.an21@imperial.ac.uk; Tilman Roeder (Imperial College London), tilman.roeder17@imperial.ac.uk; Mark van der Wilk (Imperial College London), m.vdwilk@imperial.ac.uk |
| Pseudocode | Yes | Algorithm 1: High-level description of the depth-first search visitor-handler that splits the data-flow graph up to the reduction dot operation. *(A hedged Python mirror of this traversal appears after the table.)* |
| Open Source Code | Yes | The code is available at https://github.com/awav/tensorflow. |
| Open Datasets | Yes | We use randomly generated data, common benchmarks such as MNIST and Fashion-MNIST, and GloVe-50, GloVe-100, and GloVe-200 from the ANN-Benchmarks toolkit (Aumüller et al., 2020). We conduct experiments on a Tesla V100 GPU with 32 GB of memory, and run on two of the largest UCI datasets commonly considered in Gaussian process research: 3droad and houseelectric, with 434,874 and 2,049,280 data points respectively. |
| Dataset Splits | No | The paper does not report explicit percentages or counts for training/validation/test splits, nor does it cite predefined splits covering all three subsets. |
| Hardware Specification | Yes | We evaluated the expression in double precision on a Tesla V100 GPU with 32 GB of memory, and applied a range of memory limits. We conduct experiments on a Tesla V100 GPU with 32 GB of memory. We run experiments on a single Nvidia V100 GPU with 32 GB of memory. |
| Software Dependencies | Yes | We demonstrate the utility of eXLA by scaling the GPflow (Matthews et al., 2017; 2.3.1 release version) implementation of sparse Gaussian process regression (SGPR, Titsias, 2009), without any modifications to the code. *(A minimal GPflow sketch appears after the table.)* |
| Experiment Setup | Yes | In all benchmarks, we set the tensor size threshold for eXLA to 100 MB for simplicity, even though this may not be optimal for performance. We set the tensor size threshold and the tensor split size in eXLA to 1 GB. The Transformer model compiled with eXLA optimisations managed to run with sequences up to 7000, with the tensor limit set to 10 GB and the tensor split size set to 1 GB. *(A flag-configuration sketch appears after the table.)* |
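
To make concrete why standard implementations fail at the scales quoted in the Research Type row, here is an illustrative back-of-the-envelope check (not from the paper; the point count is an arbitrary example): a naive k-nearest-neighbour implementation materialises a dense n-by-n distance matrix, which quickly exceeds the 32 GB of the V100 used in the experiments.

```python
# Illustrative memory arithmetic (not from the paper): the dense n-by-n
# distance matrix a naive kNN implementation builds does not fit in 32 GB.
n = 200_000                    # number of data points (arbitrary example)
bytes_per_float64 = 8          # double precision, as used in the paper
dense_matrix_gib = n * n * bytes_per_float64 / 2**30
print(f"{dense_matrix_gib:.0f} GiB")   # ~298 GiB, roughly 9x a 32 GB V100
```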
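Algorithm 1 itself is a compiler pass over XLA's HLO graph, written in C++ inside the fork. The sketch below only mirrors its control flow in Python under simplifying assumptions: `Node`, `THRESHOLD`, and `split_dot` are hypothetical stand-ins for the HLO machinery, and splitting is shown along a single contraction axis rather than the general case the pass handles.

```python
# Hedged Python mirror of Algorithm 1's control flow: a depth-first
# visitor that rewrites oversized reduction `dot` nodes as chunked dots
# combined by a sum. Not the paper's API; names here are illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                          # e.g. "dot", "slice", "sum", "parameter"
    inputs: list = field(default_factory=list)
    nbytes: int = 0                  # size of this node's output tensor

THRESHOLD = 100 * 2**20              # 100 MB, the setting quoted in the table

def split_dot(dot: Node, chunks: int = 4) -> Node:
    """Rewrite dot(A, B) with an oversized operand as a sum of chunked
    dots, sum_i dot(slice_i(A), slice_i(B)), over the contraction axis."""
    a, b = dot.inputs
    partials = [
        Node("dot",
             [Node("slice", [a], a.nbytes // chunks),
              Node("slice", [b], b.nbytes // chunks)],
             dot.nbytes)
        for _ in range(chunks)
    ]
    return Node("sum", partials, dot.nbytes)

def visit(node: Node, memo=None) -> Node:
    """Depth-first traversal: rewrite operands first, then decide whether
    this node is a reduction dot whose operands exceed the threshold."""
    memo = {} if memo is None else memo
    if id(node) in memo:             # data-flow graph, not a tree: visit once
        return memo[id(node)]
    node.inputs = [visit(i, memo) for i in node.inputs]
    too_big = any(i.nbytes > THRESHOLD for i in node.inputs)
    result = split_dot(node) if node.op == "dot" and too_big else node
    memo[id(node)] = result
    return result
```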
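The Software Dependencies row stresses that GPflow's SGPR runs without modification; a minimal sketch of what that unmodified model code looks like follows. The toy random data stands in for the UCI benchmarks, and note that under stock TensorFlow `jit_compile=True` only routes through standard XLA; the memory-splitting behaviour requires the paper's fork.

```python
# Minimal sketch of unmodified GPflow SGPR compiled with XLA. Toy data
# replaces the paper's UCI benchmarks (3droad, houseelectric).
import numpy as np
import tensorflow as tf
import gpflow

X = np.random.randn(10_000, 8)
Y = np.random.randn(10_000, 1)
Z = X[:1_000].copy()                 # inducing points

model = gpflow.models.SGPR(
    data=(X, Y),
    kernel=gpflow.kernels.Matern32(),
    inducing_variable=Z,
)

# jit_compile=True sends the computation through XLA, where the fork's
# memory-splitting pass can rewrite oversized intermediates.
elbo = tf.function(model.elbo, jit_compile=True)
print(elbo())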
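The thresholds in the Experiment Setup row are compiler-level settings rather than model code. The sketch below shows one way they could be configured, assuming the fork (https://github.com/awav/tensorflow) exposes them as the XLA flags named here; these flag names are an assumption about that fork and do not exist in upstream XLA.

```python
# Hedged sketch: configuring eXLA's tensor size threshold and split size.
# Flag names are assumed fork-specific (awav/tensorflow), not upstream XLA.
import os

# XLA reads these at initialisation, so set them before importing TensorFlow.
os.environ["XLA_FLAGS"] = " ".join([
    "--xla_tensor_size_threshold=1GB",  # split intermediates above 1 GB
    "--xla_tensor_split_size=1GB",      # target chunk size after splitting
])

import tensorflow as tf  # noqa: E402  (import deliberately after the flags)
```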