Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

Authors: Felix Chern, Blake Hechtman, Andy Davis, Ruiqi Guo, David Majnemer, Sanjiv Kumar

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments verifying our TPU implementation of the algorithm accurately aligned with the performance model and achieves state-of-the-art speed-recall trade-offs on standard nearest neighbor search benchmarks."
Researcher Affiliation | Industry | "Felix Chern, Google Research, EMAIL; Blake Hechtman, Google Core ML, EMAIL; Andy Davis, Google Core ML, EMAIL; Ruiqi Guo, Google Research, EMAIL; David Majnemer, Google Core ML, EMAIL; Sanjiv Kumar, Google Research, EMAIL"
Pseudocode | Yes | "Algorithm 1: Partial Reduce for MIPS"
Open Source Code | Yes | "Our work is available in the open-source package of Jax and Tensorflow on TPU."
Open Datasets | Yes | "We select the Glove (Pennington et al., 2014) and Sift (Jegou et al., 2010) datasets from the ANN benchmarks. Their corresponding distances are the cosine distance and the Euclidean distance. See the code snippets in Appendix A.1 and A.2." (Footnote 4: released in Apache license 2.0. Footnote 5: released in CC0 public domain.)
Dataset Splits | No | "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]"
Hardware Specification | Yes | "In particular, we are interested in the novel platforms that deliver high FLOP/s for distance computation, namely Google TPU V3, V4, Nvidia GPU V100, and A100 in our analysis and evaluation." See also Table 1: Hardware specifications for the generalized roofline model.
Software Dependencies | No | The paper mentions using Jax and TensorFlow, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | No | "Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]"
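The "Partial Reduce for MIPS" pseudocode noted in the table refers to a two-stage reduction for maximum inner product search: take a local top-k within each partition of the database, then a final top-k over the surviving candidates. As a rough illustration of that idea only (the function name, NumPy usage, and partitioning scheme below are assumptions, not the authors' TPU implementation), an exact partitioned top-k MIPS can be sketched as:

```python
# Hypothetical sketch of a two-stage "partial reduce" for maximum inner
# product search (MIPS). Illustrative only; not the paper's Algorithm 1.
import numpy as np

def partial_reduce_mips(queries, database, k, num_partitions):
    """Exact top-k inner products via per-partition partial reduction."""
    parts = np.array_split(database, num_partitions, axis=0)
    offsets = np.cumsum([0] + [p.shape[0] for p in parts[:-1]])

    # Stage 1: partial reduce -- keep only the top-k within each partition.
    cand_scores, cand_ids = [], []
    for off, part in zip(offsets, parts):
        scores = queries @ part.T                    # (num_queries, part_size)
        idx = np.argsort(-scores, axis=1)[:, :k]     # local top-k indices
        cand_scores.append(np.take_along_axis(scores, idx, axis=1))
        cand_ids.append(idx + off)                   # map back to global ids

    # Stage 2: final reduce over the concatenated candidates.
    all_scores = np.concatenate(cand_scores, axis=1)
    all_ids = np.concatenate(cand_ids, axis=1)
    order = np.argsort(-all_scores, axis=1)[:, :k]
    return (np.take_along_axis(all_scores, order, axis=1),
            np.take_along_axis(all_ids, order, axis=1))
```

Because the global top-k is always contained in the union of the per-partition top-k sets, this two-stage scheme returns exactly the same results as a single full sort, while each stage only ever materializes k candidates per partition.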