Chefs' Random Tables: Non-Trigonometric Random Features
Authors: Valerii Likhosherstov, Krzysztof M Choromanski, Kumar Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test CRTs on many tasks ranging from non-parametric classification to training Transformers for text, speech and image data, obtaining new state-of-the-art results for low-rank text Transformers, while providing linear space and time complexity of the attention. We present an extensive empirical evaluation of CRTs. Additional details and results for each experiment can be found in the Appendix 9.10. |
| Researcher Affiliation | Collaboration | Valerii Likhosherstov* (University of Cambridge, vl304@cam.ac.uk); Krzysztof Choromanski* (Google Research & Columbia University, kchoro@google.com); Avinava Dubey* (Google Research); Frederick Liu* (Google Research); Tamas Sarlos (Google Research); Adrian Weller (University of Cambridge & The Alan Turing Institute) |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include the part of the code that is not confidential, the core CRT variant: FAVOR++ mechanism. |
| Open Datasets | Yes | We evaluate on classification benchmarks from UCI Repository [24] (Table 1)... General Language Understanding Evaluation (GLUE) benchmark [57]... LibriSpeech ASR corpus ([42])... ImageNet ([18]). |
| Dataset Splits | Yes | Hyperparameters were tuned on the validation set. For GLUE tasks, we use standard splits: training and development splits from the BERT repository. |
| Hardware Specification | Yes | For GLUE training, we used 8x A100 GPUs. |
| Software Dependencies | No | All code is written in JAX/NumPy [6, 28]. (Libraries are named, but no version numbers are specified.) |
| Experiment Setup | Yes | Batch size 128 (on 8 A100 GPUs that is 16 examples per GPU). Learning rate 2e-4. We train for 10 epochs. We use Adam optimizer with β1 = 0.9, β2 = 0.999, ϵ = 1e-6. |
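The paper's headline claim is linear space and time complexity of attention via random features, with only the core FAVOR++ mechanism released. To illustrate the general mechanism that claim rests on, here is a minimal NumPy sketch of linear attention with positive random features for the softmax kernel (a generic FAVOR+-style construction, not the paper's FAVOR++; all function names and parameter choices here are ours, for illustration only).

```python
import numpy as np

def positive_random_features(x, projection, eps=1e-6):
    # Positive random features for the softmax kernel:
    # phi(x) = exp(w^T x - ||x||^2 / 2) / sqrt(m), elementwise positive,
    # so that E[phi(x)^T phi(y)] = exp(x^T y).
    m = projection.shape[0]
    wx = x @ projection.T                         # (n, m)
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(wx - sq_norm) / np.sqrt(m) + eps

def linear_attention(q, k, v, num_features=256, seed=0):
    # O(n) attention: compute phi(Q) (phi(K)^T V) instead of
    # materializing the (n, n) matrix softmax(Q K^T / sqrt(d)).
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_features, d))
    # Scaling by d^{1/4} folds the 1/sqrt(d) temperature into q and k.
    q_prime = positive_random_features(q / d ** 0.25, w)   # (n, m)
    k_prime = positive_random_features(k / d ** 0.25, w)   # (n, m)
    kv = k_prime.T @ v                                     # (m, d_v)
    normalizer = q_prime @ k_prime.sum(axis=0)             # (n,)
    return (q_prime @ kv) / normalizer[:, None]

# Compare against exact softmax attention on a small example.
n, d = 8, 16
rng = np.random.default_rng(1)
q, k, v = rng.standard_normal((3, n, d)) / d ** 0.5
logits = q @ k.T / d ** 0.5
weights = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
exact = weights @ v
approx = linear_attention(q, k, v, num_features=4096)
print(np.max(np.abs(exact - approx)))
```

The key point is that `kv` and `normalizer` are accumulated in O(n m d) time and O(m d) memory, never forming the quadratic attention matrix; the random-feature count `m` trades accuracy for cost.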