Chefs' Random Tables: Non-Trigonometric Random Features
Authors: Valerii Likhosherstov, Krzysztof M Choromanski, Kumar Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test CRTs on many tasks ranging from non-parametric classification to training Transformers for text, speech and image data, obtaining new state-of-the-art results for low-rank text Transformers, while providing linear space and time complexity of the attention. We present an extensive empirical evaluation of CRTs. Additional details and results for each experiment can be found in the Appendix 9.10. |
| Researcher Affiliation | Collaboration | Valerii Likhosherstov* (University of Cambridge, vl304@cam.ac.uk); Krzysztof Choromanski* (Google Research & Columbia University, kchoro@google.com); Avinava Dubey* (Google Research); Frederick Liu* (Google Research); Tamas Sarlos (Google Research); Adrian Weller (University of Cambridge & The Alan Turing Institute) |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include the part of the code that is not confidential, the core CRT variant: FAVOR++ mechanism. |
| Open Datasets | Yes | We evaluate on classification benchmarks from UCI Repository [24] (Table 1)... General Language Understanding Evaluation (GLUE) benchmark [57]... LibriSpeech ASR corpus ([42])... ImageNet ([18]). |
| Dataset Splits | Yes | Hyperparameters were tuned on the validation set. For GLUE tasks, we use standard splits: training and development splits from the BERT repository. |
| Hardware Specification | Yes | For GLUE training, we used 8x A100 GPUs. |
| Software Dependencies | No | All code is written in JAX/NumPy [6, 28]. (Libraries are named, but no version numbers are specified.) |
| Experiment Setup | Yes | Batch size 128 (on 8 A100 GPUs that is 16 examples per GPU). Learning rate 2e-4. We train for 10 epochs. We use Adam optimizer with β1 = 0.9, β2 = 0.999, ϵ = 1e-6. |
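The paper's headline claim is linear space and time complexity of attention via random features, with only the core FAVOR++ mechanism released. To illustrate the general mechanism that claim rests on, here is a minimal NumPy sketch of linear attention with positive random features for the softmax kernel (a generic FAVOR+-style construction, not the paper's FAVOR++; all function names and parameter choices here are ours, for illustration only).

```python
import numpy as np

def positive_random_features(x, projection, eps=1e-6):
    # Positive random features for the softmax kernel:
    # phi(x) = exp(w^T x - ||x||^2 / 2) / sqrt(m), elementwise positive,
    # so that E[phi(x)^T phi(y)] = exp(x^T y).
    m = projection.shape[0]
    wx = x @ projection.T                         # (n, m)
    sq_norm = np.sum(x ** 2, axis=-1, keepdims=True) / 2.0
    return np.exp(wx - sq_norm) / np.sqrt(m) + eps

def linear_attention(q, k, v, num_features=256, seed=0):
    # O(n) attention: compute phi(Q) (phi(K)^T V) instead of
    # materializing the (n, n) matrix softmax(Q K^T / sqrt(d)).
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_features, d))
    # Scaling by d^{1/4} folds the 1/sqrt(d) temperature into q and k.
    q_prime = positive_random_features(q / d ** 0.25, w)   # (n, m)
    k_prime = positive_random_features(k / d ** 0.25, w)   # (n, m)
    kv = k_prime.T @ v                                     # (m, d_v)
    normalizer = q_prime @ k_prime.sum(axis=0)             # (n,)
    return (q_prime @ kv) / normalizer[:, None]

# Compare against exact softmax attention on a small example.
n, d = 8, 16
rng = np.random.default_rng(1)
q, k, v = rng.standard_normal((3, n, d)) / d ** 0.5
logits = q @ k.T / d ** 0.5
weights = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
exact = weights @ v
approx = linear_attention(q, k, v, num_features=4096)
print(np.max(np.abs(exact - approx)))
```

The key point is that `kv` and `normalizer` are accumulated in O(n m d) time and O(m d) memory, never forming the quadratic attention matrix; the random-feature count `m` trades accuracy for cost.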