Simplex Random Features
Authors: Isaac Reid, Krzysztof Marcin Choromanski, Valerii Likhosherstov, Adrian Weller
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive empirical studies, we show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers (Choromanski et al., 2020). |
| Researcher Affiliation | Collaboration | 1University of Cambridge 2Google 3Columbia University 4Alan Turing Institute. Correspondence to: Isaac Reid <ir337@cam.ac.uk>, Krzysztof Choromanski <kchoro@google.com>. |
| Pseudocode | Yes | Algorithm 1 Fast matrix-vector multiplication with S |
| Open Source Code | Yes | Code is available at https://github.com/isaac-reid/simplex-random-features. |
| Open Datasets | Yes | We use 8 different datasets retrieved from the UCI Machine Learning Repository (Dua & Graff, 2017a) |
| Dataset Splits | Yes | The σ > 0 hyperparameter is tuned for good PRF performance on a validation dataset |
| Hardware Specification | Yes | trained for 300 epochs on the TPU architecture. |
| Software Dependencies | No | The paper mentions software components such as the Adam optimiser, Python, and PyTorch in context, but it does not provide version numbers for these dependencies, which full reproducibility would require. |
| Experiment Setup | Yes | In all four experiments, we use a ViT with 12 layers, 12 heads, MLP dim equal to 3072, a dropout rate of 0.1 and no attention dropout. We use the Adam optimiser with weight decay equal to 0.1 and batch size bs = 4096, trained for 300 epochs on the TPU architecture. We apply 130 random vectors to approximate the softmax attention kernel with PRFs, testing both the ORF and SimRF coupling mechanisms. |
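The coupling mechanism referenced above can be illustrated with a short sketch. This is not the paper's implementation (the function name `simplex_block` and the use of NumPy are assumptions); it just shows the geometric idea behind simplex random features: `d` unit directions with pairwise angle `arccos(-1/(d-1))` (vertices of a regular simplex), randomly rotated, with chi-distributed row norms so each row has the norm distribution of a `d`-dimensional Gaussian vector.

```python
import numpy as np

def simplex_block(d, rng):
    """Hedged sketch of one block of d simplex-coupled random vectors.

    Not the authors' code: directions are built by centring the standard
    basis, then applying a Haar-random rotation and chi(d) row norms.
    """
    # Simplex directions: centre the standard basis and normalise rows.
    # Off-diagonal cosines are then exactly -1/(d-1).
    V = np.eye(d) - np.full((d, d), 1.0 / d)
    V /= np.linalg.norm(V, axis=1, keepdims=True)

    # Haar-random rotation via QR of a Gaussian matrix (sign-corrected).
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q *= np.sign(np.diag(R))

    # Row norms ~ chi(d), matching the norm of a d-dim Gaussian sample.
    norms = np.sqrt(rng.chisquare(df=d, size=d))
    return norms[:, None] * (V @ Q)

rng = np.random.default_rng(0)
S = simplex_block(8, rng)
dirs = S / np.linalg.norm(S, axis=1, keepdims=True)
cos = dirs @ dirs.T  # off-diagonal entries are all -1/(8-1)
```

In contrast, the ORF coupling baseline would keep the rows exactly orthogonal (pairwise cosine 0) rather than at the wider simplex angle.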