Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Simplex Random Features
Authors: Isaac Reid, Krzysztof Marcin Choromanski, Valerii Likhosherstov, Adrian Weller
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive empirical studies, we show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers (Choromanski et al., 2020). |
| Researcher Affiliation | Collaboration | 1University of Cambridge 2Google 3Columbia University 4Alan Turing Institute. Correspondence to: Isaac Reid <EMAIL>, Krzysztof Choromanski <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Fast matrix-vector multiplication with S |
| Open Source Code | Yes | Code is available at https://github.com/isaac-reid/simplex-random-features. |
| Open Datasets | Yes | We use 8 different datasets retrieved from the UCI Machine Learning Repository (Dua & Graff, 2017a) |
| Dataset Splits | Yes | The σ > 0 hyperparameter is tuned for good PRF performance on a validation dataset |
| Hardware Specification | Yes | trained for 300 epochs on the TPU architecture. |
| Software Dependencies | No | The paper mentions software components like "adam optimiser", "Python", and "PyTorch" in context, but it does not provide specific version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | In all four experiments, we use a ViT with 12 layers, 12 heads, mlp dim equal to 3072, a dropout rate of 0.1 and no attention dropout. We use the adam optimiser with weight decay equal to 0.1 and batch size bs = 4096, trained for 300 epochs on the TPU architecture. We apply 130 random vectors to approximate the softmax attention kernel with PRFs, testing both the ORF and SimRF coupling mechanisms. |
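The table rows on pseudocode and experiment setup refer to coupling the random vectors of positive random features (PRFs) via a simplex block. The sketch below is a minimal illustration, not the paper's exact parametrisation: it builds d unit vectors with pairwise dot products of -1/(d-1) by projecting the standard basis away from the all-ones direction (a construction equivalent up to rotation to the paper's simplex block), rescales them with chi-distributed norms so each row is marginally Gaussian, and uses them in the standard PRF estimator of the softmax kernel exp(x·y) from Choromanski et al. (2020). The function names and the small dimension d = 16 are illustrative choices.

```python
import numpy as np

def simplex_block(d, rng):
    """d unit vectors in R^d with pairwise dot product -1/(d-1),
    then a random rotation. Illustrative construction: the paper's
    explicit parametrisation may differ, but shares this Gram matrix."""
    # Project the standard basis away from the all-ones direction
    # and renormalise rows; this yields <s_i, s_j> = -1/(d-1), i != j.
    E = np.eye(d) - np.ones((d, d)) / d
    S = E / np.linalg.norm(E, axis=1, keepdims=True)
    # Random orthogonal rotation so the block is isotropic (as with ORFs).
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return S @ Q.T

def prf_features(X, W):
    """Positive random features for the softmax kernel exp(x.y):
    phi(x) = exp(W x - ||x||^2 / 2) / sqrt(m)."""
    m = W.shape[0]
    sq = np.sum(X ** 2, axis=1, keepdims=True) / 2.0
    return np.exp(X @ W.T - sq) / np.sqrt(m)

rng = np.random.default_rng(0)
d = 16
S = simplex_block(d, rng)
# Rescale rows with chi_d-distributed norms so each row matches
# the marginal distribution of a Gaussian vector, keeping the
# estimator unbiased under the simplex coupling.
norms = np.sqrt(rng.chisquare(d, size=(d, 1)))
W = norms * S

x = rng.standard_normal(d) * 0.3
y = rng.standard_normal(d) * 0.3
approx = float(prf_features(x[None], W) @ prf_features(y[None], W).T)
exact = float(np.exp(x @ y))
```

By construction the features are strictly positive, so `approx` is a positive, unbiased estimate of `exact`; in practice multiple independent blocks are stacked (e.g. the 130 random vectors quoted above) to reduce variance.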