Simplex Random Features

Authors: Isaac Reid, Krzysztof Marcin Choromanski, Valerii Likhosherstov, Adrian Weller

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In extensive empirical studies, we show consistent gains provided by SimRFs in settings including pointwise kernel estimation, nonparametric classification and scalable Transformers (Choromanski et al., 2020)."
Researcher Affiliation | Collaboration | "1University of Cambridge, 2Google, 3Columbia University, 4Alan Turing Institute. Correspondence to: Isaac Reid <ir337@cam.ac.uk>, Krzysztof Choromanski <kchoro@google.com>."
Pseudocode | Yes | "Algorithm 1: Fast matrix-vector multiplication with S" (an illustrative sketch follows the table)
Open Source Code | Yes | "Code is available at https://github.com/isaac-reid/simplex-random-features."
Open Datasets | Yes | "We use 8 different datasets retrieved from the UCI Machine Learning Repository (Dua & Graff, 2017a)."
Dataset Splits | Yes | "The σ > 0 hyperparameter is tuned for good PRF performance on a validation dataset."
Hardware Specification | Yes | "trained for 300 epochs on the TPU architecture."
Software Dependencies | No | The paper mentions software components such as the "adam optimiser", "Python", and "PyTorch", but it does not provide version numbers for these dependencies, which full reproducibility would require.
Experiment Setup | Yes | "In all four experiments, we use a ViT with 12 layers, 12 heads, mlp dim equal to 3072, a dropout rate of 0.1 and no attention dropout. We use the adam optimiser with weight decay equal to 0.1 and batch size bs = 4096, trained for 300 epochs on the TPU architecture. We apply 130 random vectors to approximate the softmax attention kernel with PRFs, testing both the ORF and SimRF coupling mechanisms." (a hedged sketch of the two couplings follows the table)
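The paper's Algorithm 1 covers fast matrix-vector multiplication with the simplex block S. As a rough illustration of why this multiplication is cheap (a minimal NumPy sketch based on the simplex geometry described in the paper, not the authors' implementation; all function names are ours), the rows of S are unit vectors with pairwise dot products −1/(d−1), so every row of S·x shares one partial sum and the product costs O(d):

```python
import numpy as np

def simplex_block(d):
    """Simplex block S: rows are unit vectors whose pairwise dot
    products all equal -1/(d-1)."""
    a = np.sqrt(d / (d - 1))                # weight on e_i in rows i < d
    b = (np.sqrt(d) + 1) / (d - 1) ** 1.5   # offset shared by the first d-1 coordinates
    S = np.zeros((d, d))
    S[: d - 1, : d - 1] = -b
    S[np.arange(d - 1), np.arange(d - 1)] = a - b
    S[d - 1, : d - 1] = 1 / np.sqrt(d - 1)  # last row: normalised all-ones direction
    return S

def fast_simplex_matvec(x):
    """Compute S @ x in O(d): all rows reuse the same partial sum."""
    d = x.shape[0]
    a = np.sqrt(d / (d - 1))
    b = (np.sqrt(d) + 1) / (d - 1) ** 1.5
    s = x[: d - 1].sum()
    out = np.empty(d)
    out[: d - 1] = a * x[: d - 1] - b * s
    out[d - 1] = s / np.sqrt(d - 1)
    return out

d = 8
S = simplex_block(d)
x = np.random.default_rng(0).standard_normal(d)
assert np.allclose(S @ x, fast_simplex_matvec(x))
gram = S @ S.T
assert np.allclose(np.diag(gram), 1.0)                          # unit-norm rows
assert np.allclose(gram[~np.eye(d, dtype=bool)], -1 / (d - 1))  # simplex angles
```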
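The Performer experiments approximate the softmax attention kernel with PRFs under ORF or SimRF coupling. The sketch below (reusing simplex_block from above; function names and the toy dimensions are our own, not the authors' code) shows one way the two couplings could be drawn and how PRFs estimate the softmax kernel exp(x·y):

```python
def orthogonal_block(d, rng):
    """ORF coupling: an orthonormal block via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def simrf_block(d, rng):
    """SimRF coupling: randomly rotated simplex directions scaled by
    chi(d)-distributed norms (one block of d frequencies)."""
    directions = simplex_block(d) @ orthogonal_block(d, rng).T
    norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)  # chi(d) draws
    return norms[:, None] * directions

def prf_features(X, W):
    """Positive random features (Choromanski et al., 2020):
    phi(x) = exp(W x - |x|^2 / 2) / sqrt(m), giving the unbiased
    softmax-kernel estimate E[phi(x) . phi(y)] = exp(x . y)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)

rng = np.random.default_rng(1)
d = 16
x, y = 0.3 * rng.standard_normal((2, d))
# Stack independent blocks to reach the desired feature count (the paper
# reports 130 random vectors; a small multiple of d keeps this demo tiny).
W = np.vstack([simrf_block(d, rng) for _ in range(8)])
approx = (prf_features(x[None], W) @ prf_features(y[None], W).T).item()
exact = np.exp(x @ y)
print(approx, exact)  # the estimate should be close to the exact kernel
```

Swapping simrf_block for orthogonal_block scaled by the same chi(d) norms gives the ORF baseline; the paper's claim is that the simplex coupling yields lower-variance kernel estimates at no extra cost.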