Hybrid Random Features

Authors: Krzysztof Marcin Choromanski, Han Lin, Haoxian Chen, Arijit Sehanobish, Yuanzhe Ma, Deepali Jain, Jake Varley, Andy Zeng, Michael S Ryoo, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct exhaustive empirical evaluation of HRF ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure, to benchmarking implicit-attention Transformers (also for downstream Robotics applications), demonstrating its quality in a wide spectrum of machine learning problems. |
| Researcher Affiliation | Collaboration | Google Brain Robotics, Columbia University, University of Cambridge, The Alan Turing Institute |
| Pseudocode | No | The paper describes its algorithm in prose and mathematical formulas in its 'THE ALGORITHM' section, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The part of the code that we could make publicly available can be found in the following github: https://github.com/HL-hanlin/HRF_ICLR2022. |
| Open Datasets | Yes | We computed empirical MSEs by averaging over 100 randomly sampled pairs of vectors from two UCI datasets: wine and Boston. ... The results on the Penn Treebank (Marcus et al., 1993) are presented in Fig. 3. In the Appendix (Sec. I) we present additional results for the WikiText2 dataset. ... We also tested HRFs on speech models with the LibriSpeech ASR corpus (Panayotov et al., 2015). |
| Dataset Splits | No | The paper refers to 'additional validation results' and uses standard datasets, but does not provide specific details on the train/validation/test splits (e.g., percentages, sample counts, or an explicit splitting methodology). |
| Hardware Specification | No | The paper mentions using 'AWS compute resources' in the Acknowledgements, but it does not specify the GPU models, CPU models, or cloud instance configurations used for the experiments. |
| Software Dependencies | No | The paper mentions models such as LSTM and Conformer-Transducer and provides a GitHub link, which implies software dependencies, but it does not list specific software components with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | For the language modeling tasks, we trained a 2-layer LSTM with hidden size h = 200... We trained a 2-layer LSTM model with hidden and output sizes of 200, and used the output as input embedding for sampled softmax... We trained our model for 80 epochs, with batch size equal to 20 and dropout ratio in the LSTM equal to 0.5. ... The Conformer-Performer models consisted of l = 17 conformer layers. Each attention layer used H = 8 heads. The embedding dimensionality was p = 512 and, since dimensions were split equally among the heads, the query/key dimensionality was set to d_QK = 64. |
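
The Research Type and Open Datasets rows quote the paper's pointwise kernel-estimation protocol: empirical MSEs averaged over 100 randomly sampled pairs of vectors from UCI datasets. Below is a minimal sketch of that protocol, using Performer-style positive random features as a stand-in estimator for the softmax kernel SM(x, y) = exp(x^T y); it is not the paper's HRF construction. The feature count m = 128, the unit-normalization of inputs, and scikit-learn's copy of the wine dataset are all assumptions for illustration.

```python
# Hedged sketch of the pointwise kernel-estimation MSE protocol quoted above.
# Uses positive random features (a stand-in, NOT the paper's HRF estimator):
# E_w[phi_w(x) * phi_w(y)] = exp(x^T y) for w ~ N(0, I).
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
X = normalize(load_wine().data)   # rows scaled to unit L2 norm (assumed)
d = X.shape[1]
m = 128                           # number of random features (assumed)

def positive_features(x, W):
    # phi(x)_i = exp(w_i^T x - ||x||^2 / 2) / sqrt(m), so that
    # phi(x)^T phi(y) is an unbiased estimator of exp(x^T y).
    return np.exp(W @ x - x @ x / 2.0) / np.sqrt(W.shape[0])

W = rng.standard_normal((m, d))   # Gaussian projection matrix
errs = []
for _ in range(100):              # 100 random pairs, as in the quote
    i, j = rng.choice(len(X), size=2, replace=False)
    x, y = X[i], X[j]
    exact = np.exp(x @ y)         # softmax kernel SM(x, y)
    approx = positive_features(x, W) @ positive_features(y, W)
    errs.append((approx - exact) ** 2)
print("empirical MSE:", np.mean(errs))
```

The unbiasedness follows from the Gaussian moment identity E[exp(w^T z)] = exp(||z||^2 / 2): with z = x + y, the squared-norm corrections in phi cancel to leave exactly exp(x^T y).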
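
The Experiment Setup row fully specifies the language-modeling configuration (2-layer LSTM, hidden/output size 200, dropout 0.5, batch size 20, 80 epochs). Below is a hedged PyTorch sketch of that configuration; the paper does not state its framework, and the vocabulary size, optimizer, and learning rate here are assumptions. The sampled-softmax output layer mentioned in the quote is simplified to a full linear head.

```python
# Hedged PyTorch sketch of the quoted LSTM language-modeling setup.
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000  # assumed (common Penn Treebank convention)

class LSTMLM(nn.Module):
    def __init__(self, vocab=VOCAB_SIZE, hidden=200, layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=layers,
                            dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, vocab)  # full softmax head (simplified)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))  # (batch, seq, hidden)
        return self.head(h)                   # logits over the vocabulary

model = LSTMLM()
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # assumed optimizer
# Training loop (80 epochs, batch size 20, as quoted) omitted for brevity.
```

For the Conformer-Performer models, the quoted per-head query/key dimensionality follows from splitting the embedding equally across heads: d_QK = p / H = 512 / 8 = 64.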