Hybrid Random Features
Authors: Krzysztof Marcin Choromanski, Han Lin, Haoxian Chen, Arijit Sehanobish, Yuanzhe Ma, Deepali Jain, Jake Varley, Andy Zeng, Michael S Ryoo, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct exhaustive empirical evaluation of HRF ranging from pointwise kernel estimation experiments, through tests on data admitting clustering structure to benchmarking implicit-attention Transformers (also for downstream Robotics applications), demonstrating its quality in a wide spectrum of machine learning problems. |
| Researcher Affiliation | Collaboration | Google Brain Robotics, Columbia University, University of Cambridge, The Alan Turing Institute |
| Pseudocode | No | The paper describes its algorithm in text and mathematical formulas in 'THE ALGORITHM' section but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The part of the code that we could make publicly available can be found in the following github: https://github.com/HL-hanlin/HRF_ICLR2022. |
| Open Datasets | Yes | We computed empirical MSEs by averaging over 100 randomly sampled pairs of vectors from two UCI datasets: wine and Boston. ... The results on the Penn Treebank (Marcus et al., 1993) are presented in Fig. 3. In the Appendix (Sec. I) we present additional results for the WikiText-2 dataset. ... We also tested HRFs on speech models with the LibriSpeech ASR corpus (Panayotov et al., 2015). |
| Dataset Splits | No | The paper refers to 'additional validation results' and using standard datasets, but does not provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper mentions using 'AWS compute resources' in the Acknowledgements, but it does not specify any particular GPU models, CPU models, or detailed cloud instance configurations used for running the experiments. |
| Software Dependencies | No | The paper mentions using models like LSTM and Conformer-Transducer and provides a GitHub link which likely implies software dependencies, but it does not explicitly list specific software components with their version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | For the Language Modeling tasks, we trained a 2-layer LSTM with hidden size h = 200... We trained a 2-layer LSTM model with hidden and output sizes of 200, and used the output as input embedding for sampled softmax... We trained our model for 80 epochs, with batch size equal to 20 and dropout ratio in the LSTM equal to 0.5. ... The Conformer-Performer models consisted of l = 17 conformer layers. Each attention layer used H = 8 heads. The embedding dimensionality was p = 512 and, since dimensions were split equally among the heads, the query/key dimensionality was set to d_QK = 64. |
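The language-modeling setup quoted above (2-layer LSTM, hidden/output size 200, dropout 0.5, batch size 20) can be sketched as a minimal PyTorch model. This is an illustrative reconstruction, not the authors' code: the vocabulary size and BPTT sequence length are placeholders not stated in the quoted excerpt, and the full pipeline (sampled softmax, 80-epoch training loop) is omitted.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper's experiment setup.
HIDDEN = 200      # hidden and output size of the LSTM
NUM_LAYERS = 2    # 2-layer LSTM
DROPOUT = 0.5     # dropout ratio inside the LSTM
BATCH_SIZE = 20   # training batch size

# Placeholders NOT given in the excerpt (hypothetical values).
VOCAB_SIZE = 10000
SEQ_LEN = 35

class LSTMLanguageModel(nn.Module):
    """2-layer LSTM language model; output embeddings would feed a
    sampled-softmax loss in the paper's setup (omitted here)."""

    def __init__(self, vocab_size=VOCAB_SIZE, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=NUM_LAYERS,
                            dropout=DROPOUT, batch_first=True)
        self.decoder = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)          # (batch, seq, hidden)
        out, _ = self.lstm(x)           # (batch, seq, hidden)
        return self.decoder(out)        # (batch, seq, vocab)

model = LSTMLanguageModel()
tokens = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN))
logits = model(tokens)
```

A sketch like this makes the "Yes" verdict concrete: the architecture and training hyperparameters in the quoted excerpt are specific enough to reconstruct the model skeleton, even though splits and software versions (flagged "No" above) are not.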