Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Authors: Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points. We evaluate NPTs on tabular data from the UCI Repository [26] as well as the CIFAR-10 [55] and MNIST [58] image classification datasets. We report the average rank order for NPT and various tree-based and deep learning baselines in Table 1.
Researcher Affiliation | Collaboration | Jannik Kossen (1), Neil Band (1), Clare Lyle (1), Aidan N. Gomez (1,3), Tom Rainforth (2), Yarin Gal (1). Affiliations: (1) OATML, Department of Computer Science, University of Oxford; (2) Department of Statistics, University of Oxford; (3) Cohere.
Pseudocode | No | The paper describes the architecture and process using figures and text but does not include any explicitly labeled pseudocode or algorithm blocks. A hedged sketch of the core attention pattern is given after this table.
Open Source Code | Yes | We release code for NPTs at github.com/OATML/Non-Parametric-Transformers.
Open Datasets | Yes | We evaluate NPTs on tabular data from the UCI Repository [26] as well as the CIFAR-10 [55] and MNIST [58] image classification datasets. A loading sketch for the public image datasets appears after this table.
Dataset Splits | Yes | We tune the parameters of all models on validation sets and use 10-fold cross-validation whenever computationally feasible. A generic cross-validation sketch appears after this table.
Hardware Specification | No | The paper mentions '24 GB of GPU memory' but does not specify the exact GPU models, CPU models, or any other detailed hardware specifications used for experiments.
Software Dependencies | No | The paper references software libraries such as PyTorch [68] and scikit-learn [69] but does not provide specific version numbers for these or other software dependencies used in the experiments.
Experiment Setup | No | The paper states, 'We refer the reader to Appendix E for further details on the setup for datasets and baselines, and Appendix C.1 for NPT hyperparameters,' indicating that specific experimental setup details are provided in the appendix rather than the main text.
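
Since the paper itself contains no pseudocode, the following is a minimal PyTorch sketch of the idea named in the title: alternating multi-head self-attention over the datapoint axis (attention between datapoints, ABD) with self-attention over the attribute axis (attention between attributes, ABA). The NPTBlock class, tensor shapes, and layer sizes here are illustrative assumptions, not the authors' released implementation (see github.com/OATML/Non-Parametric-Transformers for that).

    import torch
    import torch.nn as nn

    class NPTBlock(nn.Module):
        """Sketch of one block: attention between datapoints (ABD),
        then attention between attributes (ABA). Shapes and sizes are
        assumptions, not the authors' released code."""

        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.abd = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.aba = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (n_datapoints, n_attributes, dim) -- the whole dataset at once.
            # ABD: attend across datapoints, treating attributes as the batch axis.
            h = x.permute(1, 0, 2)              # (attributes, datapoints, dim)
            h, _ = self.abd(h, h, h)            # each datapoint attends to all others
            x = x + h.permute(1, 0, 2)          # residual connection
            # ABA: attend across attributes within each datapoint.
            h, _ = self.aba(x, x, x)            # (datapoints, attributes, dim)
            return x + h

    # Toy usage: 8 datapoints, 5 attributes, embedding dimension 16.
    x = torch.randn(8, 5, 16)
    print(NPTBlock(dim=16)(x).shape)            # torch.Size([8, 5, 16])

The design point this illustrates is that the whole dataset is passed through the model as a single tensor, so the prediction for one datapoint can depend on the features of the others.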
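The image datasets cited in the Open Datasets row are standard public downloads; below is a minimal sketch via torchvision, where the root path "./data" is an arbitrary assumption. The UCI tabular datasets are fetched per-dataset from the UCI Repository and are not covered here.

    import torchvision

    # Download the public image datasets named in the paper.
    cifar10 = torchvision.datasets.CIFAR10(root="./data", train=True, download=True)
    mnist = torchvision.datasets.MNIST(root="./data", train=True, download=True)
    print(len(cifar10), len(mnist))  # 50000 60000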
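The 10-fold cross-validation protocol from the Dataset Splits row can be reproduced generically. Below is a minimal sketch using scikit-learn's KFold; the random data and the logistic-regression stand-in model are placeholder assumptions rather than the paper's pipeline, and the paper additionally tunes hyperparameters on validation sets.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    # Placeholder data standing in for a UCI tabular classification dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = rng.integers(0, 2, size=200)

    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in kf.split(X):
        # Train on 9 folds, evaluate on the held-out fold.
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))

    print(f"mean accuracy over 10 folds: {np.mean(scores):.3f}")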