Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
Authors: Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Thomas Rainforth, Yarin Gal
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points. We evaluate NPTs on tabular data from the UCI Repository [26] as well as the CIFAR-10 [55] and MNIST [58] image classification datasets. We report the average rank order for NPT and various tree-based and deep learning baselines in Table 1. |
| Researcher Affiliation | Collaboration | Jannik Kossen (1), Neil Band (1), Clare Lyle (1), Aidan N. Gomez (1, 3), Tom Rainforth (2), Yarin Gal (1) — (1) OATML, Department of Computer Science, University of Oxford; (2) Department of Statistics, University of Oxford; (3) Cohere |
| Pseudocode | No | The paper describes the architecture and process using figures and text but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release code for NPTs at github.com/OATML/Non-Parametric-Transformers. |
| Open Datasets | Yes | We evaluate NPTs on tabular data from the UCI Repository [26] as well as the CIFAR-10 [55] and MNIST [58] image classification datasets. |
| Dataset Splits | Yes | We tune the parameters of all models on validation sets and use 10-fold cross-validation whenever computationally feasible. |
| Hardware Specification | No | The paper mentions '24 GB of GPU memory' but does not specify the exact GPU models, CPU models, or any other detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper references software libraries like PyTorch [68] and scikit-learn [69] but does not provide specific version numbers for these or other software dependencies used in their experiments. |
| Experiment Setup | No | The paper states, 'We refer the reader to Appendix E for further details on the setup for datasets and baselines, and Appendix C.1 for NPT hyperparameters,' indicating that specific experimental setup details are provided in the appendix rather than the main text. |
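Since the checklist above does not spell out the paper's central mechanism, the sketch below illustrates the general idea of self-attention applied across datapoints rather than across tokens within a single input. It is a minimal sketch only: the tensor shapes, the flattening of per-attribute embeddings, and the use of `nn.MultiheadAttention` are illustrative assumptions, not a reproduction of the authors' released NPT implementation (see github.com/OATML/Non-Parametric-Transformers for the actual code).

```python
import torch
import torch.nn as nn

# Illustrative sketch (assumed shapes, not the NPT codebase):
# self-attention over the DATAPOINT axis, so each row of the
# dataset can attend to every other row.

n, d, e = 128, 10, 16          # datapoints, attributes, embedding dim
x = torch.randn(n, d, e)       # embedded dataset: one row per datapoint

# Flatten each datapoint's attribute embeddings into a single vector,
# then treat the n datapoints as a length-n "sequence" to attend over.
attn = nn.MultiheadAttention(embed_dim=d * e, num_heads=4, batch_first=True)
h = x.reshape(1, n, d * e)     # batch of 1 sequence of n datapoints
out, weights = attn(h, h, h)   # each datapoint attends to all others
print(out.shape)               # torch.Size([1, 128, 160])
```

In this toy setup the attention weights have shape `(1, n, n)`, which is what lets predictions for one datapoint depend on the rest of the dataset — the interaction between points the paper analyzes.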