Attentive Neural Processes
Authors: Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the resulting Attentive Neural Processes (ANPs) on 1D function regression and on 2D image regression. Our results show that ANPs greatly improve upon NPs in terms of reconstruction of contexts as well as speed of training, both against iterations and wall clock time. We also demonstrate that ANPs show enhanced expressiveness relative to the NP and are able to model a wider range of functions. |
| Researcher Affiliation | Collaboration | DeepMind, University of Oxford |
| Pseudocode | Yes | Figure 8: The model architecture for NP and ANP for both 1D and 2D regression. |
| Open Source Code | Yes | Code is available at https://github.com/deepmind/neural-processes/blob/master/attentive_neural_process.ipynb |
| Open Datasets | Yes | We train the ANP on MNIST (LeCun et al., 1998) and 32×32 CelebA (Liu et al., 2015) using the standard train/test split with up to 200 context/target points at training. |
| Dataset Splits | No | We train the ANP on MNIST (LeCun et al., 1998) and 32×32 CelebA (Liu et al., 2015) using the standard train/test split with up to 200 context/target points at training. No explicit mention of a separate validation split or split methodology was provided. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, cloud instance types) were mentioned for the experimental setup. |
| Software Dependencies | No | We use the Adam Optimiser (Kingma & Ba, 2015) with a fixed learning rate of 5e-5 and TensorFlow defaults for the other hyperparameters. No specific version numbers for TensorFlow or other libraries are given. |
| Experiment Setup | Yes | We use the same decoder architecture for all experiments, and 8 heads for multihead. See Appendix A for architectural details. We use a batch size of 16 in the fixed hyperparameter setting... We use the Adam Optimiser (Kingma & Ba, 2015) with a fixed learning rate of 5e-5 and TensorFlow defaults for the other hyperparameters. For image regression, we use a learning rate of 5e-5 and 4e-5 respectively for MNIST and CelebA using the Adam optimiser with TensorFlow defaults for the other hyperparameters. (A minimal sketch of this setup appears below the table.) |
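The quoted setup pins down only a few concrete values: Adam with a fixed learning rate of 5e-5 (4e-5 for CelebA), TensorFlow defaults for the remaining optimiser hyperparameters, a batch size of 16, and 8 attention heads. The sketch below assembles those values into runnable TensorFlow code; everything else (the representation width `d_model`, the use of `tf.keras.layers.MultiHeadAttention`, and the helper `attentive_representation`) is an illustrative assumption rather than the authors' implementation, which is available in the linked notebook.

```python
# Hedged sketch only: the hyperparameters quoted in the table, everything else assumed.
import tensorflow as tf

d_model = 128  # assumed representation width; not specified in the quotes above

# "8 heads for multihead" -- cross-attention from target inputs to context points.
cross_attention = tf.keras.layers.MultiHeadAttention(
    num_heads=8,
    key_dim=d_model // 8,
    output_shape=d_model,
)

def attentive_representation(target_x_emb, context_x_emb, context_r):
    """Illustrative ANP-style deterministic path: each target location attends
    over the context representations, with (embedded) x-coordinates as queries/keys."""
    return cross_attention(query=target_x_emb, value=context_r, key=context_x_emb)

# Optimiser settings quoted in the table: fixed learning rate 5e-5 (4e-5 for CelebA),
# TensorFlow defaults for the other Adam hyperparameters, batch size 16.
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
batch_size = 16
```

With inputs shaped `(batch, num_targets, d_model)` for the embedded target x's, `(batch, num_contexts, d_model)` for the embedded context x's, and `(batch, num_contexts, d_model)` for the context encodings, the call returns one `d_model`-dimensional representation per target point, which is the kind of per-target representation an ANP decoder consumes.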