Approximately Equivariant Neural Processes
Authors: Matthew Ashman, Cristiana Diaconu, Adrian Weller, Wessel Bruinsma, Richard Turner
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments, showing that approximately equivariant NP models can outperform both their non-equivariant and strictly equivariant counterparts. |
| Researcher Affiliation | Collaboration | Matthew Ashman University of Cambridge mca39@cam.ac.uk Cristiana Diaconu University of Cambridge cdd43@cam.ac.uk Adrian Weller University of Cambridge The Alan Turing Institute aw665@cam.ac.uk Wessel Bruinsma Microsoft Research AI for Science wessel.p.bruinsma@gmail.com Richard E. Turner University of Cambridge Microsoft Research AI for Science The Alan Turing Institute ret23@cam.ac.uk |
| Pseudocode | Yes | Algorithm 1: Forward pass through the ConvCNP (T) for off-the-grid data. |
| Open Source Code | Yes | An implementation of our models can be found at cambridge-mlg/aenp. |
| Open Datasets | Yes | We consider a synthetic 1-D regression task with datasets drawn from a Gaussian process (GP) with the Gibbs kernel [Gibbs, 1998]. [...] derived from ERA5 [Copernicus Climate Change Service, 2020], consisting of surface air temperatures for the years 2018 and 2019. |
| Dataset Splits | Yes | For each task, we sample the number of context points Nc ~ U{1, 64} and set the number of target points to Nt = 128. The context range [xc,min, xc,max] (from which the context points are uniformly sampled) is an interval of length 4, with its centre randomly sampled according to U[-7, 7] for the ID task, and according to U[13, 27] for the OOD task. The target range is [xt,min, xt,max] = [xc,min - 1, xc,max + 1]. This is also applicable during testing, with the test dataset consisting of 80,000 datasets. |
| Hardware Specification | Yes | We train and evaluate all models on a single 11 GB NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'AdamW [Loshchilov and Hutter, 2017]' as the optimizer but does not specify versions for core software libraries like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For all models, we optimise the model parameters using AdamW [Loshchilov and Hutter, 2017] with a learning rate of 5 × 10^-4 and batch size of 16. Gradient value magnitudes are clipped at 0.5. We train for a maximum of 500 epochs, with each epoch consisting of 16,000 datasets (10,000 iterations per epoch). |
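
The Pseudocode row points at Algorithm 1, the forward pass of the ConvCNP (T) for off-the-grid data. The sketch below is not a transcription of that algorithm; it is a minimal PyTorch rendering of the standard translation-equivariant ConvCNP construction it refers to (SetConv encoder with a density channel, CNN on a uniform grid, SetConv decoder at the targets). The function names, RBF lengthscale, grid, and two-channel readout are illustrative assumptions, not taken from the paper or the cambridge-mlg/aenp repository.

```python
import torch

def rbf_weights(x_query, x_grid, lengthscale):
    """RBF interpolation weights between off-grid locations and grid locations."""
    d2 = (x_query[:, None] - x_grid[None, :]) ** 2
    return torch.exp(-0.5 * d2 / lengthscale ** 2)        # (N_query, N_grid)

def convcnp_forward(x_ctx, y_ctx, x_trg, cnn, x_grid, lengthscale=0.1):
    """Schematic off-the-grid ConvCNP forward pass:
    SetConv onto a uniform grid, CNN on the grid, SetConv back to the targets."""
    # Encoder: project the context set onto the grid, with a density channel.
    w = rbf_weights(x_grid, x_ctx, lengthscale)           # (N_grid, N_ctx)
    density = w.sum(-1, keepdim=True)                     # (N_grid, 1)
    signal = (w @ y_ctx[:, None]) / density.clamp(min=1e-8)
    h = torch.cat([density, signal], dim=-1)              # (N_grid, 2)
    # CNN over the gridded representation (Conv1d expects channels first).
    h = cnn(h.t()[None]).squeeze(0).t()                   # (N_grid, C), here C = 2
    # Decoder: interpolate the CNN output at the target locations.
    f_trg = rbf_weights(x_trg, x_grid, lengthscale) @ h   # (N_trg, C)
    mean, raw_scale = f_trg.chunk(2, dim=-1)              # pointwise Gaussian readout
    return mean, torch.nn.functional.softplus(raw_scale)

# Illustrative usage: a tiny CNN and grid, purely to show the shapes involved.
cnn = torch.nn.Conv1d(2, 2, kernel_size=5, padding=2)
x_grid = torch.linspace(-4.0, 4.0, 128)
mean, scale = convcnp_forward(torch.randn(10), torch.randn(10),
                              torch.randn(20), cnn, x_grid)
```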
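The Open Datasets and Dataset Splits rows describe how synthetic 1-D tasks are generated from a GP with the Gibbs kernel. The NumPy sketch below makes that sampling procedure concrete under the quoted split settings (Nc ~ U{1, 64}, Nt = 128, a length-4 context range with centre in U[-7, 7] for ID or U[13, 27] for OOD, targets extending one unit beyond the context range). The lengthscale function, noise level, and helper names (`gibbs_kernel`, `sample_task`) are assumptions made for illustration.

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale_fn):
    """Non-stationary Gibbs kernel (Gibbs, 1998) for 1-D inputs."""
    l1 = lengthscale_fn(x1)[:, None]                     # (N1, 1)
    l2 = lengthscale_fn(x2)[None, :]                     # (1, N2)
    sq = l1 ** 2 + l2 ** 2
    diff = x1[:, None] - x2[None, :]
    return np.sqrt(2.0 * l1 * l2 / sq) * np.exp(-diff ** 2 / sq)

def sample_task(rng, lengthscale_fn, ood=False, noise_std=0.05):
    """Sample one context/target regression task following the quoted splits."""
    n_ctx = rng.integers(1, 65)                          # Nc ~ U{1, ..., 64}
    n_trg = 128                                          # Nt = 128
    centre = rng.uniform(13.0, 27.0) if ood else rng.uniform(-7.0, 7.0)
    xc_min, xc_max = centre - 2.0, centre + 2.0          # context range of length 4
    x_ctx = rng.uniform(xc_min, xc_max, size=n_ctx)
    x_trg = rng.uniform(xc_min - 1.0, xc_max + 1.0, size=n_trg)
    x = np.concatenate([x_ctx, x_trg])
    K = gibbs_kernel(x, x, lengthscale_fn) + noise_std ** 2 * np.eye(len(x))
    y = rng.multivariate_normal(np.zeros(len(x)), K)     # joint GP draw
    return (x_ctx, y[:n_ctx]), (x_trg, y[n_ctx:])

# Example with a hypothetical smoothly varying (always positive) lengthscale.
rng = np.random.default_rng(0)
lengthscale_fn = lambda x: 0.5 + 0.4 * np.sin(0.5 * x)
(ctx_x, ctx_y), (trg_x, trg_y) = sample_task(rng, lengthscale_fn)
```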
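The Experiment Setup row maps onto a standard PyTorch training loop. The sketch below wires the quoted hyperparameters (AdamW, learning rate 5e-4, batches of 16 tasks, gradient value clipping at 0.5, up to 500 epochs) into such a loop; `task_loader` and `model.log_likelihood` are hypothetical placeholders, since the paper's actual training code lives in cambridge-mlg/aenp.

```python
import torch

def train(model, task_loader, max_epochs=500, clip_value=0.5):
    """Minimal NP training loop using the quoted optimisation settings."""
    optimiser = torch.optim.AdamW(model.parameters(), lr=5e-4)
    for epoch in range(max_epochs):
        for batch in task_loader:                 # each batch holds 16 (context, target) tasks
            loss = -model.log_likelihood(batch)   # hypothetical NP objective
            optimiser.zero_grad()
            loss.backward()
            # Clip gradient *values* (not the norm) at 0.5, as stated in the paper.
            torch.nn.utils.clip_grad_value_(model.parameters(), clip_value)
            optimiser.step()
```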