Neural Diffusion Processes
Authors: Vincent Dutordoir, Alan Saul, Zoubin Ghahramani, Fergus Simpson
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that NDPs can capture functional distributions close to the true Bayesian posterior, demonstrating that they can successfully emulate the behaviour of Gaussian processes and surpass the performance of neural processes. (Section 6, Experimental Evaluation) |
| Researcher Affiliation | Collaboration | Vincent Dutordoir (1, 2), Alan Saul (2), Zoubin Ghahramani (1, 3), Fergus Simpson (2). Affiliations: (1) Department of Engineering, University of Cambridge, Cambridge, UK; (2) Secondmind, Cambridge, UK; (3) Google DeepMind. |
| Pseudocode | Yes | Appendix B (Algorithms) lists pseudo-code for training and sampling NDPs; training is given in Section B.1, Algorithm 1. A hedged sketch of such a training step is given after the table. |
| Open Source Code | Yes | The code is available at https://github.com/vdutor/neural-diffusion-processes. |
| Open Datasets | Yes | For the MNIST dataset, our task simplifies to predicting a single output value that corresponds to grayscale intensity. However, when tackling the CELEBA 32×32 dataset, we deal with the added complexity of predicting three output values for each pixel to represent the RGB colour channels. We evaluate NDPs and NPs on two synthetic datasets, following the experimental setup from Bruinsma et al. (2021) but extending it to multiple input dimensions D. |
| Dataset Splits | No | The paper specifies training and test data sizes ("The training data is composed of 2^14 sample paths, whereas the test dataset comprises 128 paths."), and describes context and target sets for evaluation, but does not explicitly provide details about a distinct validation dataset split with percentages or counts. |
| Hardware Specification | Yes | Experiments were conducted on a 32-core machine and utilised a single Tesla V100-PCIE-32GB GPU. |
| Software Dependencies | No | The paper mentions GPflow and TensorFlow, but does not specify their version numbers or the versions of other key software components (e.g., Python). |
| Experiment Setup | Yes | All experiments share the same model architecture illustrated in Figure 2; there are, however, a number of model parameters that must be chosen. An L1 (i.e., Mean Absolute Error, MAE) loss function was used throughout. We use four or five bi-dimensional attention blocks, each consisting of multi-head self-attention blocks (Vaswani et al., 2017) with a representation dimensionality of H = 64 and 8 heads. Each experiment used either 500 or 1000 diffusion steps... The Adam optimiser is used throughout. Our learning rate follows a cosine-decay function, with a 20-epoch linear warm-up to a maximum learning rate of η = 0.001 before decaying. All NDP models were trained for 250 epochs... Each epoch contained 4096 example training (y0, x0) pairs. Training data was provided in batches of 32... Table 3: Experiment configuration and training time. (The learning-rate schedule is sketched after the table.) |
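
The Pseudocode row refers to Algorithm 1 (training) in Appendix B of the paper. The paper itself is the authoritative reference; the snippet below is only a minimal NumPy sketch of a generic DDPM-style training step with the L1 (MAE) objective quoted above. The `noise_model` callable, the linear beta schedule, and all shapes are hypothetical placeholders, not taken from the paper or its repository.

```python
import numpy as np

T = 500                               # number of diffusion steps (the paper uses 500 or 1000)
betas = np.linspace(1e-4, 0.02, T)    # assumed linear noise schedule (placeholder)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def training_step(noise_model, x0, y0, rng):
    """One generic DDPM-style training step using the L1 (MAE) loss quoted above.

    `noise_model(x, y_t, t)` is a hypothetical network that predicts the noise
    added to y_0; `x0`, `y0` are one sampled function draw (inputs and outputs).
    """
    t = rng.integers(0, T)                 # uniformly sampled diffusion step
    eps = rng.standard_normal(y0.shape)    # Gaussian noise
    # Forward-process sample: y_t = sqrt(abar_t) * y_0 + sqrt(1 - abar_t) * eps
    y_t = np.sqrt(alphas_bar[t]) * y0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    eps_hat = noise_model(x0, y_t, t)      # network prediction of the injected noise
    return np.mean(np.abs(eps_hat - eps))  # L1 / MAE objective
```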
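
The Experiment Setup row describes a cosine-decay learning rate with a 20-epoch linear warm-up to η = 0.001 over 250 training epochs. The sketch below illustrates that schedule; whether the original implementation updates the rate per epoch or per step is an assumption, as are the exact endpoint values.

```python
import math

def learning_rate(epoch, warmup_epochs=20, total_epochs=250, max_lr=1e-3):
    """Linear warm-up to max_lr, then cosine decay to zero (per-epoch granularity assumed)."""
    if epoch < warmup_epochs:
        return max_lr * (epoch + 1) / warmup_epochs      # linear warm-up
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

# Example: peak rate at the end of warm-up, decayed rate near the end of training.
print(learning_rate(19))   # ~1e-3
print(learning_rate(249))  # close to 0
```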