Neural Diffusion Processes

Authors: Vincent Dutordoir, Alan Saul, Zoubin Ghahramani, Fergus Simpson

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically show that NDPs can capture functional distributions close to the true Bayesian posterior, demonstrating that they can successfully emulate the behaviour of Gaussian processes and surpass the performance of neural processes." (Section 6, Experimental Evaluation)
Researcher Affiliation | Collaboration | Vincent Dutordoir (1, 2), Alan Saul (2), Zoubin Ghahramani (1, 3), Fergus Simpson (2). Affiliations: 1. Department of Engineering, University of Cambridge, Cambridge, UK; 2. Secondmind, Cambridge, UK; 3. Google DeepMind.
Pseudocode | Yes | "B. Algorithms: In this section we list pseudo-code for training and sampling NDPs. B.1. Training: Algorithm 1 (Training)." A hedged sketch of such a training step is given after this table.
Open Source Code | Yes | "The code is available at https://github.com/vdutor/neural-diffusion-processes."
Open Datasets | Yes | "For the MNIST dataset, our task simplifies to predicting a single output value that corresponds to grayscale intensity. However, when tackling the CELEBA 32×32 dataset, we deal with the added complexity of predicting three output values for each pixel to represent the RGB colour channels. We evaluate NDPs and NPs on two synthetic datasets, following the experimental setup from Bruinsma et al. (2021) but extending it to multiple input dimensions D."
Dataset Splits | No | The paper specifies training and test data sizes ("The training data is composed of 2^14 sample paths, whereas the test dataset comprises 128 paths.") and describes context and target sets for evaluation, but does not explicitly provide details of a distinct validation split with percentages or counts.
Hardware Specification | Yes | "Experiments were conducted on a 32-core machine and utilised a single Tesla V100-PCIE-32GB GPU."
Software Dependencies | No | The paper mentions GPflow and TensorFlow, but does not specify their version numbers or the versions of other key software components (e.g., Python, PyTorch).
Experiment Setup | Yes | "All experiments share the same model architecture illustrated in Figure 2; there are, however, a number of model parameters that must be chosen. An L1 (i.e., Mean Absolute Error, MAE) loss function was used throughout. We use four or five bi-dimensional attention blocks, each consisting of multi-head self-attention blocks (Vaswani et al., 2017) containing a representation dimensionality of H = 64 and 8 heads. Each experiment used either 500 or 1000 diffusion steps... The Adam optimiser is used throughout. Our learning rate follows a cosine-decay function, with a 20-epoch linear learning-rate warm-up to a maximum learning rate of η = 0.001 before decaying. All NDP models were trained for 250 epochs... Each epoch contained 4096 example training (y0, x0) pairs. Training data was provided in batches of 32..." (Table 3: Experiment configuration and training time.) A sketch of this learning-rate schedule is given after this table.
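
For readers who want a concrete picture of the training algorithm referenced in the Pseudocode row, the snippet below is a minimal NumPy sketch of a generic denoising-diffusion training step with the L1 (MAE) loss mentioned in the setup. It is not the authors' implementation: the `model` callable, the linear noise schedule, and the function names are assumed placeholders used only for illustration.

```python
import numpy as np

# Assumed linear noise schedule; the paper reports 500 or 1000 diffusion steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def training_step(model, x0, y0, rng):
    """One denoising-diffusion training step with an L1 (MAE) loss.

    Assumptions: model(x, y_t, t) predicts the noise added to y0;
    x0 are input locations and y0 the clean function values of one sample path.
    """
    t = rng.integers(0, T)                        # sample a diffusion time step
    eps = rng.standard_normal(y0.shape)           # Gaussian noise
    # Forward (noising) process: corrupt y0 to the noise level of step t.
    y_t = np.sqrt(alphas_bar[t]) * y0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    eps_hat = model(x0, y_t, t)                   # predicted noise
    return np.mean(np.abs(eps_hat - eps))         # L1 / MAE loss, as in the paper
```

In practice the loss would be averaged over a batch of 32 sample paths and backpropagated with the Adam optimiser, per the configuration quoted above.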
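
The learning-rate schedule described in the Experiment Setup row (a 20-epoch linear warm-up to η = 0.001 followed by cosine decay over 250 training epochs) can be sketched as below. The decay endpoint (zero) and the per-epoch granularity are assumptions; the paper does not state them explicitly.

```python
import math

MAX_LR = 1e-3       # maximum learning rate η from the paper
WARMUP_EPOCHS = 20  # linear warm-up length
TOTAL_EPOCHS = 250  # total training length

def learning_rate(epoch: int) -> float:
    """Linear warm-up followed by cosine decay (assumed to decay to zero)."""
    if epoch < WARMUP_EPOCHS:
        return MAX_LR * (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * progress))
```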