Continuous-Time Functional Diffusion Processes

Authors: Giulio Franzese, Giulio Corallo, Simone Rossi, Markus Heinonen, Maurizio Filippone, Pietro Michiardi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theory with a series of experiments to illustrate the viability of FDPs, in Section 6. In our experiments, the score network is a simple multilayer perceptron (MLP), with several orders of magnitude fewer parameters than any existing score-based diffusion model. To the best of our knowledge, we are the first to show that a functional-space diffusion model can generate realistic image data, beyond simple datasets and toy models. We present quantitative results in Table 1, showing that our method FDP(MLP) achieves an impressively low FID score given the extremely low parameter count and the simplicity of the architecture.
Researcher Affiliation | Collaboration | Giulio Franzese (EURECOM, France); Giulio Corallo (EURECOM, France); Simone Rossi (Stellantis, France); Markus Heinonen (Aalto University, Finland); Maurizio Filippone (EURECOM, France); Pietro Michiardi (EURECOM, France)
Pseudocode | No | The paper describes mathematical formulations and numerical approximations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available here.
Open Datasets | Yes | We evaluate our approach on image data, using the CelebA 64×64 (Liu et al., 2015) dataset. We evaluate our approach on a simple dataset, using MNIST 32×32 (LeCun et al., 2010). To demonstrate the versatility of our framework, we conducted preliminary experiments on an audio dataset, specifically the Spoken Digit Dataset.
Dataset Splits | No | The paper mentions using datasets but does not explicitly provide details on training, validation, and test splits (e.g., percentages or sample counts for each split).
Hardware Specification | No | The paper does not explicitly specify the hardware used for running the experiments (e.g., specific GPU/CPU models, memory).
Software Dependencies | No | We implemented our approach in JAX (Bradbury et al., 2018), and use WANDB (Biewald, 2020) for our experimental protocol. While JAX and WANDB are mentioned, specific version numbers for these software components are not provided.
Experiment Setup | Yes | In the outer loop, the optimization algorithm is AdaBelief (Zhuang et al., 2020), sweeping the learning rate over {1e-4, 1e-5, 1e-6}. The inner loop is implemented using three steps of stochastic gradient descent (SGD). For optimization during our training, we utilized the AdamW (Loshchilov & Hutter, 2017) algorithm with a weight decay of 0.03. We employed a cosine warm-up schedule for the learning rate, which ends at a value of 2e-4. In the first case, we consider a score network implemented as a simple MLP with 15 layers and 256 neurons in each layer. The architecture comprises 7 layers, with each layer composed of a self-attention mechanism with 8 attention heads and a feedforward layer. (Illustrative sketches of this configuration follow the table.)
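The snippets below are illustrative reconstructions of the setup quoted in the table, not the authors' released code. First, a minimal sketch of the simple MLP score network described in the Research Type and Experiment Setup rows (15 layers of 256 units). Flax is used for convenience; the paper only states that the implementation is in JAX, and the exact conditioning on diffusion time is an assumption.

```python
import jax.numpy as jnp
import flax.linen as nn


class MLPScoreNet(nn.Module):
    """Hypothetical MLP score network; layer count and width follow the quoted setup."""
    depth: int = 15   # reported number of layers
    width: int = 256  # reported neurons per layer

    @nn.compact
    def __call__(self, x, t):
        # x: flattened function values (e.g. pixel intensities); t: diffusion time.
        t_feat = jnp.full(x.shape[:-1] + (1,), t)
        h = jnp.concatenate([x, t_feat], axis=-1)
        for _ in range(self.depth):
            h = nn.gelu(nn.Dense(self.width)(h))
        # The estimated score has the same shape as the input function values.
        return nn.Dense(x.shape[-1])(h)
```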
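Next, a hedged optax sketch of the optimizer settings listed under Experiment Setup. The warm-up/decay lengths and the inner-loop SGD step size are placeholders, since the paper does not report them.

```python
import optax

# Outer loop: AdaBelief, with the learning rate swept over {1e-4, 1e-5, 1e-6}.
outer_optimizers = {lr: optax.adabelief(learning_rate=lr) for lr in (1e-4, 1e-5, 1e-6)}

# Inner loop: three steps of plain SGD (step size not reported; 1e-3 is a placeholder).
inner_optimizer = optax.sgd(learning_rate=1e-3)
INNER_STEPS = 3

# Transformer training: AdamW with weight decay 0.03 and a cosine warm-up schedule
# whose warm-up ends at 2e-4 (warm-up/decay lengths below are placeholders).
schedule = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=2e-4,
    warmup_steps=5_000,
    decay_steps=500_000,
)
adamw_optimizer = optax.adamw(learning_rate=schedule, weight_decay=0.03)
```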
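Finally, since the Software Dependencies row notes that JAX and WANDB are named without versions, here is a minimal sketch of how the experimental protocol could be logged with WANDB. The project name, config keys, and the dummy training step are assumptions.

```python
import wandb


def train_step():
    # Placeholder for the actual FDP update; returns a dummy loss value.
    return 0.0


# Hypothetical logging setup; project and config names are illustrative only.
run = wandb.init(
    project="functional-diffusion",  # assumed project name
    config={"optimizer": "AdaBelief", "lr": 1e-4, "inner_steps": 3},
)

for step in range(1000):
    loss = train_step()
    wandb.log({"train/loss": loss}, step=step)

run.finish()
```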