Bayesian Context Aggregation for Neural Processes

Authors: Michael Volpp, Fabian Flürenbrock, Lukas Grossberger, Christian Daniel, Gerhard Neumann

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments to compare the performance of BA (Bayesian aggregation) and MA (mean aggregation) in NP-based models.
Researcher Affiliation | Collaboration | Bosch Center for Artificial Intelligence, Renningen, Germany; Karlsruhe Institute of Technology, Karlsruhe, Germany; University of Tübingen, Tübingen, Germany
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides architectural diagrams but not code-like descriptions of procedures.
Open Source Code | Yes | We publish source code to reproduce the experimental results online: https://github.com/boschresearch/bayesian-context-aggregation
Open Datasets | Yes | We use the MNIST database of 28×28 images of handwritten digits (LeCun and Cortes, 2010).
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits as percentages or absolute counts for the generated datasets. It describes how context and target sets are sampled dynamically for each task during training (see the sampling sketch below the table), but not a fixed, reproducible split of a master dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and the Optuna framework but does not specify software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We consistently optimize the encoder and decoder network architectures, the latent-space dimensionality d_z, and the learning rate of the Adam optimizer (Kingma and Ba, 2015) independently for all model architectures and all experiments, using the Optuna framework (Akiba et al., 2019), cf. App. 7.5.3. If not stated differently, we report performance in terms of the mean posterior predictive log-likelihood over 256 test tasks with 256 data points each, conditioned on context sets containing N ∈ {0, 1, ..., N_max} data points (cf. App. 7.5.4). For sampling-based methods (VI, MC, ANP), we report the joint log-likelihood over the test sets using a Monte-Carlo approximation with 25 latent samples, cf. App. 7.5.4 (see the sketches below the table).
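
As noted in the Dataset Splits row, context and target sets are drawn dynamically for each task rather than from a fixed split. Below is a minimal sketch of such per-task sampling in NumPy; the function name `sample_context_target` and the parameters `n_context_max` and `n_target` are illustrative, not taken from the paper.

```python
import numpy as np

def sample_context_target(x, y, n_context_max, n_target, rng):
    """Sample a context set and a target set from one task's data.

    x, y: arrays of shape (n_points, d_x) and (n_points, d_y) for a single task.
    The context size is drawn uniformly from {0, ..., n_context_max}, mirroring
    the dynamic per-task sampling the paper describes.
    """
    n_context = rng.integers(0, n_context_max + 1)
    idx = rng.permutation(len(x))
    ctx_idx = idx[:n_context]
    tgt_idx = idx[:n_target]  # target sets typically subsume the context points
    return (x[ctx_idx], y[ctx_idx]), (x[tgt_idx], y[tgt_idx])

# Toy usage on a synthetic 1-D regression task.
rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 64)[:, None]
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)
(context_x, context_y), (target_x, target_y) = sample_context_target(x, y, 10, 32, rng)
```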
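The Experiment Setup row describes tuning the architectures, the latent dimensionality d_z, and the learning rate with Optuna. A minimal, self-contained sketch of such a study is shown below; the search ranges and the placeholder objective are illustrative and do not reproduce the paper's actual configuration (App. 7.5.3).

```python
import numpy as np
import optuna

def objective(trial):
    # Hypothetical search space; the actual ranges are given in App. 7.5.3 of the paper.
    d_z = trial.suggest_int("d_z", 8, 128)
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    # Placeholder score: in the real pipeline this would train an NP model with
    # the Adam optimizer at learning rate `lr` and return a validation log-likelihood.
    return -1e-3 * (d_z - 64) ** 2 - abs(np.log10(lr) + 3.0)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```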
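The same row mentions a Monte-Carlo approximation of the joint predictive log-likelihood with 25 latent samples. The sketch below shows that estimator, assuming the joint log-likelihood of the target set has already been computed by the decoder for each latent sample z_s ~ q(z | context); the function name and the toy inputs are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def mc_log_likelihood(log_p_y_given_z):
    """Monte-Carlo estimate of the joint predictive log-likelihood.

    log_p_y_given_z: array of shape (n_samples,), where entry s is the joint
    log-likelihood of all target points conditioned on the s-th latent sample.
    With S samples the estimate is log(1/S * sum_s exp(log p(y | z_s))).
    """
    n_samples = log_p_y_given_z.shape[0]
    return logsumexp(log_p_y_given_z) - np.log(n_samples)

# Toy usage with 25 latent samples, matching the evaluation protocol reported above.
rng = np.random.default_rng(0)
fake_log_liks = rng.normal(loc=-100.0, scale=5.0, size=25)
print(mc_log_likelihood(fake_log_liks))
```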