Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling

Authors: Tung Nguyen, Aditya Grover

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that TNPs achieve state-of-the-art performance on various benchmark problems, outperforming all previous NP variants on meta regression, image completion, contextual multi-armed bandits, and Bayesian optimization.
Researcher Affiliation | Academia | Tung Nguyen, Aditya Grover, Department of Computer Science, UCLA. Correspondence to: Tung Nguyen <tungnd@cs.ucla.edu>.
Pseudocode | No | The paper describes procedures but does not include a figure, block, or section labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | We have open-sourced the codebase for reproducing our experiments (https://github.com/tung-nd/TNP-pytorch). The implementation of the baselines is borrowed from the official implementation of BNPs.
Open Datasets | Yes | We use two datasets for this experiment: EMNIST (Cohen et al., 2017) and CelebA (Liu et al., 2018) ... We compare TNPs with the baselines on the wheel bandit problem introduced in Riquelme et al. (2018), and we use various benchmark functions in the optimization literature (Kim & Choi, 2017; Kim, 2020).
Dataset Splits | No | No explicit mention of separate validation splits. The paper describes dynamic splitting into context and target points: 'For each f_i, we choose N random locations to evaluate, and sample an index m that splits the sequence to context and target points. For all methods, ℓ ∼ U[0.6, 1.0), σ_f ∼ U[0.1, 1.0), B = 16, N ∼ U[6, 50), m ∼ U[3, 47).' See the data-sampling sketch below the table.
Hardware Specification | Yes | We measure run time on the 1-D regression task on an RTX 2080 Ti, with 1000 batches of size 16. See the timing sketch below the table.
Software Dependencies | No | The paper mentions an open-sourced codebase but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In this section we present the hyperparameters that we used to train TNPs in the experiments. ... Model dimension: 64, Number of embedding layers: 4, Feed-forward dimension: 128, Number of attention heads: 4, Number of transformer layers: 6, Dropout: 0.0, Number of training steps: 100000, Learning rate: 5e-4 with cosine annealing scheduler. See the configuration sketch below the table.
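
The dynamic context/target split described under Dataset Splits can be made concrete with a short sampling routine. The sketch below is a minimal PyTorch illustration, assuming a GP-style 1-D regression generator with an RBF kernel whose lengthscale ℓ and output scale σ_f are drawn from the quoted ranges; the function name `sample_task_batch`, the input range, and the cap on m are assumptions, not the authors' implementation.

```python
import torch

def sample_task_batch(batch_size=16, x_range=(-2.0, 2.0)):
    """Illustrative sketch of the quoted sampling protocol (not the authors' code):
    draw a batch of 1-D functions from a GP with an RBF kernel, evaluate each at
    N random locations, and split the first m points into the context set."""
    N = int(torch.randint(6, 50, (1,)))                 # N ~ U[6, 50)
    m = int(torch.randint(3, min(47, N - 2), (1,)))     # m ~ U[3, 47), capped so a few target points remain (assumption)
    ell = 0.6 + 0.4 * torch.rand(batch_size, 1, 1)      # lengthscale ell ~ U[0.6, 1.0)
    sigma_f = 0.1 + 0.9 * torch.rand(batch_size, 1, 1)  # output scale sigma_f ~ U[0.1, 1.0)

    x = x_range[0] + (x_range[1] - x_range[0]) * torch.rand(batch_size, N, 1)
    # RBF kernel Gram matrix per task, with a small jitter for numerical stability.
    diff = x - x.transpose(1, 2)                        # (B, N, N) pairwise differences
    cov = sigma_f**2 * torch.exp(-0.5 * (diff / ell) ** 2) + 1e-5 * torch.eye(N)
    # Sample y ~ N(0, cov) via the Cholesky factor.
    y = torch.linalg.cholesky(cov) @ torch.randn(batch_size, N, 1)

    ctx = (x[:, :m], y[:, :m])                          # context points
    tar = (x[:, m:], y[:, m:])                          # target points
    return ctx, tar
```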
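
The run-time figure under Hardware Specification (1000 batches of size 16 for 1-D regression on an RTX 2080 Ti) corresponds to a simple forward-pass timing loop. The sketch below is a generic version of such a measurement; the model's forward signature and the use of pre-generated batches are assumptions, not the authors' benchmarking script.

```python
import time
import torch

def time_forward(model, batches, device="cuda"):
    """Time the forward pass over a list of pre-generated (context, target) batches,
    e.g. 1000 batches of size 16 for the 1-D regression task. Generic sketch;
    the model interface used here is an assumption."""
    model = model.to(device).eval()
    torch.cuda.synchronize()                  # flush pending GPU work before starting the clock
    start = time.perf_counter()
    with torch.no_grad():
        for ctx, tar in batches:
            x_ctx, y_ctx = (t.to(device) for t in ctx)
            x_tar, _ = (t.to(device) for t in tar)
            model(x_ctx, y_ctx, x_tar)        # assumed forward signature
    torch.cuda.synchronize()                  # wait for all kernels before stopping the clock
    return time.perf_counter() - start
```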
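
The hyperparameters quoted under Experiment Setup map onto a standard transformer encoder with a cosine-annealed learning-rate schedule. The sketch below shows one plausible way to wire those values together in PyTorch; the embedder architecture and the choice of Adam are assumptions, since the quote specifies only the sizes, the step count, the learning rate, and the scheduler.

```python
import torch
import torch.nn as nn

# Hyperparameter values quoted from the paper's experiment setup.
D_MODEL, N_EMB_LAYERS, D_FF = 64, 4, 128
N_HEADS, N_LAYERS, DROPOUT = 4, 6, 0.0
TRAIN_STEPS, LR = 100_000, 5e-4

# Placeholder architecture (assumption): an MLP embedder over (x, y) pairs
# followed by a standard transformer encoder with the quoted sizes.
embedder = nn.Sequential(
    nn.Linear(2, D_MODEL),
    *[layer for _ in range(N_EMB_LAYERS - 1)
      for layer in (nn.ReLU(), nn.Linear(D_MODEL, D_MODEL))],
)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(
        d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=D_FF,
        dropout=DROPOUT, batch_first=True),
    num_layers=N_LAYERS,
)
model = nn.Sequential(embedder, encoder)

# Adam is assumed; the quote only gives the learning rate and the scheduler.
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TRAIN_STEPS)
```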