Model-Value Inconsistency as a Signal for Epistemic Uncertainty

Authors: Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, Andre Barreto, Simon Osindero

ICML 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We provide empirical evidence in both tabular and function approximation settings from pixels that self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a learned model.
Researcher Affiliation Collaboration 1DeepMind, 2University of Oxford.
Pseudocode No The paper contains mathematical equations and diagrams to describe the methods, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code No The paper notes that the implementation builds on existing open-source libraries (JAX, TensorFlow) and prior agents (Muesli, VPN, Dreamer), but it does not state that the authors' own source code for the methodology described in this paper is released.
Open Datasets Yes In the deep RL experiments, we use a selection of 5 tasks from the procgen suite (Cobbe et al., 2019)... We also use a modification of the walker walk task from the DeepMind Control Suite (Tunyasuvunakool et al., 2020)... Lastly, we use the original minatar (Young & Tian, 2019) suite for fast experimentation with value-based agents (Mnih et al., 2013).
Dataset Splits Yes Figure 4 (left) reports the final performance of the agent evaluated on an additional 10M frames on the train and test levels. Values are normalised by the min and max scores for each game. Right: σ-IVE(5) computed using the model of the Muesli agent while evaluating on both training and unseen test levels, for different numbers of unique levels seen during training.
Hardware Specification No The paper mentions the software libraries used (JAX, TensorFlow) but does not specify any hardware details like CPU/GPU models, memory, or cloud instance types used for experiments.
Software Dependencies No The paper mentions key software components such as Python, JAX, TensorFlow, and Matplotlib along with their respective citations, but it does not specify the version numbers for any of these software dependencies.
Experiment Setup Yes We use an empty 5x5 gridworld, and collect data by rolling out a uniformly random policy, initialised at the bottom right cell. We use the Dreamer agent's default hyperparameters. For the self-inconsistency-seeking variant, i.e., µ + σ-IVE(5), we used a scalar weighting factor β_IVE = 0.1 to balance the mean and standard deviation across the ensemble members, tuned with grid search in {0.05, 0.1, 0.2, 1.0, 10.0}. The Adam (Kingma & Ba, 2014) optimiser with learning rate 5e-5 is used, and all losses converge after 10,000 epochs of stochastic gradient descent with batch size 128. We train Muesli for 100M environment frames and set the fraction of replay data in each batch to 0.8.
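For reference, the σ-IVE(K) signal quoted throughout this report is the standard deviation of the "implicit value ensemble", i.e., the k-step model-based value estimates for k = 0, ..., K, and the self-inconsistency-seeking objective combines it with the ensemble mean as µ + β_IVE · σ. A minimal NumPy sketch of that combination (the function name and array shapes are our own illustration, not from the paper):

```python
import numpy as np

def sigma_ive(k_step_values, beta=0.1):
    """Hypothetical sketch of the mu + beta * sigma objective.

    k_step_values: sequence of length K+1 holding the k-step
    model-based value estimates v^0(s), ..., v^K(s) for one state
    (the implicit value ensemble).
    Returns (mu, sigma, score), where sigma is the self-inconsistency
    signal and score = mu + beta * sigma is the exploration objective
    with weighting factor beta (beta_IVE = 0.1 in the paper's setup).
    """
    v = np.asarray(k_step_values, dtype=float)
    mu = v.mean()      # ensemble mean value estimate
    sigma = v.std()    # self-inconsistency: spread across the ensemble
    return mu, sigma, mu + beta * sigma
```

A perfectly self-consistent model (all k-step estimates equal) yields σ = 0, so the objective reduces to the mean value; disagreement between the estimates adds a β-weighted bonus.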