Randomized Prior Functions for Deep Reinforcement Learning

Authors: Ian Osband, John Aslanides, Albin Cassirer

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We support our claims by a series of simple lemmas for simple environments, together with experimental evidence in more complex settings."
Researcher Affiliation | Industry | Ian Osband (DeepMind) iosband@google.com; John Aslanides (DeepMind) jaslanides@google.com; Albin Cassirer (DeepMind) cassirer@google.com
Pseudocode | Yes | "Algorithm 1: Randomized prior functions for ensemble posterior." (a code sketch follows the table)
Open Source Code | No | "We present an accompanying visualization at http://bit.ly/rpf_nips."
Open Datasets | Yes | "We use the DeepMind Control Suite [66] with reward +1 only when cos(θ) > 0.95, |x| < 0.1, |θ̇| < 1, and |ẋ| < 1. Each episode lasts 1,000 time steps, simulating 10 seconds of interaction." (see the reward sketch after this table)
Dataset Splits | No | "Figure 3 presents the average time to learn for N = 5, ..., 60, up to 500K episodes, over 5 seeds and ensemble K = 20."
Hardware Specification | No | No specific hardware details are provided in the paper.
Software Dependencies | No | "optimize ∇θ ℓ(fθ + pk; Dk)|θ=θk via ADAM [28]."
Experiment Setup | Yes | "We train an ensemble of K networks {Qk}, k = 1, ..., K, in parallel, each on a perturbed version of the observed data Ht and each with a distinct random, but fixed, prior function pk." (see the ensemble sketch after this table)
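
As a concrete reading of the sparse reward condition quoted in the Open Datasets row, here is a minimal sketch; the function name and argument names are illustrative, not part of the Control Suite API:

import numpy as np

def sparse_swingup_reward(theta, x, theta_dot, x_dot):
    # Reward is +1 only in a narrow "balanced and nearly still" region of
    # the cartpole swingup state space; everywhere else it is 0.
    balanced = (np.cos(theta) > 0.95 and abs(x) < 0.1
                and abs(theta_dot) < 1.0 and abs(x_dot) < 1.0)
    return 1.0 if balanced else 0.0

Because the reward is zero almost everywhere, an agent sees no learning signal until it first swings the pole up and holds it still, which is what makes this task a test of exploration.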
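The Pseudocode and Experiment Setup rows describe the core method: each ensemble member k acts with Qk = fθk + β·pk, where pk is a randomly initialized but fixed prior network and only fθk is trained, each on its own bootstrapped copy of the data. The following is a minimal PyTorch sketch under those assumptions; the network sizes, prior scale β, bootstrap scheme, and training loop are placeholder choices, not the paper's exact configuration:

import torch
import torch.nn as nn

def make_net(in_dim, out_dim):
    # Small MLP; the architecture is a placeholder, not the paper's.
    return nn.Sequential(nn.Linear(in_dim, 50), nn.ReLU(),
                         nn.Linear(50, out_dim))

class RandomizedPriorNet(nn.Module):
    # Q_k(x) = f_theta_k(x) + beta * p_k(x), with p_k random but fixed.
    def __init__(self, in_dim, out_dim, beta=3.0):
        super().__init__()
        self.trainable = make_net(in_dim, out_dim)  # f_theta_k, trained
        self.prior = make_net(in_dim, out_dim)      # p_k: random init...
        for param in self.prior.parameters():
            param.requires_grad_(False)             # ...and never updated
        self.beta = beta

    def forward(self, x):
        return self.trainable(x) + self.beta * self.prior(x)

def train_ensemble(xs, ys, K=20, beta=3.0, steps=1000):
    ensemble = [RandomizedPriorNet(xs.shape[1], ys.shape[1], beta)
                for _ in range(K)]
    for net in ensemble:
        # Bootstrap: each member trains on its own resampled data D_k.
        idx = torch.randint(len(xs), (len(xs),))
        x_k, y_k = xs[idx], ys[idx]
        opt = torch.optim.Adam(net.trainable.parameters())
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.mse_loss(net(x_k), y_k).backward()
            opt.step()
    return ensemble

At decision time one would sample a member k and act greedily with respect to Qk for the episode, roughly in the style of bootstrapped ensembles for deep exploration; the fixed prior pk keeps the members' predictions diverse in regions where no data has been observed.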