Randomized Prior Functions for Deep Reinforcement Learning

Authors: Ian Osband, John Aslanides, Albin Cassirer

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We support our claims by a series of simple lemmas for simple environments, together with experimental evidence in more complex settings."
Researcher Affiliation | Industry | Ian Osband (DeepMind) iosband@google.com; John Aslanides (DeepMind) jaslanides@google.com; Albin Cassirer (DeepMind) cassirer@google.com
Pseudocode | Yes | "Algorithm 1: Randomized prior functions for ensemble posterior." (a code sketch follows the table)
Open Source Code | No | "We present an accompanying visualization at http://bit.ly/rpf_nips."
Open Datasets | Yes | "We use the DeepMind Control Suite [66] with reward +1 only when cos(θ) > 0.95, |x| < 0.1, |θ̇| < 1, and |ẋ| < 1. Each episode lasts 1,000 time steps, simulating 10 seconds of interaction." (see the reward sketch after this table)
Dataset Splits | No | "Figure 3 presents the average time to learn for N = 5, ..., 60, up to 500K episodes, over 5 seeds and ensemble K = 20."
Hardware Specification | No | No specific hardware details are provided in the paper.
Software Dependencies | No | "optimize ∇θ ℓ(fθ + pk; Dk)|θ=θk via ADAM [28]."
Experiment Setup | Yes | "We train an ensemble of K networks {Qk}, k = 1, ..., K, in parallel, each on a perturbed version of the observed data Ht and each with a distinct random, but fixed, prior function pk." (see the ensemble sketch after this table)
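
As a concrete reading of the sparse reward condition quoted in the Open Datasets row, here is a minimal sketch; the function name and argument names are illustrative, not part of the Control Suite API:

import numpy as np

def sparse_swingup_reward(theta, x, theta_dot, x_dot):
    # Reward is +1 only in a narrow "balanced and nearly still" region of
    # the cartpole swingup state space; everywhere else it is 0.
    balanced = (np.cos(theta) > 0.95 and abs(x) < 0.1
                and abs(theta_dot) < 1.0 and abs(x_dot) < 1.0)
    return 1.0 if balanced else 0.0

Because the reward is zero almost everywhere, an agent sees no learning signal until it first swings the pole up and holds it still, which is what makes this task a test of exploration.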
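The Pseudocode and Experiment Setup rows describe the core method: each ensemble member k acts with Qk = fθk + β·pk, where pk is a randomly initialized but fixed prior network and only fθk is trained, each on its own bootstrapped copy of the data. The following is a minimal PyTorch sketch under those assumptions; the network sizes, prior scale β, bootstrap scheme, and training loop are placeholder choices, not the paper's exact configuration:

import torch
import torch.nn as nn

def make_net(in_dim, out_dim):
    # Small MLP; the architecture is a placeholder, not the paper's.
    return nn.Sequential(nn.Linear(in_dim, 50), nn.ReLU(),
                         nn.Linear(50, out_dim))

class RandomizedPriorNet(nn.Module):
    # Q_k(x) = f_theta_k(x) + beta * p_k(x), with p_k random but fixed.
    def __init__(self, in_dim, out_dim, beta=3.0):
        super().__init__()
        self.trainable = make_net(in_dim, out_dim)  # f_theta_k, trained
        self.prior = make_net(in_dim, out_dim)      # p_k: random init...
        for param in self.prior.parameters():
            param.requires_grad_(False)             # ...and never updated
        self.beta = beta

    def forward(self, x):
        return self.trainable(x) + self.beta * self.prior(x)

def train_ensemble(xs, ys, K=20, beta=3.0, steps=1000):
    ensemble = [RandomizedPriorNet(xs.shape[1], ys.shape[1], beta)
                for _ in range(K)]
    for net in ensemble:
        # Bootstrap: each member trains on its own resampled data D_k.
        idx = torch.randint(len(xs), (len(xs),))
        x_k, y_k = xs[idx], ys[idx]
        opt = torch.optim.Adam(net.trainable.parameters())
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.mse_loss(net(x_k), y_k).backward()
            opt.step()
    return ensemble

At decision time one would sample a member k and act greedily with respect to Qk for the episode, roughly in the style of bootstrapped ensembles for deep exploration; the fixed prior pk keeps the members' predictions diverse in regions where no data has been observed.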