Randomized Prior Functions for Deep Reinforcement Learning
Authors: Ian Osband, John Aslanides, Albin Cassirer
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our claims by a series of simple lemmas for simple environments, together with experimental evidence in more complex settings. |
| Researcher Affiliation | Industry | Ian Osband, DeepMind, iosband@google.com; John Aslanides, DeepMind, jaslanides@google.com; Albin Cassirer, DeepMind, cassirer@google.com |
| Pseudocode | Yes | Algorithm 1 Randomized prior functions for ensemble posterior. |
| Open Source Code | No | We present an accompanying visualization at http://bit.ly/rpf_nips. |
| Open Datasets | Yes | We use the DeepMind control suite [66] with reward +1 only when cos(θ) > 0.95, \|x\| < 0.1, \|θ̇\| < 1, and \|ẋ\| < 1. Each episode lasts 1,000 time steps, simulating 10 seconds of interaction. (This sparse-reward condition is restated as code after the table.) |
| Dataset Splits | No | Figure 3 presents the average time to learn for N = 5, ..., 60 up to 500K episodes over 5 seeds and ensemble K = 20. |
| Hardware Specification | No | No specific hardware details are provided in the paper. |
| Software Dependencies | No | optimize ∇_θ L(f_θ + p_k; D_k) at θ = θ_k via ADAM [28]. |
| Experiment Setup | Yes | We train an ensemble of K networks {Q_k}_{k=1}^K in parallel, each on a perturbed version of the observed data H_t and each with a distinct random, but fixed, prior function p_k. (See the sketch after this table.) |
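
To make the Pseudocode and Experiment Setup rows above concrete, here is a minimal sketch of the randomized-prior ensemble idea from Algorithm 1. It assumes simple linear models, bootstrap resampling as the data perturbation, and plain gradient descent in place of the ADAM optimizer and deep Q-networks used in the paper; all function and variable names are illustrative, not taken from the authors' code.

```python
# Sketch of an ensemble with additive, fixed random prior functions.
import numpy as np

rng = np.random.default_rng(0)

def make_prior(d, scale=3.0):
    """A fixed, untrainable random prior function p_k (here a random linear map)."""
    w = rng.normal(0.0, scale, size=d)
    return lambda x: x @ w

def fit_member(X, y, prior, lr=1e-2, steps=2000):
    """Train the trainable part f_theta so that f_theta + p_k fits the member's data."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        pred = X @ theta + prior(X)          # Q_k(x) = f_theta(x) + p_k(x)
        grad = X.T @ (pred - y) / len(y)     # gradient of squared loss
        theta -= lr * grad                   # the paper uses ADAM; plain SGD here
    return theta

def train_ensemble(X, y, K=10):
    """Each member gets its own perturbed data set and its own fixed prior."""
    members = []
    for _ in range(K):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap as "data_noise"
        prior = make_prior(X.shape[1])
        theta = fit_member(X[idx], y[idx], prior)
        members.append((theta, prior))
    return members

def predict(members, X):
    """Each member predicts f_theta(x) + p_k(x); spread across members reflects uncertainty."""
    return np.stack([X @ th + pr(X) for th, pr in members])
```

The key design point this sketch mirrors is that each prior p_k is sampled once and then held fixed, so disagreement between ensemble members persists in regions where the perturbed data sets provide little constraint.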
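
For the sparse-reward condition quoted in the Open Datasets row, the following is a hedged restatement as a reward function. It is an illustrative sketch only, not the DeepMind control suite's own implementation, and the argument names are hypothetical.

```python
# Sparse cartpole swing-up reward: +1 only when the pole is upright, the cart is
# centered, and both velocities are small, as quoted from the paper.
import math

def sparse_reward(theta, x, theta_dot, x_dot):
    upright   = math.cos(theta) > 0.95
    centered  = abs(x) < 0.1
    slow_pole = abs(theta_dot) < 1.0
    slow_cart = abs(x_dot) < 1.0
    return 1.0 if (upright and centered and slow_pole and slow_cart) else 0.0
```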