Ready Policy One: World Building Through Active Learning
Authors: Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches. ... Table 1 and Fig 1 show the main results, where RP1 outperforms both the greedy baseline and the fixed variance maximizing (V+R) approach. |
| Researcher Affiliation | Collaboration | (1) Department of Engineering Science, University of Oxford; (2) Department of Electrical Engineering and Computer Sciences, UC Berkeley; (3) Google Brain Robotics. |
| Pseudocode | Yes | Algorithm 1: Online Learning Mechanism; Algorithm 2: Early Stopping Mechanism; Algorithm 3: RP1: Ready Policy One. (An illustrative sketch of the exploration objective follows the table.) |
| Open Source Code | Yes | For further details on our experiments, see the open sourced repo at https://github.com/fiorenza2/ReadyPolicyOne. |
| Open Datasets | Yes | We test RP1 on a variety of continuous control tasks from the OpenAI Gym (Brockman et al., 2016), namely: Half Cheetah, Ant, Swimmer and Hopper, which are commonly used to test MBRL algorithms. (A Gym usage sketch also follows the table.) |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits, such as percentages, sample counts, or explicit descriptions of how the data was partitioned for these phases. |
| Hardware Specification | No | The paper states 'Experiments were conducted using the GCP research credits program,' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym (Brockman et al., 2016)' and the 'original PPO (Schulman et al., 2017) loss function' but does not specify version numbers for these or other software dependencies, and defers to an Appendix that is not provided. |
| Experiment Setup | No | The paper states 'Full implementation details can be found in Appendix ??' (the broken cross-reference appears in the paper itself), but the main text does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |
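
The pseudocode row above names Algorithm 3 (RP1), which trains an exploration policy against both task reward and dynamics-model uncertainty; the paper's results compare it to a fixed variance-maximizing (V+R) baseline. As a rough illustration only, and not the authors' implementation, here is a minimal Python sketch of a reward-plus-variance exploration objective. The function name `exploration_reward`, the coefficient `explore_weight`, and the array shapes are all assumptions for illustration; per Algorithm 1, RP1 would adapt the trade-off online rather than fix it.

```python
import numpy as np

def exploration_reward(next_state_preds, task_rewards, explore_weight=1.0):
    """Hypothetical reward-plus-variance ('V+R'-style) exploration objective.

    next_state_preds: (ensemble_size, batch, state_dim) next-state predictions
                      from an ensemble of learned dynamics models.
    task_rewards:     (batch,) environment rewards.
    explore_weight:   trade-off coefficient; fixed here, whereas RP1's online
                      learning mechanism would adapt it during training.
    """
    # Ensemble disagreement: per-sample variance across ensemble members,
    # averaged over state dimensions, used as an intrinsic exploration bonus.
    disagreement = next_state_preds.var(axis=0).mean(axis=-1)
    return task_rewards + explore_weight * disagreement

# Toy usage: 5 ensemble members, batch of 32, 17-dimensional state.
preds = np.random.randn(5, 32, 17)
rewards = np.random.randn(32)
assert exploration_reward(preds, rewards).shape == (32,)
```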
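
The four benchmark tasks quoted in the Open Datasets row are standard OpenAI Gym MuJoCo environments. A minimal usage sketch, assuming a Gym release contemporary with the paper (the `-v2` IDs and the four-tuple `step` return predate the later Gymnasium API changes) and a working MuJoCo installation:

```python
import gym

# Environments named in the paper, under the standard Gym IDs of the time.
for env_id in ["HalfCheetah-v2", "Ant-v2", "Swimmer-v2", "Hopper-v2"]:
    env = gym.make(env_id)
    obs = env.reset()
    # Old (pre-Gymnasium) step API: (observation, reward, done, info).
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, obs.shape, float(reward))
    env.close()
```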