Measuring the Reliability of Reinforcement Learning Algorithms

Authors: Stephanie C.Y. Chan, Samuel Fishman, Anoop Korattikara, John Canny, Sergio Guadarrama

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.
Researcher Affiliation | Collaboration | Google Research; Berkeley EECS. {scychan,sfishman,canny,kbanoop,sguada}@google.com
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The metrics and accompanying statistical tools have been made available as an open-source library. ... We have released the code used in this paper as an open-source Python package to ease the adoption of these metrics and their complementary statistics. (An illustrative metric sketch follows the table.)
Open Datasets | Yes | We applied the reliability metrics to algorithms tested on seven continuous control environments from the OpenAI Gym (Brockman et al., 2016) run on the MuJoCo physics simulator (Todorov et al., 2012). ... We also applied the reliability metrics to the RL algorithms and training data released as part of the Dopamine package (Castro et al., 2018). (An environment-loading sketch follows the table.)
Dataset Splits | No | The paper mentions tuning hyperparameters with a black-box optimizer, which implies some validation process, but it does not explicitly describe train/validation/test splits for the models themselves; it instead describes how the metrics are applied to existing training runs.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper mentions using the TF-Agents library and the Dopamine package, along with its own open-source Python package, but it does not specify version numbers for any of these software components.
Experiment Setup | Yes | We used a black-box optimizer (Golovin et al., 2017) to tune selected hyperparameters on a per-task basis, optimizing for final performance. The remaining hyperparameters were defined as stated in the corresponding original papers. See Appendix E for details of the hyperparameter search space and the final set of hyperparameters. ... Hyperparameters are shown in Table 8, duplicated for reference from https://github.com/google/dopamine/tree/master/baselines. ... Table 2: Hyperparameter search space for continuous control algorithms. Table 3: Final hyperparameters for SAC. Table 4: Final hyperparameters for TD3. Table 5: Final hyperparameters for PPO. Table 6: Final hyperparameters for DDPG. Table 7: Final hyperparameters for REINFORCE. Table 8: Hyperparameters for discrete control algorithms. (A tuning-loop sketch follows the table.)
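
As noted in the Open Source Code row, the authors released their metrics as an open-source Python package. The following is a minimal illustrative sketch, not the released library's API, of one metric in the spirit of the paper: dispersion across training runs, measured as the interquartile range (IQR) across runs. The array shape and the differencing-based detrending are assumptions made for illustration.

```python
# Illustrative sketch of a dispersion-across-runs metric (not the library's API).
import numpy as np

def dispersion_across_runs(perf, detrend=True):
    """perf: array of shape (n_runs, n_eval_points) of evaluation returns.

    Returns the IQR across runs at each evaluation point, optionally after
    detrending each run by first-order differencing so the metric reflects
    run-to-run variability rather than the overall learning trend.
    """
    curves = np.diff(perf, axis=1) if detrend else perf
    q75, q25 = np.percentile(curves, [75, 25], axis=0)
    return q75 - q25

# Synthetic example: 5 training runs, 100 evaluation points each.
rng = np.random.default_rng(0)
runs = np.cumsum(rng.normal(1.0, 0.5, size=(5, 100)), axis=1)
print(dispersion_across_runs(runs).shape)  # (99,) after detrending
```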
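
The Open Datasets row references continuous control environments from the OpenAI Gym running on MuJoCo. A minimal sketch of loading such environments is below; the specific environment IDs and version suffixes are assumptions (the paper lists seven such tasks), and the classic pre-0.26 Gym reset/step API is assumed.

```python
# Illustrative sketch: instantiate Gym MuJoCo control environments and run a
# few random-action steps. Requires MuJoCo and mujoco-py to be installed.
import gym

ENV_IDS = [  # assumed IDs; the paper's exact set of seven tasks may differ
    "HalfCheetah-v2",
    "Hopper-v2",
    "Walker2d-v2",
    "Ant-v2",
]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()  # classic Gym API: reset() returns the observation
    for _ in range(10):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()
```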
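
The Experiment Setup row describes per-task hyperparameter tuning with a black-box optimizer (Golovin et al., 2017), optimizing for final performance. The sketch below uses plain random search as a stand-in for that optimizer; the search-space ranges and the `train_and_evaluate` helper are hypothetical.

```python
# Illustrative sketch of per-task black-box hyperparameter tuning, with random
# search standing in for the optimizer used in the paper.
import math
import random

SEARCH_SPACE = {  # assumed ranges, for illustration only
    "learning_rate": (1e-5, 1e-3),   # sampled log-uniformly
    "batch_size": [64, 128, 256],
}

def sample_config(rng):
    lo, hi = SEARCH_SPACE["learning_rate"]
    lr = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {"learning_rate": lr, "batch_size": rng.choice(SEARCH_SPACE["batch_size"])}

def train_and_evaluate(task, config):
    """Hypothetical stand-in: train an agent on `task` with `config` and return
    its final evaluation performance. Here a dummy score is returned."""
    return -abs(config["learning_rate"] - 3e-4)

def tune(task, num_trials=20, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        score = train_and_evaluate(task, config)  # objective: final performance
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = tune("HalfCheetah")
print(best, score)
```

Any black-box optimizer could replace the random-search loop; the point of the sketch is the setup itself, namely tuning selected hyperparameters per task against final performance while keeping the remaining hyperparameters at their published values.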