Measuring the Reliability of Reinforcement Learning Algorithms

Authors: Stephanie C.Y. Chan, Samuel Fishman, Anoop Korattikara, John Canny, Sergio Guadarrama

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.
Researcher Affiliation | Collaboration | Google Research; Berkeley EECS. {scychan,sfishman,canny,kbanoop,sguada}@google.com
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | The metrics and accompanying statistical tools have been made available as an open-source library. ... We have released the code used in this paper as an open-source Python package to ease the adoption of these metrics and their complementary statistics. (An illustrative metric sketch follows the table.)
Open Datasets | Yes | We applied the reliability metrics to algorithms tested on seven continuous control environments from the OpenAI Gym (Brockman et al., 2016) run on the MuJoCo physics simulator (Todorov et al., 2012). ... We also applied the reliability metrics to the RL algorithms and training data released as part of the Dopamine package (Castro et al., 2018). (An environment-loading sketch follows the table.)
Dataset Splits | No | The paper mentions tuning hyperparameters with a black-box optimizer, which implies some validation process, but it does not explicitly describe train/validation/test splits for the models themselves; it instead describes how the metrics are applied to existing training runs.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper mentions using the TF-Agents library and the Dopamine package, along with its own open-source Python package, but it does not specify version numbers for any of these software components.
Experiment Setup | Yes | We used a black-box optimizer (Golovin et al., 2017) to tune selected hyperparameters on a per-task basis, optimizing for final performance. The remaining hyperparameters were defined as stated in the corresponding original papers. See Appendix E for details of the hyperparameter search space and the final set of hyperparameters. ... Hyperparameters are shown in Table 8, duplicated for reference from https://github.com/google/dopamine/tree/master/baselines. ... Table 2: Hyperparameter search space for continuous control algorithms. Table 3: Final hyperparameters for SAC. Table 4: Final hyperparameters for TD3. Table 5: Final hyperparameters for PPO. Table 6: Final hyperparameters for DDPG. Table 7: Final hyperparameters for REINFORCE. Table 8: Hyperparameters for discrete control algorithms. (A tuning-loop sketch follows the table.)
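
As noted in the Open Source Code row, the authors released their metrics as an open-source Python package. The following is a minimal illustrative sketch, not the released library's API, of one metric in the spirit of the paper: dispersion across training runs, measured as the interquartile range (IQR) across runs. The array shape and the differencing-based detrending are assumptions made for illustration.

```python
# Illustrative sketch of a dispersion-across-runs metric (not the library's API).
import numpy as np

def dispersion_across_runs(perf, detrend=True):
    """perf: array of shape (n_runs, n_eval_points) of evaluation returns.

    Returns the IQR across runs at each evaluation point, optionally after
    detrending each run by first-order differencing so the metric reflects
    run-to-run variability rather than the overall learning trend.
    """
    curves = np.diff(perf, axis=1) if detrend else perf
    q75, q25 = np.percentile(curves, [75, 25], axis=0)
    return q75 - q25

# Synthetic example: 5 training runs, 100 evaluation points each.
rng = np.random.default_rng(0)
runs = np.cumsum(rng.normal(1.0, 0.5, size=(5, 100)), axis=1)
print(dispersion_across_runs(runs).shape)  # (99,) after detrending
```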
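
The Open Datasets row references continuous control environments from the OpenAI Gym running on MuJoCo. A minimal sketch of loading such environments is below; the specific environment IDs and version suffixes are assumptions (the paper lists seven such tasks), and the classic pre-0.26 Gym reset/step API is assumed.

```python
# Illustrative sketch: instantiate Gym MuJoCo control environments and run a
# few random-action steps. Requires MuJoCo and mujoco-py to be installed.
import gym

ENV_IDS = [  # assumed IDs; the paper's exact set of seven tasks may differ
    "HalfCheetah-v2",
    "Hopper-v2",
    "Walker2d-v2",
    "Ant-v2",
]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()  # classic Gym API: reset() returns the observation
    for _ in range(10):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()
```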
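
The Experiment Setup row describes per-task hyperparameter tuning with a black-box optimizer (Golovin et al., 2017), optimizing for final performance. The sketch below uses plain random search as a stand-in for that optimizer; the search-space ranges and the `train_and_evaluate` helper are hypothetical.

```python
# Illustrative sketch of per-task black-box hyperparameter tuning, with random
# search standing in for the optimizer used in the paper.
import math
import random

SEARCH_SPACE = {  # assumed ranges, for illustration only
    "learning_rate": (1e-5, 1e-3),   # sampled log-uniformly
    "batch_size": [64, 128, 256],
}

def sample_config(rng):
    lo, hi = SEARCH_SPACE["learning_rate"]
    lr = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {"learning_rate": lr, "batch_size": rng.choice(SEARCH_SPACE["batch_size"])}

def train_and_evaluate(task, config):
    """Hypothetical stand-in: train an agent on `task` with `config` and return
    its final evaluation performance. Here a dummy score is returned."""
    return -abs(config["learning_rate"] - 3e-4)

def tune(task, num_trials=20, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        score = train_and_evaluate(task, config)  # objective: final performance
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best, score = tune("HalfCheetah")
print(best, score)
```

Any black-box optimizer could replace the random-search loop; the point of the sketch is the setup itself, namely tuning selected hyperparameters per task against final performance while keeping the remaining hyperparameters at their published values.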