Measuring the Reliability of Reinforcement Learning Algorithms
Authors: Stephanie C.Y. Chan, Samuel Fishman, Anoop Korattikara, John Canny, Sergio Guadarrama
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results. |
| Researcher Affiliation | Collaboration | 1 Google Research, 2 Berkeley EECS; {scychan,sfishman,canny,kbanoop,sguada}@google.com |
| Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The metrics and accompanying statistical tools have been made available as an open-source library. ... We have released the code used in this paper as an open-source Python package to ease the adoption of these metrics and their complementary statistics. (A minimal sketch of two of these metrics follows the table.) |
| Open Datasets | Yes | We applied the reliability metrics to algorithms tested on seven continuous control environments from the OpenAI Gym (Brockman et al., 2016) run on the MuJoCo physics simulator (Todorov et al., 2012). ... We also applied the reliability metrics to the RL algorithms and training data released as part of the Dopamine package (Castro et al., 2018). |
| Dataset Splits | No | The paper mentions tuning hyperparameters using a black-box optimizer, which implies a validation process. However, it does not explicitly describe reproducible training/validation/test splits for the models themselves; it only describes how the metrics are applied to existing training runs. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions using the TF-Agents library and the Dopamine package, along with the authors' own open-source Python package, but it does not specify version numbers for any of these software components. |
| Experiment Setup | Yes | We used a black-box optimizer (Golovin et al., 2017) to tune selected hyperparameters on a per-task basis, optimizing for final performance. The remaining hyperparameters were defined as stated in the corresponding original papers. See Appendix E for details of the hyperparameter search space and the final set of hyperparameters. ... Hyperparameters are shown in Table 8, duplicated for reference from https://github.com/google/dopamine/tree/master/baselines. ... Tables 2-8 give the hyperparameter search space for the continuous control algorithms, the final hyperparameters for SAC, TD3, PPO, DDPG, and REINFORCE, and the hyperparameters for the discrete control algorithms. (A stand-in sketch of this per-task tuning loop appears below.) |
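
To make the Open Source Code row concrete: the paper's reliability measures are robust statistics computed over sets of training curves, using the IQR (interquartile range) for dispersion and CVaR (conditional value at risk, the mean of the worst alpha-fraction of outcomes) for risk. Below is a minimal NumPy sketch of two of them, dispersion across runs and risk across runs, reconstructed from the paper's definitions; the function names are ours, and this is not the API of the released package.

```python
# Illustrative sketch of two reliability metrics from the paper,
# reconstructed from their definitions. Not the released library's API.
import numpy as np

def iqr(x):
    """Interquartile range, the robust dispersion measure used in the paper."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

def cvar(x, alpha=0.05):
    """Conditional value at risk: mean of the worst alpha-fraction of x
    (lower values are worse, as with episode returns)."""
    x = np.sort(np.asarray(x))
    k = max(1, int(np.ceil(alpha * len(x))))
    return x[:k].mean()

def dispersion_across_runs(curves):
    """IQR across runs at each evaluation point.
    curves: array of shape (n_runs, n_eval_points) of evaluation returns."""
    return np.array([iqr(curves[:, t]) for t in range(curves.shape[1])])

def risk_across_runs(curves, alpha=0.05):
    """CVaR of final performance across runs: expected worst-case run."""
    return cvar(curves[:, -1], alpha=alpha)

# Toy usage: 30 synthetic training runs evaluated at 100 points each.
rng = np.random.default_rng(0)
curves = rng.normal(loc=np.linspace(0.0, 100.0, 100), scale=10.0,
                    size=(30, 100))
print(dispersion_across_runs(curves).mean())  # average spread across runs
print(risk_across_runs(curves))               # mean of the worst 5% of runs
```

The paper additionally rank-normalizes these metrics across algorithms and reports significance via permutation tests and uncertainty via bootstrap confidence intervals; the sketch omits those statistical layers.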
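
The Experiment Setup row notes that hyperparameters were tuned per task with a black-box optimizer (Golovin et al., 2017, i.e. Google Vizier), optimizing for final performance. As a stand-in for that service, here is a hedged sketch of the same tune-for-final-performance loop using plain random search; the search space and objective are hypothetical, not the paper's Table 2 ranges.

```python
# Stand-in for black-box hyperparameter tuning: random search over a
# hypothetical search space (the paper used Google Vizier, per task).
import random

SEARCH_SPACE = {                 # hypothetical ranges, not the paper's
    "learning_rate": (1e-5, 1e-2),
    "discount": (0.95, 0.999),
}

def sample_config(space, rng):
    """Draw one hyperparameter configuration uniformly from the space."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}

def tune(train_and_eval, n_trials=50, seed=0):
    """Return the config with the best final performance on one task."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(SEARCH_SPACE, rng)
        score = train_and_eval(cfg)  # final performance of a full training run
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective standing in for "train the agent, return final performance".
best_cfg, best_score = tune(lambda cfg: -abs(cfg["learning_rate"] - 3e-4))
print(best_cfg, best_score)
```

In the paper, each tuned configuration is then evaluated over many independent training runs, and it is those runs that the reliability metrics above consume.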