Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Detecting Rewards Deterioration in Episodic Reinforcement Learning

Authors: Ido Greenberg, Shie Mannor

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, on deteriorated rewards in control problems (generated using various environment modifications), the test is demonstrated to be more powerful than standard tests, often by orders of magnitude.
Researcher Affiliation | Collaboration | Department of Electrical Engineering, Technion, Israel; Nvidia Research.
Pseudocode | Yes | Algorithm 1: BFAR: Bootstrap for FAR control.
Open Source Code | Yes | The code of the experiments is available on GitHub.
Open Datasets | Yes | We demonstrate the new procedures in the environments of Pendulum (OpenAI), HalfCheetah and Humanoid (MuJoCo; Todorov et al., 2012).
Dataset Splits | No | The paper defines a 'reference dataset' (N0 episodes) and 'test blocks' (M*N episodes) for its degradation-detection experiment, but does not specify a separate validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | The tests were run on a single i9-10900X CPU core.
Software Dependencies | No | The paper mentions PyTorch and OpenAI's baseline A2C algorithm but does not specify exact version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | Table 1 summarizes the setup of the various environments. Table 1: Environment parameters: episode length (T), reference episodes (N0), test blocks (M), episodes per block (N), sequential test length (h), lookback horizons (h1, h2), tests per episode (F = T/d).
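The Pseudocode row above refers to Algorithm 1, "BFAR: Bootstrap for FAR control". The paper's exact procedure is not reproduced here; the sketch below only illustrates the general idea of bootstrap-based false-alarm-rate (FAR) calibration: resample reference rewards to simulate a sequence of test statistics under the null, then set a threshold from the low quantile of their minimum. The function name, the i.i.d. per-reward resampling, and the plain-mean statistic are simplifying assumptions for illustration, not the paper's BFAR (which works with blocks of episodes and correlated rewards).

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_threshold(reference_rewards, n_tests, alpha=0.05, n_boot=500):
    """Generic sketch (not the paper's BFAR): calibrate a per-test threshold
    so that the chance of any false alarm across a sequence of n_tests
    mean-reward tests is roughly alpha, assuming i.i.d. reference rewards."""
    reference_rewards = np.asarray(reference_rewards, dtype=float)
    n = len(reference_rewards)
    mins = np.empty(n_boot)
    for b in range(n_boot):
        # simulate n_tests test statistics under the null by resampling
        sample = rng.choice(reference_rewards, size=(n_tests, n), replace=True)
        mins[b] = sample.mean(axis=1).min()
    # alpha-quantile of the minimum statistic: if no degradation occurred,
    # all n_tests means fall below this threshold with probability ~ alpha
    return np.quantile(mins, alpha)

# usage: raise an alarm when a test block's mean reward drops below the threshold
reference = rng.normal(0.0, 1.0, 200)          # hypothetical reference rewards
threshold = bootstrap_threshold(reference, n_tests=10)
alarm = reference[:20].mean() < threshold      # degradation test on one block
```

The quantile-of-minima construction is what controls the family-wise false-alarm rate over the whole test sequence, instead of applying a naive per-test significance level that would compound across tests.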