Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Detecting Rewards Deterioration in Episodic Reinforcement Learning

Authors: Ido Greenberg, Shie Mannor

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, on deteriorated rewards in control problems (generated using various environment modifications), the test is demonstrated to be more powerful than standard tests, often by orders of magnitude.
Researcher Affiliation | Collaboration | Department of Electrical Engineering, Technion, Israel; Nvidia Research.
Pseudocode | Yes | Algorithm 1: BFAR: Bootstrap for FAR control.
Open Source Code | Yes | The code of the experiments is available on GitHub.
Open Datasets | Yes | We demonstrate the new procedures in the environments of Pendulum (OpenAI), HalfCheetah and Humanoid (MuJoCo; Todorov et al., 2012).
Dataset Splits | No | The paper defines a 'reference dataset' (N0 episodes) and 'test blocks' (M*N episodes) for its degradation-detection experiment, but does not specify a separate validation split for hyperparameter tuning or model selection.
Hardware Specification | Yes | The tests were run on a single i9-10900X CPU core.
Software Dependencies | No | The paper mentions PyTorch and OpenAI's baseline A2C algorithm but does not specify exact version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | Table 1 summarizes the setup of the various environments. Table 1: Environment parameters: episode length (T), reference episodes (N0), test blocks (M), episodes per block (N), sequential test length (h), lookback horizons (h1, h2), tests per episode (F = T/d).
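The Pseudocode row above refers to Algorithm 1, "BFAR: Bootstrap for FAR control". The paper's exact procedure is not reproduced here; the sketch below only illustrates the general idea of bootstrap-based false-alarm-rate (FAR) calibration: resample reference rewards to simulate a sequence of test statistics under the null, then set a threshold from the low quantile of their minimum. The function name, the i.i.d. per-reward resampling, and the plain-mean statistic are simplifying assumptions for illustration, not the paper's BFAR (which works with blocks of episodes and correlated rewards).

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_threshold(reference_rewards, n_tests, alpha=0.05, n_boot=500):
    """Generic sketch (not the paper's BFAR): calibrate a per-test threshold
    so that the chance of any false alarm across a sequence of n_tests
    mean-reward tests is roughly alpha, assuming i.i.d. reference rewards."""
    reference_rewards = np.asarray(reference_rewards, dtype=float)
    n = len(reference_rewards)
    mins = np.empty(n_boot)
    for b in range(n_boot):
        # simulate n_tests test statistics under the null by resampling
        sample = rng.choice(reference_rewards, size=(n_tests, n), replace=True)
        mins[b] = sample.mean(axis=1).min()
    # alpha-quantile of the minimum statistic: if no degradation occurred,
    # all n_tests means fall below this threshold with probability ~ alpha
    return np.quantile(mins, alpha)

# usage: raise an alarm when a test block's mean reward drops below the threshold
reference = rng.normal(0.0, 1.0, 200)          # hypothetical reference rewards
threshold = bootstrap_threshold(reference, n_tests=10)
alarm = reference[:20].mean() < threshold      # degradation test on one block
```

The quantile-of-minima construction is what controls the family-wise false-alarm rate over the whole test sequence, instead of applying a naive per-test significance level that would compound across tests.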