Detecting Rewards Deterioration in Episodic Reinforcement Learning

Authors: Ido Greenberg, Shie Mannor

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, on deteriorated rewards in control problems (generated using various environment modifications), the test is demonstrated to be more powerful than standard tests, often by orders of magnitude.
Researcher Affiliation | Collaboration | (1) Department of Electrical Engineering, Technion, Israel; (2) Nvidia Research.
Pseudocode | Yes | Algorithm 1: BFAR: Bootstrap for FAR control (an illustrative sketch of the idea follows this table)
Open Source Code | Yes | The code of the experiments is available on GitHub.
Open Datasets | Yes | We demonstrate the new procedures in the environments of Pendulum (OpenAI), HalfCheetah and Humanoid (MuJoCo; Todorov et al., 2012).
Dataset Splits | No | The paper defines a 'reference dataset' (N0 episodes) and 'test blocks' (M*N episodes) for its degradation detection experiment but does not specify a separate 'validation' split for hyperparameter tuning or model selection.
Hardware Specification | Yes | The tests were run on a single i9-10900X CPU core.
Software Dependencies | No | The paper mentions 'PyTorch' and 'OpenAI's baseline of the A2C algorithm' but does not specify exact version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | Table 1 summarizes the setup of the various environments. Table 1: Environments parameters (episode length (T), reference episodes (N0), test blocks (M), episodes per block (N), sequential test length (h), lookback horizons (h1, h2), tests per episode (F = T/d))
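The BFAR pseudocode referenced above calibrates detection thresholds from the reference dataset so that the false-alarm rate (FAR) of the degradation tests is controlled. As a reading aid only, here is a minimal sketch of that general idea, assuming a single block-mean test whose threshold is set by bootstrap; the names `bootstrap_far_threshold`, `degradation_detected`, `n_boot` and the numbers in the usage example are illustrative assumptions, not the authors' implementation, which calibrates many correlated sequential tests (the F = T/d tests per episode and the lookback horizons h1, h2 of Table 1). See the paper and its GitHub repository for the actual algorithm.

```python
import numpy as np


def bootstrap_far_threshold(ref_rewards, n_test, n_boot=2000, alpha=0.05, seed=0):
    """Calibrate a detection threshold by bootstrap (illustrative sketch).

    Resamples "null" test blocks from the reference episode returns, i.e.
    blocks drawn under the assumption that rewards have NOT deteriorated,
    and returns the alpha-quantile of their means. Flagging a monitored
    block whose mean falls below this threshold then yields a false-alarm
    rate of roughly alpha for this single test.
    """
    rng = np.random.default_rng(seed)
    ref_rewards = np.asarray(ref_rewards, dtype=float)
    null_means = np.empty(n_boot)
    for b in range(n_boot):
        # Draw a block of episode returns from the reference data.
        block = rng.choice(ref_rewards, size=n_test, replace=True)
        null_means[b] = block.mean()
    return np.quantile(null_means, alpha)


def degradation_detected(block_rewards, threshold):
    """Flag degradation if the monitored block's mean return falls below
    the bootstrap-calibrated threshold."""
    return float(np.mean(block_rewards)) < threshold


# Illustrative usage with synthetic episode returns (numbers are made up):
ref = np.random.default_rng(1).normal(loc=-200.0, scale=30.0, size=1000)
thr = bootstrap_far_threshold(ref, n_test=50, alpha=0.01)
degraded = np.random.default_rng(2).normal(loc=-260.0, scale=30.0, size=50)
print(degradation_detected(degraded, thr))  # expected: True (mean return dropped)
```

Resampling whole episodes is the natural unit in this setting, since rewards within an episode are correlated while separate episodes can be treated as independent draws from the reference period.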