Detecting Rewards Deterioration in Episodic Reinforcement Learning
Authors: Ido Greenberg, Shie Mannor
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, on deteriorated rewards in control problems (generated using various environment modifications), the test is demonstrated to be more powerful than standard tests, often by orders of magnitude. |
| Researcher Affiliation | Collaboration | Department of Electrical Engineering, Technion, Israel; Nvidia Research. |
| Pseudocode | Yes | Algorithm 1: BFAR: Bootstrap for FAR control (a hedged sketch of the calibration idea appears after this table). |
| Open Source Code | Yes | The code of the experiments is available on GitHub. |
| Open Datasets | Yes | We demonstrate the new procedures in the environments of Pendulum (OpenAI), HalfCheetah and Humanoid (MuJoCo; Todorov et al., 2012). |
| Dataset Splits | No | The paper defines a 'reference dataset' (N0 episodes) and 'test blocks' (M*N episodes) for its degradation-detection experiments, but does not specify a separate 'validation' split for hyperparameter tuning or model selection (the split protocol is sketched after this table). |
| Hardware Specification | Yes | The tests were run on a single i9-10900X CPU core. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'OpenAI's baseline of the A2C algorithm' but does not specify exact version numbers for these software dependencies, as required for reproducibility. |
| Experiment Setup | Yes | Table 1 summarizes the setup of the various environments. Table 1: Environments parameters (episode length (T), reference episodes (N0), test blocks (M), episodes per block (N), sequential test length (h), lookback horizons (h1, h2), tests per episode (F = T/d)). An illustrative configuration sketch follows this table. |
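
The paper's Algorithm 1 (BFAR) bootstraps the reference episodes to calibrate detection thresholds so that the false alarm rate (FAR) is controlled. The sketch below is not the authors' implementation: it reduces the idea to a single per-episode-mean statistic, and the names `calibrate_threshold` and `degradation_alarm`, as well as the z-score form of the statistic, are illustrative assumptions.

```python
import numpy as np

def calibrate_threshold(ref_rewards, n_test, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap a threshold whose false-alarm rate on reference-like
    data is approximately alpha (a simplified, BFAR-style calibration)."""
    rng = np.random.default_rng(seed)
    mu, sigma = ref_rewards.mean(), ref_rewards.std(ddof=1)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample a null test block from the reference episodes.
        sample = rng.choice(ref_rewards, size=n_test, replace=True)
        # Degradation statistic: how far the block mean falls below reference.
        stats[b] = (mu - sample.mean()) / (sigma / np.sqrt(n_test))
    # Alarm threshold: the (1 - alpha) quantile of the null statistic.
    return np.quantile(stats, 1 - alpha)

def degradation_alarm(ref_rewards, test_rewards, threshold):
    """Alarm if the test block's mean reward falls below the reference
    mean by more than the calibrated threshold allows."""
    mu, sigma = ref_rewards.mean(), ref_rewards.std(ddof=1)
    stat = (mu - test_rewards.mean()) / (sigma / np.sqrt(len(test_rewards)))
    return stat > threshold
```

The actual BFAR procedure additionally handles multiple lookback horizons and sequential testing over correlated sub-tests, which this sketch omits.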
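To make the 'reference dataset' (N0 episodes) versus 'test blocks' (M*N episodes) protocol concrete, here is a minimal sketch under the assumption that per-time-step rewards are stored as a (num_episodes, T) array; `split_episodes` is a hypothetical helper, not code from the paper's repository.

```python
import numpy as np

def split_episodes(rewards, n0, m, n):
    """Split a (num_episodes, T) reward array into a reference set of N0
    episodes and M test blocks of N episodes each; note that no
    validation split exists in this protocol."""
    assert rewards.shape[0] >= n0 + m * n, "not enough episodes"
    reference = rewards[:n0]                            # (N0, T) reference set
    blocks = rewards[n0:n0 + m * n].reshape(m, n, -1)   # (M, N, T) test blocks
    return reference, blocks
```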
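Table 1's parameter names can be read as a per-environment configuration. The dataclass below only mirrors those field names; the example values are placeholders for illustration, not the paper's actual Table 1 settings.

```python
from dataclasses import dataclass

@dataclass
class EnvSetup:
    """Per-environment experiment parameters, mirroring Table 1's fields."""
    T: int    # episode length
    N0: int   # reference episodes
    M: int    # test blocks
    N: int    # episodes per block
    h: int    # sequential test length
    h1: int   # first lookback horizon
    h2: int   # second lookback horizon
    d: int    # time-steps between tests, so tests per episode F = T / d

    @property
    def F(self) -> int:
        return self.T // self.d

# Placeholder values for illustration only (not the paper's numbers):
pendulum = EnvSetup(T=200, N0=100, M=10, N=20, h=4, h1=10, h2=50, d=10)
print(pendulum.F)  # 20 tests per episode
```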