Acting in Delayed Environments with Non-Stationary Markov Policies
Authors: Esther Derman, Gal Dalal, Shie Mannor
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on tabular, physical, and Atari domains reveal that it [Delayed-Q] converges quickly to high performance even for substantial delays, while standard approaches that either ignore the delay or rely on state-augmentation struggle or fail due to divergence. The code is available at https://github.com/galdl/rl_delay_basic.git. |
| Researcher Affiliation | Collaboration | Esther Derman (Technion); Gal Dalal (Nvidia Research); Shie Mannor (Nvidia Research & Technion) |
| Pseudocode | Yes | We refer to it as m-A-PI and provide its pseudo-code in Appx. B.2. |
| Open Source Code | Yes | The code is available at https://github.com/galdl/rl_delay_basic.git. |
| Open Datasets | Yes | Tabular Maze Domain. We begin with testing Delayed-Q on a Maze domain (Brockman et al., 2016) [tinyurl.com/y34tmfm9]. |
| Dataset Splits | No | The paper describes training agents through interaction with environments (Maze, Cartpole, Acrobot, Atari) and evaluates their performance, but it does not specify traditional fixed dataset splits (e.g., percentages or counts) for training, validation, or testing from pre-existing datasets. |
| Hardware Specification | No | The paper mentions running experiments but does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for computations. |
| Software Dependencies | No | The paper mentions extending the 'DDQN algorithm' and using environments like 'Atari Learning Environment', but it does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the implementation. |
| Experiment Setup | Yes | We test all domains on delays m ∈ {0, 5, 15, 25} with 5 seeds per each run. |
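
For context on the setting these rows describe: the paper studies a constant execution delay of m steps between when an agent selects an action and when the environment applies it. Below is a minimal sketch of such a delay as a Gym wrapper, assuming the standard `gym.Wrapper` API. This is not the authors' implementation from rl_delay_basic; the class name `DelayedActionWrapper` and the `noop_action` argument are assumptions for illustration.

```python
from collections import deque

import gym


class DelayedActionWrapper(gym.Wrapper):
    """Hypothetical sketch: execute each action `delay` steps after selection.

    Until `delay` actions have been queued, a default no-op action is
    executed, mirroring the constant-delay setting where m actions are
    always "in flight" between the agent and the environment.
    """

    def __init__(self, env, delay: int, noop_action=0):
        super().__init__(env)
        self.delay = delay
        self.noop_action = noop_action
        self.pending = deque()  # actions selected but not yet executed

    def reset(self, **kwargs):
        self.pending.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        self.pending.append(action)
        if len(self.pending) > self.delay:
            executed = self.pending.popleft()  # action chosen m steps ago
        else:
            executed = self.noop_action  # queue not yet full: act with no-op
        return self.env.step(executed)
```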
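
The Experiment Setup row reports a sweep over delays m ∈ {0, 5, 15, 25} with 5 seeds per run. A minimal sketch of how such a grid could be scripted, assuming a hypothetical `run_delayed_q` training entry point (not part of the released code):

```python
from itertools import product

DELAYS = [0, 5, 15, 25]  # delays m reported in the paper
N_SEEDS = 5              # seeds per (domain, delay) configuration


def run_delayed_q(env_id: str, delay: int, seed: int) -> float:
    """Hypothetical single-run entry point; returns a final evaluation score."""
    raise NotImplementedError  # stand-in for a full training loop


def sweep(env_id: str = "CartPole-v1") -> dict:
    """Run the full delay x seed grid for one domain."""
    return {
        (m, seed): run_delayed_q(env_id, delay=m, seed=seed)
        for m, seed in product(DELAYS, range(N_SEEDS))
    }
```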