Acting in Delayed Environments with Non-Stationary Markov Policies
Authors: Esther Derman, Gal Dalal, Shie Mannor
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on tabular, physical, and Atari domains reveal that it [Delayed-Q] converges quickly to high performance even for substantial delays, while standard approaches that either ignore the delay or rely on state-augmentation struggle or fail due to divergence. The code is available at https://github.com/galdl/rl_delay_basic.git. |
| Researcher Affiliation | Collaboration | Esther Derman (Technion); Gal Dalal (Nvidia Research); Shie Mannor (Nvidia Research & Technion) |
| Pseudocode | Yes | We refer to it as m-A-PI and provide its pseudo-code in Appx. B.2. |
| Open Source Code | Yes | The code is available at https://github.com/galdl/rl_delay_basic.git. |
| Open Datasets | Yes | Tabular Maze Domain. We begin with testing Delayed-Q on a Maze domain (Brockman et al., 2016) [tinyurl.com/y34tmfm9]. |
| Dataset Splits | No | The paper describes training agents through interaction with environments (Maze, Cartpole, Acrobot, Atari) and evaluates their performance, but it does not specify traditional fixed dataset splits (e.g., percentages or counts) for training, validation, or testing from pre-existing datasets. |
| Hardware Specification | No | The paper mentions running experiments but does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for computations. |
| Software Dependencies | No | The paper mentions extending the 'DDQN algorithm' and using environments like 'Atari Learning Environment', but it does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the implementation. |
| Experiment Setup | Yes | We test all domains on delays m ∈ {0, 5, 15, 25} with 5 seeds per each run. |
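
For context on the setting these rows describe: the paper studies a constant execution delay of m steps between when an agent selects an action and when the environment applies it. Below is a minimal sketch of such a delay as a Gym wrapper, assuming the standard `gym.Wrapper` API. This is not the authors' implementation from rl_delay_basic; the class name `DelayedActionWrapper` and the `noop_action` argument are assumptions for illustration.

```python
from collections import deque

import gym


class DelayedActionWrapper(gym.Wrapper):
    """Hypothetical sketch: execute each action `delay` steps after selection.

    Until `delay` actions have been queued, a default no-op action is
    executed, mirroring the constant-delay setting where m actions are
    always "in flight" between the agent and the environment.
    """

    def __init__(self, env, delay: int, noop_action=0):
        super().__init__(env)
        self.delay = delay
        self.noop_action = noop_action
        self.pending = deque()  # actions selected but not yet executed

    def reset(self, **kwargs):
        self.pending.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        self.pending.append(action)
        if len(self.pending) > self.delay:
            executed = self.pending.popleft()  # action chosen m steps ago
        else:
            executed = self.noop_action  # queue not yet full: act with no-op
        return self.env.step(executed)
```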
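
The Experiment Setup row reports a sweep over delays m ∈ {0, 5, 15, 25} with 5 seeds per run. A minimal sketch of how such a grid could be scripted, assuming a hypothetical `run_delayed_q` training entry point (not part of the released code):

```python
from itertools import product

DELAYS = [0, 5, 15, 25]  # delays m reported in the paper
N_SEEDS = 5              # seeds per (domain, delay) configuration


def run_delayed_q(env_id: str, delay: int, seed: int) -> float:
    """Hypothetical single-run entry point; returns a final evaluation score."""
    raise NotImplementedError  # stand-in for a full training loop


def sweep(env_id: str = "CartPole-v1") -> dict:
    """Run the full delay x seed grid for one domain."""
    return {
        (m, seed): run_delayed_q(env_id, delay=m, seed=seed)
        for m, seed in product(DELAYS, range(N_SEEDS))
    }
```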