Reinforcement Learning with Random Delays
Authors: Yann Bouteiller, Simon Ramstedt, Giovanni Beltrame, Christopher Pal, Jonathan Binas
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark. |
| Researcher Affiliation | Academia | Yann Bouteiller (Polytechnique Montreal, yann.bouteiller@polymtl.ca); Simon Ramstedt (Mila, McGill University, simonramstedt@gmail.com); Giovanni Beltrame (Polytechnique Montreal); Christopher Pal (Mila, Polytechnique Montreal); Jonathan Binas (Mila, University of Montreal) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Along with this work we release our code, including a wrapper that conveniently augments any OpenAI Gym environment with custom delays. (A conceptual sketch of such a wrapper appears after the table.) |
| Open Datasets | Yes | In particular, this enables us to introduce random delays to the Gym MuJoCo continuous control suite (Brockman et al., 2016; Todorov et al.), which is otherwise turn-based. |
| Dataset Splits | No | The paper uses reinforcement learning environments (MuJoCo) and does not describe explicit train/validation/test dataset splits with percentages or sample counts; such splits are not typically applicable in this setting. |
| Hardware Specification | No | The paper thanks Element AI and Compute Canada for providing computational resources but does not specify any exact hardware details such as GPU models, CPU models, or memory. |
| Software Dependencies | No | The paper mentions using PyTorch for initialization and refers to the Adam optimizer, but it does not specify exact version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | Table 1: Hyperparameters lists specific values for Optimizer (Adam), Learning rate (0.0003), Discount factor (γ) (0.99), Batch size (128), Target weights update coefficient (τ) (0.005), Gradient steps / environment steps (1), Reward scale (5.0), Entropy scale (1.0), Replay memory size (1000000), Number of samples before training starts (10000), and Number of critics (2). |
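
To illustrate the kind of delay wrapper referenced in the Open Source Code row, a minimal Python sketch is given below. The class name, constructor arguments, and delay-sampling scheme are illustrative assumptions (classic Gym step/reset API, continuous Box action spaces); this is not the authors' released implementation.

```python
import random
from collections import deque

import gym
import numpy as np


class RandomDelayWrapper(gym.Wrapper):
    """Illustrative random-delay wrapper for a Gym environment.

    Hypothetical sketch only; the authors' released wrapper may differ
    in interface and behavior.
    """

    def __init__(self, env, max_obs_delay=2, max_act_delay=2):
        super().__init__(env)
        self.max_obs_delay = max_obs_delay
        self.max_act_delay = max_act_delay
        self.obs_buffer = deque(maxlen=max_obs_delay + 1)
        self.act_buffer = deque(maxlen=max_act_delay + 1)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Pre-fill the buffers so delayed values are always available.
        self.obs_buffer.extend([obs] * (self.max_obs_delay + 1))
        no_op = np.zeros(self.env.action_space.shape,
                         dtype=self.env.action_space.dtype)
        self.act_buffer.extend([no_op] * (self.max_act_delay + 1))
        return obs

    def step(self, action):
        # Queue the freshest action and apply a randomly older one,
        # emulating a random action (communication) delay.
        self.act_buffer.append(action)
        act_delay = random.randint(0, self.max_act_delay)
        applied_action = self.act_buffer[-(act_delay + 1)]

        obs, reward, done, info = self.env.step(applied_action)

        # Queue the freshest observation and return a randomly older one,
        # emulating a random observation delay.
        self.obs_buffer.append(obs)
        obs_delay = random.randint(0, self.max_obs_delay)
        delayed_obs = self.obs_buffer[-(obs_delay + 1)]
        info = dict(info, obs_delay=obs_delay, act_delay=act_delay)
        return delayed_obs, reward, done, info


# Usage: wrap any Gym MuJoCo task, e.g.
# env = RandomDelayWrapper(gym.make("HalfCheetah-v2"))
```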
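
The hyperparameter values from Table 1 can likewise be collected into a small configuration sketch. Only the numeric values come from the paper; the dictionary keys and the placeholder network are hypothetical.

```python
import torch

# Hyperparameter values as reported in Table 1 of the paper; key names
# and the placeholder network below are illustrative only.
HYPERPARAMS = {
    "optimizer": "Adam",
    "learning_rate": 3e-4,
    "discount_factor": 0.99,             # gamma
    "batch_size": 128,
    "target_update_coefficient": 0.005,  # tau
    "gradient_steps_per_env_step": 1,
    "reward_scale": 5.0,
    "entropy_scale": 1.0,
    "replay_memory_size": 1_000_000,
    "samples_before_training": 10_000,
    "num_critics": 2,
}

# Instantiating the reported optimizer for an arbitrary placeholder model.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(),
                             lr=HYPERPARAMS["learning_rate"])
```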