Time-Efficient Reinforcement Learning with Stochastic Stateful Policies
Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on complex continuous control tasks, e.g. humanoid locomotion, and demonstrate that our gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT. Empirically, we compare the performances in common continuous control tasks within POMDPs when using RL. |
| Researcher Affiliation | Academia | Firas Al-Hafez¹, Guoping Zhao², Jan Peters¹,³, Davide Tateo¹; ¹ Intelligent Autonomous Systems, ² Locomotion Laboratory, ³ German Research Center for AI (DFKI); Centre for Cognitive Science, Hessian.AI, TU Darmstadt, Germany; {name.surname}@tu-darmstadt.de |
| Pseudocode | Yes | Appendix C: Algorithm Pseudocode |
| Open Source Code | Yes | The code is available at: https://github.com/robfiras/s2pg |
| Open Datasets | Yes | The first set of tasks includes the typical MuJoCo Gym locomotion tasks to test our approach in the RL setting. As done in Ni et al. (2022), we create partial observability by hiding information, specifically the velocity, from the state space. (A sketch of such an observation-masking wrapper follows the table.) |
| Dataset Splits | No | The paper describes experimental procedures such as averaging results over 10 seeds and running experiments for a maximum number of steps, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | No | The paper states that 'Calculations for this research were conducted on high-performance computers Lichtenberg and CLAIX at the NHR Centers NHR4CES at TU Darmstadt and RWTH Aachen (project numbers p0020307 & p0021606),' but it does not specify concrete hardware details such as exact GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper states that 'For a fair comparison, all methods, except SLAC, are implemented in the same framework, MushroomRL (D'Eramo et al., 2021),' but it does not specify version numbers for MushroomRL or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | No | The paper mentions aspects of the experimental setup, such as using Gaussian policies, setting the initial internal state to 0, and allocating 4 cores for computation. It refers to network architectures in Appendix D but does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, optimizer settings) for the training process in either the main text or the appendices. (A minimal sketch of a stateful Gaussian policy with a zero-initialized internal state follows the table.) |
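The partial-observability protocol quoted in the Open Datasets row (hiding velocities from the MuJoCo Gym observations, following Ni et al. (2022)) can be illustrated with a small observation wrapper. This is a minimal sketch assuming a Gymnasium-style API; the class name, the `velocity_indices` argument, and the HalfCheetah index range in the usage line are illustrative assumptions, not the released code.

```python
import numpy as np
import gymnasium as gym


class HideVelocityWrapper(gym.ObservationWrapper):
    """Drop velocity entries from the observation to induce partial observability.

    Hypothetical helper: the paper hides velocities from MuJoCo Gym locomotion
    observations (as in Ni et al., 2022); the exact index layout depends on the task.
    """

    def __init__(self, env, velocity_indices):
        super().__init__(env)
        all_indices = range(env.observation_space.shape[0])
        self._keep = np.array([i for i in all_indices if i not in set(velocity_indices)])
        self.observation_space = gym.spaces.Box(
            low=env.observation_space.low[self._keep],
            high=env.observation_space.high[self._keep],
            dtype=env.observation_space.dtype,
        )

    def observation(self, obs):
        # Keep only the position-like entries; velocity entries are hidden.
        return obs[self._keep]


# Illustrative usage: in HalfCheetah-v4, indices 8-16 hold the velocities.
env = HideVelocityWrapper(gym.make("HalfCheetah-v4"), velocity_indices=range(8, 17))
```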
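The Experiment Setup row mentions Gaussian policies whose internal state is initialized to 0, and the Research Type row quotes the claim that the gradient estimator is a faster, simpler alternative to BPTT. The key mechanism is that a stochastic stateful policy samples both the action and the next internal state, so gradients never need to be propagated through time. The PyTorch sketch below is a hypothetical parameterization under those assumptions; the class name, layer sizes, and the diagonal-Gaussian choice are ours, not the authors' implementation.

```python
import torch
import torch.nn as nn


class StochasticStatefulPolicy(nn.Module):
    """Hypothetical stochastic stateful Gaussian policy.

    Given the observation and the current internal state, the network outputs
    the mean and log-std of a diagonal Gaussian over the concatenation of the
    action and the next internal state; both are sampled jointly.
    """

    def __init__(self, obs_dim, action_dim, state_dim, hidden_dim=128):
        super().__init__()
        out_dim = action_dim + state_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + state_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 2 * out_dim),  # mean and log-std
        )
        self.state_dim = state_dim

    def initial_state(self, batch_size=1):
        # The paper sets the initial internal state to 0.
        return torch.zeros(batch_size, self.state_dim)

    def forward(self, obs, internal_state):
        mean, log_std = self.net(torch.cat([obs, internal_state], dim=-1)).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        sample = dist.rsample()
        action, next_state = sample.split(
            [sample.shape[-1] - self.state_dim, self.state_dim], dim=-1)
        log_prob = dist.log_prob(sample).sum(dim=-1)
        return action, next_state, log_prob
```

At the start of each episode, `initial_state(...)` returns zeros, matching the zero-initialized internal state mentioned in the Experiment Setup row.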