Time-Efficient Reinforcement Learning with Stochastic Stateful Policies
Authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on complex continuous control tasks, e.g. humanoid locomotion, and demonstrate that our gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT. Empirically, we compare the performances in common continuous control tasks within POMDPs when using RL. |
| Researcher Affiliation | Academia | Firas Al-Hafez¹, Guoping Zhao², Jan Peters¹,³, Davide Tateo¹; ¹ Intelligent Autonomous Systems, ² Locomotion Laboratory, ³ German Research Center for AI (DFKI); Centre for Cognitive Science, Hessian.AI, TU Darmstadt, Germany; {name.surname}@tu-darmstadt.de |
| Pseudocode | Yes | Appendix C: Algorithm Pseudocode |
| Open Source Code | Yes | The code is available at: https://github.com/robfiras/s2pg |
| Open Datasets | Yes | The first set of tasks includes the typical MuJoCo Gym locomotion tasks to test our approach in the RL setting. As done in Ni et al. (2022), we create partial observability by hiding information, specifically the velocity, from the state space. (A sketch of such an observation-masking wrapper follows the table.) |
| Dataset Splits | No | The paper describes experimental procedures such as averaging results over 10 seeds and running experiments for a maximum number of steps, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts. |
| Hardware Specification | No | The paper states that 'Calculations for this research were conducted on high-performance computers Lichtenberg and CLAIX at the NHR Centers NHR4CES at TU Darmstadt and RWTH Aachen (project numbers p0020307 & p0021606),' but it does not specify concrete hardware details such as exact GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper states that 'For a fair comparison, all methods, except SLAC, are implemented in the same framework, MushroomRL (D'Eramo et al., 2021),' but it does not specify version numbers for MushroomRL or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | No | The paper mentions aspects of the experimental setup, such as using Gaussian policies, setting the initial internal state to 0, and allocating 4 cores for computation. It refers to network architectures in Appendix D but does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, optimizer settings) for the training process in either the main text or the appendices. (A minimal sketch of a stateful Gaussian policy with a zero-initialized internal state follows the table.) |
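The partial-observability protocol quoted in the Open Datasets row (hiding velocities from the MuJoCo Gym observations, following Ni et al. (2022)) can be illustrated with a small observation wrapper. This is a minimal sketch assuming a Gymnasium-style API; the class name, the `velocity_indices` argument, and the HalfCheetah index range in the usage line are illustrative assumptions, not the released code.

```python
import numpy as np
import gymnasium as gym


class HideVelocityWrapper(gym.ObservationWrapper):
    """Drop velocity entries from the observation to induce partial observability.

    Hypothetical helper: the paper hides velocities from MuJoCo Gym locomotion
    observations (as in Ni et al., 2022); the exact index layout depends on the task.
    """

    def __init__(self, env, velocity_indices):
        super().__init__(env)
        all_indices = range(env.observation_space.shape[0])
        self._keep = np.array([i for i in all_indices if i not in set(velocity_indices)])
        self.observation_space = gym.spaces.Box(
            low=env.observation_space.low[self._keep],
            high=env.observation_space.high[self._keep],
            dtype=env.observation_space.dtype,
        )

    def observation(self, obs):
        # Keep only the position-like entries; velocity entries are hidden.
        return obs[self._keep]


# Illustrative usage: in HalfCheetah-v4, indices 8-16 hold the velocities.
env = HideVelocityWrapper(gym.make("HalfCheetah-v4"), velocity_indices=range(8, 17))
```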
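The Experiment Setup row mentions Gaussian policies whose internal state is initialized to 0, and the Research Type row quotes the claim that the gradient estimator is a faster, simpler alternative to BPTT. The key mechanism is that a stochastic stateful policy samples both the action and the next internal state, so gradients never need to be propagated through time. The PyTorch sketch below is a hypothetical parameterization under those assumptions; the class name, layer sizes, and the diagonal-Gaussian choice are ours, not the authors' implementation.

```python
import torch
import torch.nn as nn


class StochasticStatefulPolicy(nn.Module):
    """Hypothetical stochastic stateful Gaussian policy.

    Given the observation and the current internal state, the network outputs
    the mean and log-std of a diagonal Gaussian over the concatenation of the
    action and the next internal state; both are sampled jointly.
    """

    def __init__(self, obs_dim, action_dim, state_dim, hidden_dim=128):
        super().__init__()
        out_dim = action_dim + state_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + state_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 2 * out_dim),  # mean and log-std
        )
        self.state_dim = state_dim

    def initial_state(self, batch_size=1):
        # The paper sets the initial internal state to 0.
        return torch.zeros(batch_size, self.state_dim)

    def forward(self, obs, internal_state):
        mean, log_std = self.net(torch.cat([obs, internal_state], dim=-1)).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mean, log_std.exp())
        sample = dist.rsample()
        action, next_state = sample.split(
            [sample.shape[-1] - self.state_dim, self.state_dim], dim=-1)
        log_prob = dist.log_prob(sample).sum(dim=-1)
        return action, next_state, log_prob
```

At the start of each episode, `initial_state(...)` returns zeros, matching the zero-initialized internal state mentioned in the Experiment Setup row.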