Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments
Authors: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot. (Section 4: Experimental Evaluation) |
| Researcher Affiliation | Academia | Desik Rengarajan Sapana Chaudhary Jaewon Kim Dileep Kalathil Srinivas Shakkottai Department of Electrical and Computer Engineering, Texas A&M University {desik,sapanac,jwkim8804,dileep.kalathil,sshakkot}@tamu.edu |
| Pseudocode | Yes | Algorithm 1 Enhanced Meta-RL using Demonstrations (EMRLD) (an illustrative sketch of this adaptation scheme appears after the table) |
| Open Source Code | Yes | We provide videos of the robot experiments and code at https://github.com/DesikRengarajan/EMRLD. |
| Open Datasets | Yes | We show on standard MuJoCo and two-wheeled robot environments that our algorithms work exceptionally well, even when provided with just one trajectory of sub-optimal demonstration data per task. We train over a small number of tasks that differ in their reward functions. Point2D Navigation is a 2-dimensional goal-reaching environment. The Two-Wheeled Locomotion environment is a goal-reaching environment with sparse rewards... Half-Cheetah Forward-Backward consists of two tasks... |
| Dataset Splits | Yes | Each task T_i ∼ p(T) is also associated with a data set D_i, which is typically divided into training data D_i^tr used for task-specific adaptation and validation data D_i^val used for the meta-parameter update. (The data-split sketch after the table illustrates this organization.) |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, or memory specifications) used to run the simulations or train the models. It only mentions the TurtleBot robot as the subject of real-world experiments. |
| Software Dependencies | No | The paper mentions that 'The implementation of our algorithms and baselines is based on a publicly available meta-learning code base [3]', referring to 'learn2learn', and also cites PyTorch. However, it does not provide specific version numbers for these software components or other dependencies required for replication. |
| Experiment Setup | No | The paper refers to the hyperparameters `w_rl` and `w_bc` and the learning rates `alpha` and `beta` in Algorithm 1, and describes how the demonstration data was generated. However, it does not state specific numerical values for these hyperparameters, training configurations, or other system-level settings in the main text; it notes only that 'Further details on state-space and dynamics are provided in the Appendix', so not all setup details appear in the main body. (The role of these quantities is shown in the EMRLD-style sketch after the table.) |
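
To make the per-task D_i^tr / D_i^val split from the Dataset Splits row concrete, here is a minimal, illustrative sketch of how a task's data could be organized for meta-training with a demonstration trajectory. It is not taken from the authors' code; the names `TaskData`, `demo`, `train_rollouts`, `val_rollouts`, and `split_rollouts` are assumptions made purely for illustration.

```python
# Minimal sketch of per-task data organization for meta-RL with demonstrations.
# Field and function names are illustrative, not the authors' implementation.
from dataclasses import dataclass
from typing import List, Tuple

Transition = Tuple[list, list, float]  # (state, action, reward), kept abstract here


@dataclass
class TaskData:
    demo: List[Transition]            # one (possibly sub-optimal) demonstration trajectory
    train_rollouts: List[Transition]  # D_i^tr: used for the inner, task-specific adaptation
    val_rollouts: List[Transition]    # D_i^val: used to compute the meta-parameter update


def split_rollouts(rollouts: List[Transition], val_fraction: float = 0.5):
    """Divide a task's collected rollouts into adaptation and validation parts."""
    cut = int(len(rollouts) * (1.0 - val_fraction))
    return rollouts[:cut], rollouts[cut:]


# Toy usage with placeholder transitions:
rollouts = [([0.0, 0.0], [0.0], 0.0) for _ in range(10)]
d_tr, d_val = split_rollouts(rollouts)
task = TaskData(demo=rollouts[:3], train_rollouts=d_tr, val_rollouts=d_val)
```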
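
The Pseudocode and Experiment Setup rows refer to Algorithm 1, whose adaptation step mixes a reinforcement-learning loss with a behavior-cloning loss on demonstration data, weighted by `w_rl` and `w_bc`, with inner and outer learning rates `alpha` and `beta`. Below is a first-order, MAML-style sketch of that idea in PyTorch. It is not the authors' implementation: the network sizes, toy data, fixed-std Gaussian policy, and all numerical values (including `w_rl`, `w_bc`, `alpha`, and `beta`) are placeholder assumptions, since the main text does not report them.

```python
# First-order, MAML-style sketch of demonstration-aided meta-RL adaptation.
# Everything numerical here is a placeholder assumption, not a value from the paper.
import copy

import torch
import torch.nn as nn

w_rl, w_bc = 1.0, 1.0    # Algorithm 1 loss weights (assumed values)
alpha, beta = 0.1, 1e-3  # inner (adaptation) and outer (meta) learning rates (assumed)

meta_policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
meta_opt = torch.optim.Adam(meta_policy.parameters(), lr=beta)


def log_prob(policy, states, actions, std=0.5):
    """Log-density of actions under a Gaussian policy with fixed std; the network outputs the mean."""
    return (-0.5 * ((actions - policy(states)) / std) ** 2).sum(-1)


def adaptation_loss(policy, rollout, demo):
    """w_rl * REINFORCE surrogate on the agent's own rollout
    + w_bc * behavior cloning on the (possibly sub-optimal) demonstration."""
    states, actions, returns = rollout
    rl_loss = -(log_prob(policy, states, actions) * returns).mean()
    demo_states, demo_actions = demo
    bc_loss = -log_prob(policy, demo_states, demo_actions).mean()
    return w_rl * rl_loss + w_bc * bc_loss


def sample_task_data():
    """Stand-in for environment rollouts plus one demonstration trajectory (random toy tensors)."""
    rollout = (torch.randn(20, 4), torch.randn(20, 2), torch.randn(20))
    demo = (torch.randn(10, 4), torch.randn(10, 2))
    return rollout, demo


for _ in range(3):                                 # meta-iterations
    meta_opt.zero_grad()
    for _ in range(4):                             # tasks sampled from p(T)
        rollout_tr, demo = sample_task_data()      # D_i^tr and the demonstration: inner adaptation
        rollout_val, _ = sample_task_data()        # D_i^val: meta-update
        adapted = copy.deepcopy(meta_policy)       # task-specific copy of the meta-parameters
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
        inner_opt.zero_grad()
        adaptation_loss(adapted, rollout_tr, demo).backward()
        inner_opt.step()                           # one gradient step of task adaptation
        states, actions, returns = rollout_val
        outer_loss = -(log_prob(adapted, states, actions) * returns).mean()
        grads = torch.autograd.grad(outer_loss, list(adapted.parameters()))
        for p, g in zip(meta_policy.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g   # first-order meta-gradient, summed over tasks
    meta_opt.step()
```

A full second-order meta-gradient (as in MAML) would differentiate through the inner update; the first-order form above is only meant to show where `w_rl`, `w_bc`, `alpha`, and `beta` enter the computation.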