Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

Authors: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot." (Section 4, Experimental Evaluation) |
| Researcher Affiliation | Academia | Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai; Department of Electrical and Computer Engineering, Texas A&M University ({desik, sapanac, jwkim8804, dileep.kalathil, sshakkot}@tamu.edu) |
| Pseudocode | Yes | Algorithm 1: Enhanced Meta-RL using Demonstrations (EMRLD) |
| Open Source Code | Yes | "We provide videos of the robot experiments and code at https://github.com/DesikRengarajan/EMRLD." |
| Open Datasets | Yes | "We show on standard MuJoCo and two-wheeled robot environments that our algorithms work exceptionally well, even when provided with just one trajectory of sub-optimal demonstration data per task. We train over a small number of tasks that differ in their reward functions." Point2D Navigation is a two-dimensional goal-reaching environment; Two-Wheeled Locomotion is a goal-reaching environment with sparse rewards...; HalfCheetah Forward-Backward consists of two tasks... |
| Dataset Splits | Yes | "Each task $\mathcal{T}_i \sim p(\mathcal{T})$ is also associated with a dataset $\mathcal{D}_i$, which is typically divided into training data $\mathcal{D}_i^{tr}$ used for task-specific adaptation and validation data $\mathcal{D}_i^{val}$ used for the meta-parameter update." (Worked update equations follow the table.) |
| Hardware Specification | No | The paper does not specify the hardware used to run the simulations or train the models (e.g., GPU models, CPU types, or memory); it only identifies the TurtleBot robot used in the real-world experiments. |
| Software Dependencies | No | The paper states that "the implementation of our algorithms and baselines is based on a publicly available meta-learning code base [3]" (learn2learn) and also cites PyTorch, but it does not give version numbers for these or for any other dependencies required for replication. |
| Experiment Setup | No | Algorithm 1 introduces the hyperparameters $w_{rl}$ and $w_{bc}$ and the learning rates $\alpha$ and $\beta$, and the paper describes how the demonstration data was generated, but the main text gives no numerical values for these hyperparameters or other training configurations; it defers instead to the appendix ("Further details on state-space and dynamics are provided in the Appendix"). A hedged sketch of how these quantities enter the algorithm follows the table. |
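
To make the Dataset Splits row concrete: the quoted passage describes the standard MAML-style bi-level scheme, where each task is adapted on its training split and the meta-parameters are updated on its validation split. A worked version of the two updates, using generic loss notation rather than the paper's exact symbols:

$$
\theta_i' = \theta - \alpha \nabla_\theta \, \mathcal{L}_{\mathcal{T}_i}\!\left(\theta; \mathcal{D}_i^{tr}\right),
\qquad
\theta \leftarrow \theta - \beta \nabla_\theta \sum_i \mathcal{L}_{\mathcal{T}_i}\!\left(\theta_i'; \mathcal{D}_i^{val}\right)
$$

Here $\alpha$ is the inner (adaptation) learning rate and $\beta$ the outer (meta) learning rate named in Algorithm 1.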
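
Because the Experiment Setup row notes that $w_{rl}$, $w_{bc}$, $\alpha$, and $\beta$ appear in Algorithm 1 without stated values, here is a minimal sketch of how such an update could be wired together on top of learn2learn, the meta-learning code base the paper says it builds on. This is a sketch under assumptions, not the authors' implementation: the task interface (`task.rl_loss`, `task.rl_loss_val`, `task.demo_obs`, `task.demo_act`) and every numerical value are illustrative placeholders.

```python
# Hypothetical EMRLD-style meta-training step (illustrative, not the authors' code).
# Assumes: a policy network that maps observations to a torch.distributions object,
# and a `task` object exposing RL losses on its train/validation rollouts plus one
# sub-optimal demonstration trajectory. All hyperparameter values are placeholders.
import torch
import learn2learn as l2l

def bc_loss(policy, demo_obs, demo_act):
    """Behavior cloning: negative log-likelihood of the demonstrated actions."""
    return -policy(demo_obs).log_prob(demo_act).sum(-1).mean()

def meta_train_step(maml, meta_opt, tasks, w_rl=1.0, w_bc=1.0):
    """One meta-update over a batch of tasks, mixing RL and BC in the inner loop."""
    meta_opt.zero_grad()
    for task in tasks:
        learner = maml.clone()  # differentiable copy of the meta-policy
        # Inner adaptation on D_i^tr: RL surrogate plus weighted BC term, so the
        # demonstration supplies gradient signal even when sparse rewards do not.
        inner = w_rl * task.rl_loss(learner) \
              + w_bc * bc_loss(learner, task.demo_obs, task.demo_act)
        learner.adapt(inner)  # one gradient step with the inner learning rate
        # Outer objective on D_i^val: RL loss of the adapted policy.
        (task.rl_loss_val(learner) / len(tasks)).backward()
    meta_opt.step()  # meta-update of the shared initialization

# Setup (placeholder values; the main text does not report the real ones):
# maml = l2l.algorithms.MAML(policy, lr=0.1)               # inner LR, i.e. alpha
# meta_opt = torch.optim.Adam(maml.parameters(), lr=1e-3)  # outer LR, i.e. beta
```

The point the sketch makes explicit is why the two terms are combined: in the sparse-reward environments the paper targets, the RL loss alone can be nearly flat during adaptation, and the behavior-cloning term on even one sub-optimal demonstration keeps the inner update informative.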