Enhanced Meta Reinforcement Learning via Demonstrations in Sparse Reward Environments
Authors: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot. (Section 4: Experimental Evaluation) |
| Researcher Affiliation | Academia | Desik Rengarajan Sapana Chaudhary Jaewon Kim Dileep Kalathil Srinivas Shakkottai Department of Electrical and Computer Engineering, Texas A&M University {desik,sapanac,jwkim8804,dileep.kalathil,sshakkot}@tamu.edu |
| Pseudocode | Yes | Algorithm 1 Enhanced Meta-RL using Demonstrations (EMRLD) (an illustrative sketch of this adaptation scheme appears after the table) |
| Open Source Code | Yes | We provide videos of the robot experiments and code at https://github.com/DesikRengarajan/EMRLD. |
| Open Datasets | Yes | We show on standard MuJoCo and two-wheeled robot environments that our algorithms work exceptionally well, even when provided with just one trajectory of sub-optimal demonstration data per task. We train over a small number of tasks that differ in their reward functions. Point2D Navigation is a 2-dimensional goal-reaching environment. The Two-Wheeled Locomotion environment is a goal-reaching environment with sparse rewards... Half-Cheetah Forward-Backward consists of two tasks... |
| Dataset Splits | Yes | Each task T_i ∼ p(T) is also associated with a data set D_i, which is typically divided into training data D_i^tr used for task-specific adaptation and validation data D_i^val used for the meta-parameter update. (The data-split sketch after the table illustrates this organization.) |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., GPU models, CPU types, or memory specifications) used to run the simulations or train the models. It only mentions the TurtleBot robot as the subject of real-world experiments. |
| Software Dependencies | No | The paper mentions that 'The implementation of our algorithms and baselines is based on a publicly available meta-learning code base [3]', referring to 'learn2learn', and also cites PyTorch. However, it does not provide specific version numbers for these software components or other dependencies required for replication. |
| Experiment Setup | No | The paper refers to the hyperparameters `w_rl` and `w_bc` and the learning rates `alpha` and `beta` in Algorithm 1, and describes how the demonstration data was generated. However, it does not state specific numerical values for these hyperparameters, training configurations, or other system-level settings in the main text; it notes only that 'Further details on state-space and dynamics are provided in the Appendix', so not all setup details appear in the main body. (The role of these quantities is shown in the EMRLD-style sketch after the table.) |
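
To make the per-task D_i^tr / D_i^val split from the Dataset Splits row concrete, here is a minimal, illustrative sketch of how a task's data could be organized for meta-training with a demonstration trajectory. It is not taken from the authors' code; the names `TaskData`, `demo`, `train_rollouts`, `val_rollouts`, and `split_rollouts` are assumptions made purely for illustration.

```python
# Minimal sketch of per-task data organization for meta-RL with demonstrations.
# Field and function names are illustrative, not the authors' implementation.
from dataclasses import dataclass
from typing import List, Tuple

Transition = Tuple[list, list, float]  # (state, action, reward), kept abstract here


@dataclass
class TaskData:
    demo: List[Transition]            # one (possibly sub-optimal) demonstration trajectory
    train_rollouts: List[Transition]  # D_i^tr: used for the inner, task-specific adaptation
    val_rollouts: List[Transition]    # D_i^val: used to compute the meta-parameter update


def split_rollouts(rollouts: List[Transition], val_fraction: float = 0.5):
    """Divide a task's collected rollouts into adaptation and validation parts."""
    cut = int(len(rollouts) * (1.0 - val_fraction))
    return rollouts[:cut], rollouts[cut:]


# Toy usage with placeholder transitions:
rollouts = [([0.0, 0.0], [0.0], 0.0) for _ in range(10)]
d_tr, d_val = split_rollouts(rollouts)
task = TaskData(demo=rollouts[:3], train_rollouts=d_tr, val_rollouts=d_val)
```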
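
The Pseudocode and Experiment Setup rows refer to Algorithm 1, whose adaptation step mixes a reinforcement-learning loss with a behavior-cloning loss on demonstration data, weighted by `w_rl` and `w_bc`, with inner and outer learning rates `alpha` and `beta`. Below is a first-order, MAML-style sketch of that idea in PyTorch. It is not the authors' implementation: the network sizes, toy data, fixed-std Gaussian policy, and all numerical values (including `w_rl`, `w_bc`, `alpha`, and `beta`) are placeholder assumptions, since the main text does not report them.

```python
# First-order, MAML-style sketch of demonstration-aided meta-RL adaptation.
# Everything numerical here is a placeholder assumption, not a value from the paper.
import copy

import torch
import torch.nn as nn

w_rl, w_bc = 1.0, 1.0    # Algorithm 1 loss weights (assumed values)
alpha, beta = 0.1, 1e-3  # inner (adaptation) and outer (meta) learning rates (assumed)

meta_policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
meta_opt = torch.optim.Adam(meta_policy.parameters(), lr=beta)


def log_prob(policy, states, actions, std=0.5):
    """Log-density of actions under a Gaussian policy with fixed std; the network outputs the mean."""
    return (-0.5 * ((actions - policy(states)) / std) ** 2).sum(-1)


def adaptation_loss(policy, rollout, demo):
    """w_rl * REINFORCE surrogate on the agent's own rollout
    + w_bc * behavior cloning on the (possibly sub-optimal) demonstration."""
    states, actions, returns = rollout
    rl_loss = -(log_prob(policy, states, actions) * returns).mean()
    demo_states, demo_actions = demo
    bc_loss = -log_prob(policy, demo_states, demo_actions).mean()
    return w_rl * rl_loss + w_bc * bc_loss


def sample_task_data():
    """Stand-in for environment rollouts plus one demonstration trajectory (random toy tensors)."""
    rollout = (torch.randn(20, 4), torch.randn(20, 2), torch.randn(20))
    demo = (torch.randn(10, 4), torch.randn(10, 2))
    return rollout, demo


for _ in range(3):                                 # meta-iterations
    meta_opt.zero_grad()
    for _ in range(4):                             # tasks sampled from p(T)
        rollout_tr, demo = sample_task_data()      # D_i^tr and the demonstration: inner adaptation
        rollout_val, _ = sample_task_data()        # D_i^val: meta-update
        adapted = copy.deepcopy(meta_policy)       # task-specific copy of the meta-parameters
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=alpha)
        inner_opt.zero_grad()
        adaptation_loss(adapted, rollout_tr, demo).backward()
        inner_opt.step()                           # one gradient step of task adaptation
        states, actions, returns = rollout_val
        outer_loss = -(log_prob(adapted, states, actions) * returns).mean()
        grads = torch.autograd.grad(outer_loss, list(adapted.parameters()))
        for p, g in zip(meta_policy.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g   # first-order meta-gradient, summed over tasks
    meta_opt.step()
```

A full second-order meta-gradient (as in MAML) would differentiate through the inner update; the first-order form above is only meant to show where `w_rl`, `w_bc`, `alpha`, and `beta` enter the computation.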