Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning
Authors: Yantian Zha, Lin Guan, Subbarao Kambhampati
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that an RLfD model can be improved by using our SERLfD framework in terms of training stability and performance. |
| Researcher Affiliation | Academia | Arizona State University EMAIL |
| Pseudocode | Yes | Algorithm 1: The SERLfD Learning Algorithm |
| Open Source Code | Yes | To foster further research in self-explanation-guided robot learning, we have made our demonstrations and code publicly accessible at https://github.com/YantianZha/SERLfD. |
| Open Datasets | Yes | To foster further research in self-explanation-guided robot learning, we have made our demonstrations and code publicly accessible at https://github.com/YantianZha/SERLfD. |
| Dataset Splits | No | The paper does not explicitly mention training/validation/test splits or cross-validation methodology. |
| Hardware Specification | No | The paper describes the simulated robot and environment (Fetch Mobile Manipulator in PyBullet simulator) but does not provide specific details about the computing hardware (e.g., CPU, GPU, memory) used for experiments. |
| Software Dependencies | No | The paper mentions software components such as the PyBullet simulator, Twin-Delayed DDPG (TD3), and Soft Actor-Critic (SAC), but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Each training episode had a maximum of 50 steps. |