Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning
Authors: Chia-Cheng Chiang, Li-Cheng Lan, Wei-Fang Sun, Chien Feng, Cho-Jui Hsieh, Chun-Yi Lee
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the efficacy of TDIL, we perform comprehensive experiments on five widely adopted MuJoCo benchmarks (Todorov et al., 2012), aligning with most prior IL research, as well as the Adroit Door environment (Rajeswaran et al., 2017) in the Gymnasium-Robotics collection (de Lazcano et al., 2023). The experimental evidence reveals that TDIL delivers exceptional performance, matches expert-level results on these benchmarks, and outperforms existing IL approaches. |
| Researcher Affiliation | Collaboration | 1ELSA Lab, Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. 2Department of Computer Science, University of California, Los Angeles, CA, USA. 3NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA. |
| Pseudocode | Yes | Algorithm 1 presents a practical training methodology of the proposed method, referred to as TDIL. |
| Open Source Code | Yes | The code implementation and expert data used in this work are available on this GitHub repository. |
| Open Datasets | Yes | To validate the efficacy of TDIL, we perform comprehensive experiments on five widely adopted MuJoCo benchmarks (Todorov et al., 2012), aligning with most prior IL research, as well as the Adroit Door environment (Rajeswaran et al., 2017) in the Gymnasium-Robotics collection (de Lazcano et al., 2023). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or counts. It mentions training over 3M timesteps and using five random seeds, as well as discussing blind model selection. |
| Hardware Specification | Yes | Table A1. The hardware specification used to perform our experiments. CPU AMD Ryzen Threadripper 3990X 64-Core Processor GPU NVIDIA GeForce RTX 3090 |
| Software Dependencies | No | The paper mentions using the Soft Actor-Critic (SAC) framework and MuJoCo, but does not provide specific version numbers for software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | The actor and critic networks in SAC are implemented as neural networks with three hidden layers and rectified linear unit (ReLU) activation functions. Each of these hidden layers consists of 256 nodes. All algorithms are trained over 3M timesteps, each employing five random seeds. For the aggregate reward Ragg version, we select β = 0.9 based on our grid search, with the results presented in Table A7. |
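The actor and critic architecture quoted above (three hidden layers of 256 ReLU units each) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the input/output dimensions (a 23-dimensional state-action input producing a scalar Q-value) are hypothetical placeholders chosen for the example.

```python
import numpy as np

def init_mlp(in_dim, out_dim, hidden=256, n_hidden=3, seed=0):
    """Initialize an MLP with `n_hidden` hidden layers of `hidden` units,
    matching the 3x256 architecture described in the experiment setup."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * n_hidden + [out_dim]
    # He initialization is an assumption; the paper does not specify one.
    return [
        (rng.standard_normal((dims[i], dims[i + 1])) * np.sqrt(2.0 / dims[i]),
         np.zeros(dims[i + 1]))
        for i in range(len(dims) - 1)
    ]

def forward(params, x):
    """Apply ReLU on hidden layers; linear output layer (e.g., a Q-value)."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)  # ReLU activation
    w, b = params[-1]
    return x @ w + b

# Hypothetical critic: 17-dim observation + 6-dim action -> scalar Q-value.
critic = init_mlp(in_dim=23, out_dim=1)
q = forward(critic, np.zeros(23))
```

In practice such networks would be built in a deep-learning framework and trained with the SAC objectives; the sketch only shows the layer shapes and activations stated in the paper.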