Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
USN: A Robust Imitation Learning Method against Diverse Action Noise
Authors: Xingrui Yu, Bo Han, Ivor W. Tsang
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various action noises. |
| Researcher Affiliation | Academia | Xingrui Yu: Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore; Institute of High-Performance Computing, Agency for Science, Technology and Research, Singapore. Bo Han: Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR. Ivor W. Tsang: Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore; Institute of High-Performance Computing, Agency for Science, Technology and Research, Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1: State-Independent Action Noise Generation; Algorithm 2: State-Dependent Action Noise Generation; Algorithm 3: BCQ with ICM for offline imitation learning; Algorithm 4: Uncertainty-aware Sample-selection with Soft Negative learning (USN); Algorithm 5: BCQ_ICM-USN for robust offline imitation learning against action noise. |
| Open Source Code | No | The text discusses the source code of a third-party tool or platform that the authors used (e.g., stablebaselines3, gail_atari, tianshou), but does not provide their own implementation code for the methodology described in this paper. |
| Open Datasets | Yes | We conduct experiments on the LunarLander-v2 task from OpenAI Gym (Brockman et al., 2016). We use widely used Atari games simulated through the Arcade Learning Environment (Bellemare et al., 2013). In the experiments on the real-world benchmark (Section 6), we use the CARLA dataset (https://github.com/carla-simulator/imitation-learning) to train our models. |
| Dataset Splits | Yes | We pre-train a DQN policy as the expert agent and generate noisy demonstrations with 50K steps. We generate one full-episode demonstration using pre-trained PPO agents, then generate 50K-step demonstration datasets with synthetic state-independent and state-dependent action noises. We sample D_l from the original CARLA dataset with a portion ρ ∈ (0, 0.5); the remaining (1 − ρ) portion of the CARLA dataset is treated as D_u by removing the commands. |
| Hardware Specification | No | The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). This statement refers to a general computing resource without specifying exact hardware components like GPU or CPU models. |
| Software Dependencies | No | No specific version numbers for software dependencies (libraries, frameworks, etc.) are provided. The paper mentions using open-source implementations and platforms (stablebaselines3, gail_atari, tianshou), but without version details for these or other core libraries. |
| Experiment Setup | Yes | We build the BC model using a 2-layer MLP architecture with 32 neurons on each layer. ... We use a 3-layer MLP architecture with 32 units in each hidden layer as the model backbone for implementing BC, BC-GCE, and BC-USN. We train all the models using the Adam optimizer for 20 epochs. ... We set λ_neg = 1.0 for BC-USN. The PPO is trained with a learning rate of 2.5e-4, a clipping threshold of 0.1, an entropy coefficient of 0.01, a value function coefficient of 0.5, and a GAE parameter of 0.95 (Schulman et al., 2016). The command-correction module and action module contain three fully connected layers with 256 units each, followed by ReLU nonlinearities and a dropout operation. We train CIL-USN and the baselines on the noisy demonstration dataset D using the Adam optimizer with an initial learning rate of 0.0002. |
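The Pseudocode and Dataset Splits rows refer to state-independent action noise injected into expert demonstrations (Algorithm 1 in the paper). The paper's own procedure is not reproduced here; the sketch below shows one common hypothetical form of state-independent noise, where each expert action is replaced by a uniformly random different action with probability `eps` (the function name and noise model are illustrative assumptions, not the paper's implementation):

```python
import random

def corrupt_actions(demos, eps, num_actions, seed=0):
    """Hypothetical state-independent action noise: with probability eps,
    replace each expert action by a uniformly random different action.
    `demos` is a list of (observation, action) pairs with discrete actions."""
    rng = random.Random(seed)
    noisy = []
    for obs, action in demos:
        if rng.random() < eps:
            # Pick a wrong action uniformly, independent of the state.
            action = rng.choice([a for a in range(num_actions) if a != action])
        noisy.append((obs, action))
    return noisy

# Corrupt a stand-in 1000-step demonstration at a 40% noise rate.
demos = [(step, 0) for step in range(1000)]
noisy = corrupt_actions(demos, eps=0.4, num_actions=4)
```

State-dependent noise (Algorithm 2) would instead condition the corruption probability or the substituted action on the observation.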
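The Dataset Splits row describes sampling a labeled portion D_l of size ρ ∈ (0, 0.5) from the CARLA dataset and treating the remainder as an unlabeled set D_u with the commands removed. A minimal sketch of that split, assuming the demonstrations are dictionaries with a `"command"` field (the field name and data layout are assumptions for illustration):

```python
import random

def split_demonstrations(dataset, rho, seed=0):
    """Split demos into a labeled portion D_l of size rho * N and an
    unlabeled portion D_u whose command annotations are removed."""
    assert 0.0 < rho < 0.5, "paper samples the labeled portion with rho in (0, 0.5)"
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    cut = int(rho * len(dataset))
    d_l = [dataset[i] for i in indices[:cut]]
    # Drop the command field to form the unlabeled set D_u.
    d_u = [{k: v for k, v in dataset[i].items() if k != "command"}
           for i in indices[cut:]]
    return d_l, d_u

# Stand-in demonstrations with a command annotation on each step.
demos = [{"obs": i, "action": i % 3, "command": "follow"} for i in range(100)]
d_l, d_u = split_demonstrations(demos, rho=0.3)
```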
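The Experiment Setup row specifies a 3-layer MLP backbone with 32 units per hidden layer for BC, BC-GCE, and BC-USN. A minimal NumPy sketch of that backbone's forward pass, assuming ReLU hidden activations and linear action logits; the state and action dimensions (8 and 4, roughly LunarLander-v2 shapes) and the initialization scale are illustrative assumptions, and the Adam training loop is omitted:

```python
import numpy as np

def init_mlp(state_dim, num_actions, hidden=32, seed=0):
    """Weights for the 3-layer MLP backbone: two 32-unit hidden layers
    followed by an action-logit head."""
    rng = np.random.default_rng(seed)
    sizes = [state_dim, hidden, hidden, num_actions]
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear output logits."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU
    return x

params = init_mlp(state_dim=8, num_actions=4)
logits = forward(params, np.zeros((1, 8)))
```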