Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
USN: A Robust Imitation Learning Method against Diverse Action Noise
Authors: Xingrui Yu, Bo Han, Ivor W. Tsang
JAIR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results in Box2D tasks and Atari games show that USN consistently improves the final rewards of behavioral cloning, online imitation learning, and offline imitation learning methods under various action noises. |
| Researcher Affiliation | Academia | Xingrui Yu: Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore; Institute of High-Performance Computing, Agency for Science, Technology and Research, Singapore. Bo Han: Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR. Ivor W. Tsang: Centre for Frontier AI Research, Agency for Science, Technology and Research, Singapore; Institute of High-Performance Computing, Agency for Science, Technology and Research, Singapore; School of Computer Science and Engineering, Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1: State-Independent Action Noise Generation; Algorithm 2: State-Dependent Action Noise Generation; Algorithm 3: BCQ with ICM for offline imitation learning; Algorithm 4: Uncertainty-aware Sample-selection with Soft Negative learning (USN); Algorithm 5: BCQ_ICM-USN for robust offline imitation learning against action noise. |
| Open Source Code | No | The text discusses the source code of a third-party tool or platform that the authors used (e.g., stablebaselines3, gail_atari, tianshou), but does not provide their own implementation code for the methodology described in this paper. |
| Open Datasets | Yes | We conduct experiments on the LunarLander-v2 task from OpenAI Gym (Brockman et al., 2016). We use widely used Atari games simulated through the Arcade Learning Environment (Bellemare et al., 2013). In the experiments on the real-world benchmark (Section 6), we use the CARLA dataset (https://github.com/carla-simulator/imitation-learning) to train our models. |
| Dataset Splits | Yes | We pre-train a DQN policy as the expert agent and generate noisy demonstrations with 50K steps. We generate one full-episode demonstration using pre-trained PPO agents, then generate 50K-step demonstration datasets with synthetic state-independent and state-dependent action noises. We sample D_l from the original CARLA dataset with a portion ρ ∈ (0, 0.5); the remaining (1 − ρ) portion of the CARLA dataset is treated as D_u by removing the commands. |
| Hardware Specification | No | The computational work for this article was partially performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg). This statement refers to a general computing resource without specifying exact hardware components like GPU or CPU models. |
| Software Dependencies | No | No specific version numbers for software dependencies (libraries, frameworks, etc.) are provided. The paper mentions using open-source implementations and platforms (stablebaselines3, gail_atari, tianshou), but without version details for these or other core libraries. |
| Experiment Setup | Yes | We build the BC model using a 2-layer MLP architecture with 32 neurons on each layer. ... We use a 3-layer MLP architecture with 32 units in each hidden layer as the model backbone for implementing BC, BC-GCE, and BC-USN. We train all the models using the Adam optimizer for 20 epochs. ... We set λ_neg = 1.0 for BC-USN. The PPO is trained with a learning rate of 2.5e-4, a clipping threshold of 0.1, an entropy coefficient of 0.01, a value function coefficient of 0.5, and a GAE parameter of 0.95 (Schulman et al., 2016). The command-correction module and action module contain three fully connected layers with 256 units each, followed by ReLU nonlinearities and a dropout operation. We train CIL-USN and the baselines on the noisy demonstration dataset D using the Adam optimizer with an initial learning rate of 0.0002. |
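The Pseudocode and Dataset Splits rows refer to state-independent action noise injected into expert demonstrations (Algorithm 1 in the paper). The paper's own procedure is not reproduced here; the sketch below shows one common hypothetical form of state-independent noise, where each expert action is replaced by a uniformly random different action with probability `eps` (the function name and noise model are illustrative assumptions, not the paper's implementation):

```python
import random

def corrupt_actions(demos, eps, num_actions, seed=0):
    """Hypothetical state-independent action noise: with probability eps,
    replace each expert action by a uniformly random different action.
    `demos` is a list of (observation, action) pairs with discrete actions."""
    rng = random.Random(seed)
    noisy = []
    for obs, action in demos:
        if rng.random() < eps:
            # Pick a wrong action uniformly, independent of the state.
            action = rng.choice([a for a in range(num_actions) if a != action])
        noisy.append((obs, action))
    return noisy

# Corrupt a stand-in 1000-step demonstration at a 40% noise rate.
demos = [(step, 0) for step in range(1000)]
noisy = corrupt_actions(demos, eps=0.4, num_actions=4)
```

State-dependent noise (Algorithm 2) would instead condition the corruption probability or the substituted action on the observation.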
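The Dataset Splits row describes sampling a labeled portion D_l of size ρ ∈ (0, 0.5) from the CARLA dataset and treating the remainder as an unlabeled set D_u with the commands removed. A minimal sketch of that split, assuming the demonstrations are dictionaries with a `"command"` field (the field name and data layout are assumptions for illustration):

```python
import random

def split_demonstrations(dataset, rho, seed=0):
    """Split demos into a labeled portion D_l of size rho * N and an
    unlabeled portion D_u whose command annotations are removed."""
    assert 0.0 < rho < 0.5, "paper samples the labeled portion with rho in (0, 0.5)"
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    cut = int(rho * len(dataset))
    d_l = [dataset[i] for i in indices[:cut]]
    # Drop the command field to form the unlabeled set D_u.
    d_u = [{k: v for k, v in dataset[i].items() if k != "command"}
           for i in indices[cut:]]
    return d_l, d_u

# Stand-in demonstrations with a command annotation on each step.
demos = [{"obs": i, "action": i % 3, "command": "follow"} for i in range(100)]
d_l, d_u = split_demonstrations(demos, rho=0.3)
```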
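The Experiment Setup row specifies a 3-layer MLP backbone with 32 units per hidden layer for BC, BC-GCE, and BC-USN. A minimal NumPy sketch of that backbone's forward pass, assuming ReLU hidden activations and linear action logits; the state and action dimensions (8 and 4, roughly LunarLander-v2 shapes) and the initialization scale are illustrative assumptions, and the Adam training loop is omitted:

```python
import numpy as np

def init_mlp(state_dim, num_actions, hidden=32, seed=0):
    """Weights for the 3-layer MLP backbone: two 32-unit hidden layers
    followed by an action-logit head."""
    rng = np.random.default_rng(seed)
    sizes = [state_dim, hidden, hidden, num_actions]
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear output logits."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU
    return x

params = init_mlp(state_dim=8, num_actions=4)
logits = forward(params, np.zeros((1, 8)))
```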