Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations
Authors: Qiang Liu, Huiqiao Fu, Kaiqiang Tang, Chunlin Chen, Daoyi Dong
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that PN-GAIL surpasses conventional baseline methods in dealing with imperfect demonstrations, thereby significantly augmenting the practical utility of imitation learning in real-world contexts. Our codes are available at https://github.com/QiangLiuT/PN-GAIL. Experiments on six control tasks are conducted to show the efficiency of our method in dealing with imperfect demonstrations compared to baseline methods. |
| Researcher Affiliation | Academia | Qiang Liu, Huiqiao Fu, Kaiqiang Tang & Chunlin Chen: School of Management and Engineering, Nanjing University, Nanjing, China. Daoyi Dong: The Australian Artificial Intelligence Institute, University of Technology Sydney, Sydney, Australia. |
| Pseudocode | Yes | The pseudocode for the overall algorithm can be found in Appendix A. Algorithm 1 PN-GAIL |
| Open Source Code | Yes | Our codes are available at https://github.com/QiangLiuT/PN-GAIL. |
| Open Datasets | Yes | Task setup: We conduct experiments across six environments (Pendulum-v1, Ant-v2, Walker2d-v2, Hopper-v2, Swimmer-v2, and HalfCheetah-v2). ... For the Ant-v2, Walker2d-v2, Hopper-v2, Swimmer-v2, and HalfCheetah-v2 environments, to maintain fairness, we directly utilize the demonstrations and confidence scores provided by the code of 2IWIL. |
| Dataset Splits | Yes | During the practical experiments across all six environments, 20% of the given demonstrations are randomly selected to be assigned confidence scores, which means that the label ratio is 0.2. ... In our experiments, we use different numbers of Dc + Du for different tasks, and the specific values are shown in Appendix C.1. Table 3 shows the number of confidence data and unlabeled data used for each task... |
| Hardware Specification | Yes | All of our experiments are run on a single machine with 4 NVIDIA GeForce RTX 3080 GPUs. |
| Software Dependencies | No | The paper mentions TRPO, PPO, SAC as RL methods and Adam as an optimizer, but does not provide specific software library versions (e.g., Python, PyTorch versions) for reproducibility. |
| Experiment Setup | Yes | Table 2: Hyper-parameters settings. γ = 0.995; τ (Generalized Advantage Estimation) = 0.97; batch size = 5,000; learning rate (value network) = 3e-4; learning rate (discriminator) = 1e-3; learning rate (classifier) = 3e-4; optimizer = Adam. |
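
As a rough illustration of the Experiment Setup and Dataset Splits rows, the sketch below collects the reported hyperparameters in a config object and draws a random 20% subset of demonstration indices to receive confidence scores. The class `Hyperparameters`, the helper `split_demonstrations`, the seed, and the demonstration count of 1,000 are hypothetical names and values chosen for illustration; they do not come from the paper or its repository.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from the Experiment Setup and Dataset Splits rows above."""
    gamma: float = 0.995            # discount factor
    gae_tau: float = 0.97           # Generalized Advantage Estimation parameter
    batch_size: int = 5000
    lr_value: float = 3e-4          # value network learning rate
    lr_discriminator: float = 1e-3  # discriminator learning rate
    lr_classifier: float = 3e-4     # classifier learning rate
    optimizer: str = "Adam"
    label_ratio: float = 0.2        # fraction of demonstrations given confidence scores


def split_demonstrations(num_demos, label_ratio, seed=0):
    """Hypothetical helper: randomly pick which demonstrations get confidence scores.

    Returns two sorted index lists: (labeled, unlabeled).
    """
    rng = random.Random(seed)       # fixed seed (an assumption) for a repeatable split
    indices = list(range(num_demos))
    rng.shuffle(indices)
    num_labeled = round(num_demos * label_ratio)
    return sorted(indices[:num_labeled]), sorted(indices[num_labeled:])


config = Hyperparameters()
# 1000 is an illustrative count only; the per-task counts are in the paper's Table 3.
labeled, unlabeled = split_demonstrations(num_demos=1000, label_ratio=config.label_ratio)
print(len(labeled), len(unlabeled))
```

The actual per-task numbers of confidence-labeled and unlabeled demonstrations are given in Appendix C.1 of the paper and should replace the placeholder count when reproducing the experiments.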