Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Imitation Learning from Observation with Automatic Discount Scheduling
Authors: Yuyang Liu, Weijun Dong, Yingdong Hu, Chuan Wen, Zhao-Heng Yin, Chongjie Zhang, Yang Gao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them. |
| Researcher Affiliation | Academia | Yuyang Liu (1,2), Weijun Dong (1,2), Yingdong Hu (1,2), Chuan Wen (1,2), Zhao-Heng Yin (3), Chongjie Zhang (4), Yang Gao (1,2,5). Affiliations: (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Shanghai Qi Zhi Institute; (3) UC Berkeley; (4) Washington University in St. Louis; (5) Shanghai Artificial Intelligence Laboratory |
| Pseudocode | Yes | Algorithm 1 Imitation Learning from Observation with Automatic Discount Scheduling |
| Open Source Code | Yes | Our code is available at https://il-ads.github.io/. With the code released online and the hyperparameter settings in Appendix A.1, the experiment results are highly reproducible. |
| Open Datasets | Yes | We experiment with 9 challenging tasks from the Meta-World (Yu et al., 2020) suite. Instead, the agent is equipped with 10 expert demonstration sequences, which solely comprise observational data. |
| Dataset Splits | No | The paper describes reinforcement learning experiments with agents interacting in an environment, and does not specify traditional training/validation/test dataset splits like those found in supervised learning tasks. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used to run its experiments. |
| Software Dependencies | No | The paper mentions software like DrQ-v2, Adam optimizer, and ResNet-50 but does not provide specific version numbers for these components, which are required for reproducible software dependencies. |
| Experiment Setup | Yes | The hyperparameters are listed in Table 1: replay buffer capacity 150000; n-step returns 3; mini-batch size 512; discount γ (for baselines) 0.99; optimizer Adam; learning rate 10⁻⁴; critic Q-function soft-update rate τ 0.005; hidden dimension 1024; exploration noise N(0, 0.4); policy noise clip(N(0, 0.1), -0.3, 0.3); delayed policy update 1; λ (for progress recognizer Φ) 0.9; α (for mapping function f_γ) 0.2. |
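
For readers who want to carry the Experiment Setup values into their own runs, the following is a minimal sketch of how they might be collected into a single configuration object. It is not taken from the authors' released code. The class name, field names, and helper functions are hypothetical. The sketch also assumes that the learning rate reads 10⁻⁴, that the policy-noise clip bounds are -0.3 and 0.3 (the signs appear to have been lost in extraction), and that the second argument of N(0, ·) is a standard deviation, which the table does not state.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Table1Hyperparameters:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    replay_buffer_capacity: int = 150_000
    n_step_returns: int = 3
    mini_batch_size: int = 512
    baseline_discount: float = 0.99        # discount gamma, used for baselines only
    optimizer: str = "Adam"
    learning_rate: float = 1e-4            # assumption: "10 4" in the source means 10^-4
    critic_soft_update_rate: float = 0.005  # tau
    hidden_dim: int = 1024
    exploration_noise_scale: float = 0.4   # assumption: treated as a standard deviation
    policy_noise_scale: float = 0.1        # assumption: treated as a standard deviation
    policy_noise_clip: float = 0.3         # assumption: symmetric bounds [-0.3, 0.3]
    delayed_policy_update: int = 1
    progress_recognizer_lambda: float = 0.9
    mapping_function_alpha: float = 0.2


def clipped_policy_noise(cfg: Table1Hyperparameters, rng: random.Random) -> float:
    """Sample Gaussian noise and clip it to [-policy_noise_clip, policy_noise_clip]."""
    noise = rng.gauss(0.0, cfg.policy_noise_scale)
    return max(-cfg.policy_noise_clip, min(cfg.policy_noise_clip, noise))


def soft_update(target: float, online: float, tau: float) -> float:
    """Standard soft (Polyak) target update: move the target a fraction tau toward online."""
    return (1.0 - tau) * target + tau * online


cfg = Table1Hyperparameters()
rng = random.Random(0)
samples = [clipped_policy_noise(cfg, rng) for _ in range(1000)]
```

The dataclass is frozen so that a run cannot silently mutate its settings. The scalar `soft_update` only illustrates the role of τ; a real agent would apply the same rule to every network parameter.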