Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching
Authors: Quang Anh Pham, Janaka Brahmanage, Tien Mai, Akshat Kumar
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluations across diverse offline LfO benchmarks show that IOSTOM substantially outperforms state-of-the-art methods, demonstrating both improved performance and data efficiency. ... In this section, we compare IOSTOM with previous state-of-the-art approaches on diverse sets of environments and tasks from the D4RL benchmark [7], and real world data. |
| Researcher Affiliation | Academia | Quang Anh Pham, Janaka Chathuranga Brahmanage, Tien Mai, Akshat Kumar; Singapore Management University |
| Pseudocode | Yes | Algorithm 1 IOSTOM |
| Open Source Code | Yes | The implementation of IOSTOM is publicly available at https://github.com/quanganh1999/IOSTOM. |
| Open Datasets | Yes | To answer the question (Q1), we use the same offline LfO benchmark from DILO [39] with datasets constructed from the D4RL framework [7]. |
| Dataset Splits | Yes | We used 80% of the data for training and reserved the remaining 20% for evaluation. |
| Hardware Specification | Yes | We conduct our experiments using a computing cluster with 8 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | Yes | Our method is implemented in JAX version 0.5.3 (with CUDA 12 capabilities). |
| Experiment Setup | Yes | The regularization β was tuned by searching over [3, 5, 7, 10, 15, 20]. ... We tune τ via hyper-parameter sweeps over [0.01, 0.04, 0.08, 0.1, 0.2]. ... Learning Rate: 3e-4; Weight Decay: 1e-3; Training Length: 1M steps; Batch Size: 512; Optimizer: Adam; Dropout Rate: 0.1; LR decay schedule: cosine. Critic Network Size: [256, 256]; Activation Function: ReLU; Learning Rate: 3e-4; Training Length: 1M steps; Batch Size: 512; Optimizer: Adam; Mixture Ratio α: 0.5; Polyak Update Rate λ: 0.005; Discount Factor γ: 0.99 |
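
The setup values quoted in the table can be gathered into a small configuration sketch. This is a minimal illustration in plain Python, not the authors' JAX implementation: the class name `ReportedSetup` and the helpers `cosine_lr`, `polyak_update`, and `split_80_20` are hypothetical. It also rests on assumptions the summary does not confirm: that the cosine schedule decays to zero over the full 1M steps, that the Polyak update takes the usual form target ← (1 − λ)·target + λ·online, and that the 80/20 split is a simple count-based partition. The summary does not say which network the weight decay, dropout, and cosine schedule apply to, nor which dataset the split refers to.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the table (field names are hypothetical)."""
    learning_rate: float = 3e-4
    weight_decay: float = 1e-3       # which network this applies to is not stated
    train_steps: int = 1_000_000
    batch_size: int = 512
    dropout_rate: float = 0.1
    critic_hidden: tuple = (256, 256)
    mixture_ratio_alpha: float = 0.5
    polyak_lambda: float = 0.005
    discount_gamma: float = 0.99


# Search grids quoted for the regularization beta and for tau.
BETA_GRID = (3, 5, 7, 10, 15, 20)
TAU_GRID = (0.01, 0.04, 0.08, 0.1, 0.2)


def cosine_lr(step: int, cfg: ReportedSetup) -> float:
    """Cosine decay from cfg.learning_rate to 0 (assumed form of the schedule)."""
    progress = min(step, cfg.train_steps) / cfg.train_steps
    return cfg.learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


def polyak_update(target, online, lam):
    """Soft target update: target <- (1 - lam) * target + lam * online (assumed form)."""
    return [(1.0 - lam) * t + lam * o for t, o in zip(target, online)]


def split_80_20(n_items: int):
    """Return (train_count, eval_count) for an 80/20 split using integer arithmetic."""
    n_train = (n_items * 4) // 5
    return n_train, n_items - n_train


if __name__ == "__main__":
    cfg = ReportedSetup()
    print("lr at start, midpoint:", cosine_lr(0, cfg), cosine_lr(cfg.train_steps // 2, cfg))
    print("split of 1000 items:", split_80_20(1000))
```

Keeping the quoted values in a frozen dataclass makes it harder for a rerun to drift from the reported setup by accident; a real reproduction should start from the authors' repository linked in the table.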