Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching

Authors: Quang Anh Pham, Janaka Brahmanage, Tien Mai, Akshat Kumar

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive empirical evaluations across diverse offline LfO benchmarks show that IOSTOM substantially outperforms state-of-the-art methods, demonstrating both improved performance and data efficiency. ... In this section, we compare IOSTOM with previous state-of-the-art approaches on diverse sets of environments and tasks from the D4RL benchmark [7], and real world data. |
| Researcher Affiliation | Academia | Quang Anh Pham, Janaka Chathuranga Brahmanage, Tien Mai, Akshat Kumar Singapore Management University EMAIL EMAIL |
| Pseudocode | Yes | Algorithm 1 IOSTOM |
| Open Source Code | Yes | The implementation of IOSTOM is publicly available at https://github.com/quanganh1999/IOSTOM. |
| Open Datasets | Yes | To answer the question (Q1), we use the same offline LfO benchmark from DILO [39] with datasets constructed from the D4RL framework [7]. |
| Dataset Splits | Yes | We used 80% of the data for training and reserved the remaining 20% for evaluation. |
| Hardware Specification | Yes | We conduct our experiments using a computing cluster with 8 NVIDIA RTX 3090 GPUs. |
| Software Dependencies | Yes | Our method is implemented in JAX version 0.5.3 (with CUDA 12 capabilities). |
| Experiment Setup | Yes | The regularization β was tuned by searching over [3, 5, 7, 10, 15, 20]. ... We tune τ via hyper-parameter sweeps over [0.01, 0.04, 0.08, 0.1, 0.2]. ... Learning Rate: 3e-4; Weight Decay: 1e-3; Training Length: 1M steps; Batch Size: 512; Optimizer: Adam; Dropout Rate: 0.1; LR decay schedule: cosine; Critic Network Size: [256, 256]; Activation Function: ReLU; Learning Rate: 3e-4; Training Length: 1M steps; Batch Size: 512; Optimizer: Adam; Mixture Ratio α: 0.5; Polyak Update Rate λ: 0.005; Discount Factor γ: 0.99 |
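
The values quoted for Dataset Splits and Experiment Setup can be collected into a small configuration for anyone attempting a re-run. The following is a minimal sketch in plain Python, not the authors' implementation: the key names and the helpers `cosine_lr`, `polyak_update`, and `split_80_20` are hypothetical. The excerpt does not state which network the cosine schedule applies to, whether a warmup is used, or how the 80/20 split is drawn, so those choices are assumptions.

```python
import math
import random

# Values quoted in the Experiment Setup row; key names are illustrative only.
SETUP = {
    "learning_rate": 3e-4,
    "batch_size": 512,
    "training_steps": 1_000_000,
    "polyak_rate": 0.005,   # lambda
    "discount": 0.99,       # gamma
    "mixture_ratio": 0.5,   # alpha
    "beta_grid": [3, 5, 7, 10, 15, 20],
    "tau_grid": [0.01, 0.04, 0.08, 0.1, 0.2],
}


def cosine_lr(step, base_lr=SETUP["learning_rate"],
              total_steps=SETUP["training_steps"]):
    """Cosine decay from base_lr to 0 (assumed: no warmup, final value 0)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def polyak_update(target, online, rate=SETUP["polyak_rate"]):
    """Soft target update: target <- (1 - rate) * target + rate * online."""
    return [(1.0 - rate) * t + rate * o for t, o in zip(target, online)]


def split_80_20(items, seed=0):
    """Shuffle, then split 80% train / 20% evaluation (shuffling is assumed)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]
```

Under these assumptions, the learning rate halves to 1.5e-4 at step 500,000, and the β and τ grids together define 6 × 5 = 30 sweep combinations if they are searched jointly, which the excerpt does not confirm.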