Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching

Authors: Quang Anh Pham, Janaka Brahmanage, Tien Mai, Akshat Kumar

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive empirical evaluations across diverse offline LfO benchmarks show that IOSTOM substantially outperforms state-of-the-art methods, demonstrating both improved performance and data efficiency. ... In this section, we compare IOSTOM with previous state-of-the-art approaches on diverse sets of environments and tasks from the D4RL benchmark [7], and real world data.
Researcher Affiliation	Academia	Quang Anh Pham, Janaka Chathuranga Brahmanage, Tien Mai, Akshat Kumar; Singapore Management University
Pseudocode	Yes	Algorithm 1 IOSTOM
Open Source Code	Yes	The implementation of IOSTOM is publicly available at https://github.com/quanganh1999/IOSTOM.
Open Datasets	Yes	To answer the question (Q1), we use the same offline LfO benchmark from DILO [39] with datasets constructed from the D4RL framework [7].
Dataset Splits	Yes	We used 80% of the data for training and reserved the remaining 20% for evaluation.
Hardware Specification	Yes	We conduct our experiments using a computing cluster with 8 NVIDIA RTX 3090 GPUs.
Software Dependencies	Yes	Our method is implemented in JAX version 0.5.3 (with CUDA 12 capabilities).
Experiment Setup	Yes	The regularization β was tuned by searching over [3, 5, 7, 10, 15, 20]. ... We tune τ via hyper-parameter sweeps over [0.01, 0.04, 0.08, 0.1, 0.2]. ... Learning Rate: 3e-4; Weight Decay: 1e-3; Training Length: 1M steps; Batch Size: 512; Optimizer: Adam; Dropout Rate: 0.1; LR decay schedule: cosine; Critic Network Size: [256, 256]; Activation Function: ReLU; Learning Rate: 3e-4; Training Length: 1M steps; Batch Size: 512; Optimizer: Adam; Mixture Ratio α: 0.5; Polyak Update Rate λ: 0.005; Discount Factor γ: 0.99
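
For readers who want to reuse the values quoted in the Experiment Setup row, the following is a minimal sketch of how they could be gathered into a sweep. It is not taken from the IOSTOM repository: the dictionary keys and the `sweep_configs` helper are hypothetical names, and sweeping β and τ jointly is an assumption, since the row does not say whether they were tuned together or separately. Weight decay, dropout rate, and the cosine LR schedule are left out because the row does not state which network they apply to.

```python
from itertools import product

# Sweep values quoted in the Experiment Setup row.
BETA_GRID = [3, 5, 7, 10, 15, 20]
TAU_GRID = [0.01, 0.04, 0.08, 0.1, 0.2]

# Fixed values quoted in the same row. Key names are hypothetical,
# chosen for illustration only.
BASE_CONFIG = {
    "learning_rate": 3e-4,
    "batch_size": 512,
    "train_steps": 1_000_000,
    "optimizer": "adam",
    "critic_hidden_sizes": (256, 256),
    "activation": "relu",
    "mixture_ratio_alpha": 0.5,
    "polyak_rate_lambda": 0.005,
    "discount_gamma": 0.99,
}


def sweep_configs():
    """Yield one config per (beta, tau) pair.

    Assumption: beta and tau are swept jointly as a full grid.
    """
    for beta, tau in product(BETA_GRID, TAU_GRID):
        yield {**BASE_CONFIG, "beta": beta, "tau": tau}


configs = list(sweep_configs())
print(len(configs))  # number of (beta, tau) combinations in the grid
```

Under the joint-grid assumption this enumerates every pairing of the six β values with the five τ values; if the two were tuned independently, the grids would instead be iterated one at a time with the other parameter held fixed.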