Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

Authors: Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we leverage the marginalized importance sampling (MIS) formulation of RL and present the first set of offline RL algorithms that are statistically optimal and practical under general function approximation and single-policy concentrability, bypassing the need for uncertainty quantification. We conduct theoretical investigations and design algorithms starting from multi-armed bandits (MABs), going forward to contextual bandits (CBs), and finally Markov decision processes (MDPs).
Researcher Affiliation	Academia	Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao; Department of Electrical Engineering and Computer Sciences, Department of Statistics, University of California, Berkeley; EMAIL
Pseudocode	Yes	Algorithm 1: ALM with MIS (ALMIS) for offline MAB; Algorithm 2: ALM with MIS (ALMIS) for offline CB; Algorithm 3: ALM with MIS (ALMIS) for offline RL, model-based; Algorithm 4: ALM with MIS (ALMIS) for offline RL, model-free
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets	No	The paper mentions using a "previously-collected offline dataset $D = \{(s_i, a_i, r_i, s'_i)\}_{i=1}^N$" and a dataset $D_0 = \{s_i\}_{i=1}^N$ for MDPs, but it does not name any publicly available dataset (e.g., CIFAR-10, ImageNet), nor does it provide a link or citation for a specific dataset used in any empirical evaluation.
Dataset Splits	No	The paper focuses on theoretical analysis and algorithm design with proofs; it does not include empirical experiments with explicit dataset splits for training, validation, and testing.
Hardware Specification	No	The paper is theoretical and does not describe any experimental hardware specifications.
Software Dependencies	No	The paper is theoretical and does not list specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and proofs. It does not provide details on experimental setup such as hyperparameters or system-level training settings.