Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Authors: Ran Wei, Nathan Lambert, Anthony D McDonald, Alfredo Garcia, Roberto Calandra

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research. In this review, we study existing literature and provide a unifying view of different solutions to the objective mismatch problem. Our main contribution is a taxonomy of four categories of decision-aware MBRL approaches: Distribution Correction, Control-As-Inference, Value-Equivalence, and Differentiable Planning.
Researcher Affiliation | Collaboration | Ran Wei EMAIL VERSES Research Lab; Nathan Lambert EMAIL Allen Institute for AI; Anthony McDonald EMAIL University of Wisconsin-Madison; Alfredo Garcia EMAIL Texas A&M University; Roberto Calandra EMAIL TU Dresden
Pseudocode | Yes | Algorithm 1: Basic algorithm of model-based reinforcement learning
Open Source Code | No | The paper is a survey and proposes a taxonomy; it does not present a new methodology that would typically require source code. The provided link refers to a list of other papers, not the code for the survey itself: "The full list of papers can be found at https://github.com/ran-weii/objective_mismatch_papers."
Open Datasets | No | This paper is a survey and does not conduct original experiments using datasets, nor does it release any new datasets. It refers to datasets (e.g., MuJoCo, D4RL) that were used in the *reviewed* papers, but not by this paper for its own methodology.
Dataset Splits | No | This paper is a survey and does not conduct original experiments, thus it does not define or use any dataset splits for training, validation, or testing.
Hardware Specification | No | This paper is a survey and does not report on experimental results that would require specific hardware specifications for reproduction.
Software Dependencies | No | This paper is a survey and does not present original computational work requiring specific software dependencies and versions for reproduction.
Experiment Setup | No | This paper is a survey and does not contain an experimental section or details regarding hyperparameters, training configurations, or system-level settings for its own methodology.