Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
Authors: Ran Wei, Nathan Lambert, Anthony D McDonald, Alfredo Garcia, Roberto Calandra
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research. In this review, we study existing literature and provide a unifying view of different solutions to the objective mismatch problem. Our main contribution is a taxonomy of four categories of decision-aware MBRL approaches: Distribution Correction, Control-As-Inference, Value-Equivalence, and Differentiable Planning |
| Researcher Affiliation | Collaboration | Ran Wei EMAIL VERSES Research Lab Nathan Lambert EMAIL Allen Institute for AI Anthony McDonald EMAIL University of Wisconsin-Madison Alfredo Garcia EMAIL Texas A&M University Roberto Calandra EMAIL TU Dresden |
| Pseudocode | Yes | Algorithm 1 Basic algorithm of model-based reinforcement learning |
| Open Source Code | No | The paper is a survey and proposes a taxonomy; it does not present a new methodology that would typically require source code. The provided link refers to a list of other papers, not the code for the survey itself: "The full list of papers can be found at https://github.com/ran-weii/objective_mismatch_papers." |
| Open Datasets | No | This paper is a survey and does not conduct original experiments using datasets, nor does it release any new datasets. It refers to datasets (e.g., MuJoCo, D4RL) that were used in the *reviewed* papers, but not by this paper for its own methodology. |
| Dataset Splits | No | This paper is a survey and does not conduct original experiments, thus it does not define or use any dataset splits for training, validation, or testing. |
| Hardware Specification | No | This paper is a survey and does not report on experimental results that would require specific hardware specifications for reproduction. |
| Software Dependencies | No | This paper is a survey and does not present original computational work requiring specific software dependencies and versions for reproduction. |
| Experiment Setup | No | This paper is a survey and does not contain an experimental section or details regarding hyperparameters, training configurations, or system-level settings for its own methodology. |