Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Authors: Ran Wei, Nathan Lambert, Anthony D McDonald, Alfredo Garcia, Roberto Calandra

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research. In this review, we study existing literature and provide a unifying view of different solutions to the objective mismatch problem. Our main contribution is a taxonomy of four categories of decision-aware MBRL approaches: Distribution Correction, Control-As-Inference, Value-Equivalence, and Differentiable Planning.
Researcher Affiliation | Collaboration | Ran Wei (VERSES Research Lab); Nathan Lambert (Allen Institute for AI); Anthony McDonald (University of Wisconsin-Madison); Alfredo Garcia (Texas A&M University); Roberto Calandra (TU Dresden)
Pseudocode | Yes | Algorithm 1: Basic algorithm of model-based reinforcement learning
Open Source Code | No | The paper is a survey that proposes a taxonomy; it does not present a new method that would require source code. The provided link points to a list of the reviewed papers, not to code for the survey itself: "The full list of papers can be found at https://github.com/ran-weii/objective_mismatch_papers."
Open Datasets | No | This paper is a survey; it conducts no original experiments and releases no new datasets. It mentions benchmarks and datasets (e.g., MuJoCo tasks and D4RL) that were used in the *reviewed* papers, not in its own methodology.
Dataset Splits | No | This paper is a survey with no original experiments, so it does not define or use any training, validation, or test splits.
Hardware Specification | No | This paper is a survey and reports no experimental results that would require hardware specifications for reproduction.
Software Dependencies | No | This paper is a survey and presents no original computational work that would require specific software dependencies or versions for reproduction.
Experiment Setup | No | This paper is a survey and has no experimental section; it gives no hyperparameters, training configurations, or system-level settings for its own methodology.
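The Pseudocode entry refers to the paper's Algorithm 1, a basic model-based RL loop. That algorithm is not reproduced here. What follows is only a minimal sketch of a generic MBRL loop under stated assumptions, meant to show where objective mismatch arises. The toy 1-D linear environment, the quadratic cost, the linear-gain policy, the grid-search planner, and every function name (`env_step`, `rollout`, `fit_model`, `plan`, `mbrl`) are hypothetical. The dynamics model is fit by least squares, which is maximum likelihood under Gaussian noise, while the policy is chosen to minimize predicted cost. These two objectives are optimized separately, and that decoupling is the mismatch the surveyed decision-aware methods try to remove.

```python
import numpy as np

# Hypothetical toy environment (an assumption, not from the paper):
# scalar dynamics s' = A*s + B*u + Gaussian noise, per-step cost s^2 + 0.1*u^2.
A_TRUE, B_TRUE, NOISE = 0.9, 0.5, 0.05


def env_step(s, u, rng):
    """Real (unknown to the agent) environment transition."""
    return A_TRUE * s + B_TRUE * u + NOISE * rng.standard_normal()


def rollout(gain, rng, horizon=20, explore=0.0):
    """Run the linear policy u = -gain*s (plus exploration noise) in the real environment."""
    s, transitions = 1.0, []
    for _ in range(horizon):
        u = -gain * s + explore * rng.standard_normal()
        s_next = env_step(s, u, rng)
        transitions.append((s, u, s_next))
        s = s_next
    return transitions


def fit_model(data):
    """Model learning: least squares on (s, u) -> s', i.e. a likelihood objective."""
    X = np.array([[s, u] for s, u, _ in data])
    y = np.array([s_next for _, _, s_next in data])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a_hat, b_hat


def plan(a_hat, b_hat, horizon=20, gains=np.linspace(0.0, 3.0, 301)):
    """Policy optimization: pick the gain with the lowest cost predicted by the learned model."""
    def predicted_cost(k):
        s, cost = 1.0, 0.0
        for _ in range(horizon):
            u = -k * s
            cost += s ** 2 + 0.1 * u ** 2
            s = a_hat * s + b_hat * u  # deterministic rollout in the learned model
        return cost
    return min(gains, key=predicted_cost)


def mbrl(iterations=5, seed=0):
    """Alternate data collection, model fitting, and planning."""
    rng = np.random.default_rng(seed)
    data, gain = [], 0.0
    for _ in range(iterations):
        data.extend(rollout(gain, rng, explore=0.5))  # collect real transitions
        a_hat, b_hat = fit_model(data)                # fit model on all data so far
        gain = plan(a_hat, b_hat)                     # improve policy inside the model
    return gain, (a_hat, b_hat)
```

Calling `mbrl()` returns the final policy gain and the fitted model coefficients, which in this toy setting should land near the true values 0.9 and 0.5. Because the model here is linear and well specified, the likelihood fit is also good for control; the mismatch only becomes harmful when the model class cannot capture the dynamics and the likelihood objective spends capacity on errors that do not matter for return.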