Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning
Authors: Ran Wei, Nathan Lambert, Anthony D McDonald, Alfredo Garcia, Roberto Calandra
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we provide an in-depth survey of these solution categories and propose a taxonomy to foster future research. In this review, we study existing literature and provide a unifying view of different solutions to the objective mismatch problem. Our main contribution is a taxonomy of four categories of decision-aware MBRL approaches: Distribution Correction, Control-As-Inference, Value-Equivalence, and Differentiable Planning |
| Researcher Affiliation | Collaboration | Ran Wei (VERSES Research Lab); Nathan Lambert (Allen Institute for AI); Anthony McDonald (University of Wisconsin-Madison); Alfredo Garcia (Texas A&M University); Roberto Calandra (TU Dresden) |
| Pseudocode | Yes | Algorithm 1 Basic algorithm of model-based reinforcement learning |
| Open Source Code | No | The paper is a survey that proposes a taxonomy; it does not present a new methodology that would typically require source code. The provided link points to a list of the reviewed papers, not to code for the survey itself: "The full list of papers can be found at https://github.com/ran-weii/objective_mismatch_papers." |
| Open Datasets | No | This paper is a survey and does not conduct original experiments using datasets, nor does it release any new datasets. It refers to benchmarks and datasets (e.g., MuJoCo, D4RL) that were used in the *reviewed* papers, but not by this paper for its own methodology. |
| Dataset Splits | No | This paper is a survey and does not conduct original experiments, thus it does not define or use any dataset splits for training, validation, or testing. |
| Hardware Specification | No | This paper is a survey and does not report on experimental results that would require specific hardware specifications for reproduction. |
| Software Dependencies | No | This paper is a survey and does not present original computational work requiring specific software dependencies and versions for reproduction. |
| Experiment Setup | No | This paper is a survey and does not contain an experimental section or details regarding hyperparameters, training configurations, or system-level settings for its own methodology. |