reproducibilityindex.ai

Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning

Authors: Runze Liu, Fengshuo Bai, Yali Du, Yaodong Yang

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on robotic simulated manipulation tasks and locomotion tasks demonstrate that MRN outperforms prior methods in the case of few preference labels and significantly improves data efficiency, achieving state-of-the-art in preference-based RL. Ablation studies further demonstrate that MRN learns a more accurate Q-function compared to prior work and shows obvious advantages when only a small amount of human feedback is available.
Researcher Affiliation	Academia	Runze Liu (1,2), Fengshuo Bai (3), Yali Du (4), Yaodong Yang (1,5); (1) Institute for AI, Peking University; (2) Shandong University; (3) Institute of Automation, Chinese Academy of Sciences; (4) King's College London; (5) Beijing Institute for General AI
Pseudocode	No	The paper provides a high-level framework illustration (Figure 1) and describes the algorithm procedure in text within Section 4.2 and Appendix A, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps.
Open Source Code	Yes	The source code and videos of this project are released at https://sites.google.com/view/meta-reward-net1.
Open Datasets	Yes	In this section, our method is evaluated on a variety of robotic simulated manipulation tasks from Meta-world [21] and locomotion tasks from DeepMind Control Suite (DMControl) [22, 23].
Dataset Splits	No	The paper describes the amount of human preference feedback used for different tasks (e.g., '100 for Walker', '10000 for Hammer') and mentions running experiments multiple times. However, it does not provide explicit training, validation, and test splits for the interaction data or trajectories generated by the reinforcement learning agent, nor does it reference standard splits for the environments.
Hardware Specification	Yes	The experiments are run on a single machine with one NVIDIA RTX 2080 Ti GPU.
Software Dependencies	No	The paper mentions using publicly released repositories for baselines (B-Pref [58], SURF [18]) and implementing their method using PEBBLE as the backbone. However, it does not provide specific version numbers for software dependencies or libraries like Python, PyTorch, or other relevant packages.
Experiment Setup	Yes	Details on hyperparameters, network architectures can be found in Appendix E.