Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
MTRec: Learning to Align with User Preferences via Mental Reward Models
Authors: Mengchen Zhao, Yifan Gao, Yaqing Hou, Xiangyang Li, Pengjie Gu, Zhenhua Dong, Ruiming Tang, Yi Cai
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report the performance of MTRec in both offline and online settings, with focuses on answering the following research questions (RQs). (RQ1:) How does MTRec improve classification-based recommendation models? (RQ2:) How does MTRec improve RL-based recommendation models? (RQ3:) Does the learned mental reward model provide useful information? (RQ4:) How does MTRec perform in online A/B test? ... We conduct extensive offline and online experiments to demonstrate the improvements brought by MTRec. |
| Researcher Affiliation | Collaboration | ¹School of Software Engineering, South China University of Technology; ²School of Computer Science and Technology, Dalian University of Technology; ³Huawei Noah's Ark Lab; ⁴Nanyang Technological University |
| Pseudocode | Yes | Algorithm 1: QR-IQL Optimization Steps. Input: Interaction (expert) data D_E ... Algorithm 2: Overall Implementation of MTRec. Input: Interaction (expert) data D_E |
| Open Source Code | No | We provide implementation details in Appendix A.3 and will release the code upon acceptance of this paper. |
| Open Datasets | Yes | The Amazon dataset McAuley et al. [2015] collects user review data from Amazon e-commerce platform. ... Both training and testing of the algorithms are conducted in simulated interactive recommendation environments on Virtual Taobao Shi et al. [2019]. |
| Dataset Splits | No | The paper mentions using two subsets of the Amazon dataset: Books and Electronics, and describes how user reviews were processed chronologically. It also mentions constructing an expert dataset for Virtual Taobao with 100,000 trajectories. However, it does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or methodology for creating the splits). |
| Hardware Specification | Yes | Our experiments are run on a server with 2 AMD EPYC 7542 32-Core Processor CPU and 2 NVIDIA RTX 3090 graphics. |
| Software Dependencies | No | The paper mentions using 'Adam Kingma and Ba [2015]' for optimization, which refers to an algorithm rather than a specific software library with a version number. No other software libraries or tools are specified with version numbers. |
| Experiment Setup | Yes | All the hyper-parameters of the backbone models follow their official codes. For implementation of MTRec, we select the number of quantiles N = 10 and the weight α = 0.5 in Problem 6. |
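
The Experiment Setup row reports N = 10 quantiles but this page does not reproduce the paper's objective (Problem 6) or the QR-IQL update. As a rough illustration of what a quantile count of 10 usually implies, the sketch below computes midpoint quantile levels and the standard quantile-regression (pinball) loss. The midpoint convention, the function names, and the use of the pinball loss here are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the midpoint levels and the pinball loss are
# standard quantile-regression conventions, assumed here; they are not
# taken from the MTRec paper or its code.

N_QUANTILES = 10  # value reported in the Experiment Setup row


def quantile_levels(n):
    """Return midpoint quantile levels tau_i = (2i + 1) / (2n), i = 0..n-1."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def pinball_loss(prediction, target, tau):
    """Pinball loss for one quantile level tau in (0, 1)."""
    error = target - prediction
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return tau * error if error >= 0 else (tau - 1) * error


def mean_pinball_loss(predictions, target, taus):
    """Average the pinball loss over all predicted quantiles."""
    losses = [pinball_loss(p, target, t) for p, t in zip(predictions, taus)]
    return sum(losses) / len(losses)


taus = quantile_levels(N_QUANTILES)
```

With these levels, each of the 10 predicted values would estimate a different quantile of a scalar target, from the 5th to the 95th percentile in steps of 10 percentage points.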