Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-objective Linear Reinforcement Learning with Lexicographic Rewards
Authors: Bo Xue, Dake Bu, Ji Cheng, Yuanyu Wan, Qingfu Zhang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To bridge this gap, we examine MORL under lexicographic reward structures... We introduce the first MORL algorithm with provable regret guarantees... Our work provides a comprehensive theoretical analysis of regret bounds... This work establishes the first theoretical regret guarantee for MORL. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, City University of Hong Kong, Hong Kong, China; ²School of Software Technology, Zhejiang University, Ningbo, China. Correspondence to: Qingfu Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Lexicographic Linear Reinforcement Learning; Algorithm 2: Lexicographic Action Elimination |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that open-source code for the described methodology is available. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret analysis. It does not conduct empirical studies using specific datasets that would be made publicly available. |
| Dataset Splits | No | The paper is theoretical and does not involve experiments on datasets, therefore no dataset splits are provided. |
| Hardware Specification | No | The paper describes a theoretical algorithm and provides proofs; it does not include any experimental evaluation requiring specific hardware, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not present experimental results, therefore it does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm development and regret analysis. It does not describe an experimental setup with hyperparameters or training configurations. |