Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi-objective Linear Reinforcement Learning with Lexicographic Rewards

Authors: Bo Xue, Dake Bu, Ji Cheng, Yuanyu Wan, Qingfu Zhang

ICML 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | "To bridge this gap, we examine MORL under lexicographic reward structures... We introduce the first MORL algorithm with provable regret guarantees... Our work provides a comprehensive theoretical analysis of regret bounds... This work establishes the first theoretical regret guarantee for MORL." |
| Researcher Affiliation | Academia | ¹Department of Computer Science, City University of Hong Kong, Hong Kong, China; ²School of Software Technology, Zhejiang University, Ningbo, China. Correspondence to: Qingfu Zhang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Lexicographic Linear Reinforcement Learning; Algorithm 2: Lexicographic Action Elimination |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that open-source code for the described methodology is available. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret analysis. It does not conduct empirical studies using specific datasets that would be made publicly available. |
| Dataset Splits | No | The paper is theoretical and does not involve experiments on datasets; therefore no dataset splits are provided. |
| Hardware Specification | No | The paper describes a theoretical algorithm and provides proofs; it does not include any experimental evaluation requiring specific hardware, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not present experimental results; therefore it does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm development and regret analysis. It does not describe an experimental setup with hyperparameters or training configurations. |
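The Pseudocode row names "Algorithm 2: Lexicographic Action Elimination." The paper's own pseudocode is not reproduced here, but the general idea behind lexicographic elimination can be sketched generically: filter the action set by near-optimality on the highest-priority objective, then re-filter the survivors on the next objective, and so on. The function name, data layout, and tolerance scheme below are illustrative assumptions, not the paper's notation or algorithm.

```python
def lexicographic_eliminate(actions, value_estimates, tolerances):
    """Generic sketch of lexicographic action elimination (not the paper's
    Algorithm 2; names and tolerance scheme are assumptions).

    actions: iterable of action identifiers.
    value_estimates: list indexed by objective priority; each entry maps
        action -> estimated value for that objective.
    tolerances: per-objective slack; an action survives a round if its value
        is within the slack of the best surviving action's value.
    """
    surviving = list(actions)
    for values, tol in zip(value_estimates, tolerances):
        best = max(values[a] for a in surviving)
        # Keep only actions near-optimal for this objective before moving
        # on to the next (lower-priority) objective.
        surviving = [a for a in surviving if values[a] >= best - tol]
    return surviving
```

In this sketch, earlier objectives dominate: an action eliminated on objective 1 can never be recovered by a high value on objective 2, which is the defining property of a lexicographic preference ordering.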