Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents

Authors: Junyan Liu, Lillian J. Ratliff

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies the repeated principal-agent bandit game, in which the principal indirectly explores an unknown environment by incentivizing an agent to play arms. We propose algorithms for both the i.i.d. and linear reward settings with bandit feedback over a finite horizon T, achieving regret bounds of Õ(√T) and Õ(T^(2/3)), respectively.
Researcher Affiliation | Academia | (1) Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA; (2) Electrical & Computer Engineering, University of Washington, Seattle, WA, USA.
Pseudocode | Yes | Algorithm 1: Proposed algorithm for i.i.d. rewards
Open Source Code | No | The paper does not contain any statement about code availability or links to code repositories.
Open Datasets | No | The paper describes a theoretical framework for principal-agent bandit games and does not use any specific dataset for empirical evaluation.
Dataset Splits | No | The paper focuses on theoretical algorithm design and regret bounds for bandit games, and does not involve empirical evaluation on datasets; thus no dataset splits are provided.
Hardware Specification | No | The paper presents theoretical algorithms and regret analysis for bandit games, without conducting empirical experiments that would require specific hardware.
Software Dependencies | No | The paper focuses on theoretical algorithm design and mathematical analysis, without detailing specific software or library versions used for implementation or simulation.
Experiment Setup | No | The paper presents a theoretical study of principal-agent bandit games, focusing on algorithm design and regret bounds, and therefore does not describe any experimental setup or hyperparameters.