reproducibilityindex.ai

Beating Adversarial Low-Rank MDPs with Unknown Transition and Bandit Feedback

Authors: Haolin Liu, Zak Mhammedi, Chen-Yu Wei, Julian Zimmert

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We consider regret minimization in low-rank MDPs with fixed transition and adversarial losses. Previous work has investigated this problem under either fullinformation loss feedback with unknown transitions (Zhao et al., 2024), or bandit loss feedback with known transition (Foster et al., 2022). First, we improve the poly(d,A,H)T 5/6 regret bound of Zhao et al. (2024) to poly(d,A,H)T 2/3 for the full-information unknown transition setting, where d is the rank of the transitions, A is the number of actions, H is the horizon length, and T is the number of episodes. Next, we initiate the study on the setting with bandit loss feedback and unknown transitions. Assuming that the loss has a linear structure, we propose both model-based and model-free algorithms achieving poly(d,A,H)T 2/3 regret, though they are computationally inefficient. We also propose oracle-efficient modelfree algorithms with poly(d,A,H)T 4/5 regret.
Researcher Affiliation	Collaboration	Haolin Liu University of Virginia srs8rh@virginia.edu Zakaria Mhammedi Google Research mhammedi@google.com Chen-Yu Wei University of Virginia chenyu.wei@virginia.edu Julian Zimmert Google Research zimmert@google.com
Pseudocode	Yes	Algorithm 1 Model-Based Algorithm for Full-Information Feedback
Open Source Code	No	The paper does not contain any explicit statements or links indicating the provision of open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not involve empirical studies with data, therefore, there is no mention of training datasets or their public availability.
Dataset Splits	No	The paper is theoretical and does not involve empirical studies with data, therefore, there is no mention of validation datasets or their splits.
Hardware Specification	No	As a theoretical paper, it does not describe experimental setups or require specific hardware specifications for reproduction.
Software Dependencies	No	As a theoretical paper, it does not detail software dependencies or specific version numbers required for experimental replication.
Experiment Setup	No	As a theoretical paper, it focuses on mathematical analysis and algorithm design rather than empirical evaluation, and thus does not provide details on experimental setup or hyperparameters.