Minimax Regret for Stochastic Shortest Path

Authors: Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this work we show that the minimax regret for this setting is e O( p (B2 + B )|S||A|K) where B is a bound on the expected cost of the optimal policy from any state, S is the state space, and A is the action space. This matches the Ω( p B2 |S||A|K) lower bound of Rosenberg et al. [2020] for B 1, and improves their regret bound by a factor of |S|. For B < 1 we prove a matching lower bound of Ω( B |S||A|K). Our algorithm is based on a novel reduction from SSP to finite-horizon MDPs. To that end, we provide an algorithm for the finitehorizon setting whose leading term in the regret depends polynomially on the expected cost of the optimal policy and only logarithmically on the horizon.
Researcher Affiliation Collaboration Alon Cohen Tel-Aviv University and Google Research, Tel Aviv aloncohen@google.com Yonathan Efroni Microsoft Research, New York jonathan.efroni@gmail.com Yishay Mansour Tel-Aviv University and Google Research, Tel Aviv mansour@tau.ac.il Aviv Rosenberg Tel-Aviv University avivros007@gmail.com
Pseudocode Yes Algorithm 1 REDUCTION FROM SSP TO FINITE-HORIZON MDP; Algorithm 2 UPPER LOWER CONFIDENCE VALUE ITERATION (ULCVI); Algorithm 3 OPTIMISTIC-PESSIMISTIC VALUE ITERATION
Open Source Code No The paper does not provide any concrete access information (e.g., repository links, explicit statements of code release) for the methodology described.
Open Datasets No The paper is theoretical and focuses on algorithm design and proofs of regret bounds. It does not conduct empirical experiments using datasets, thus no training dataset information is provided.
Dataset Splits No The paper is theoretical and does not conduct empirical experiments using datasets. Therefore, it does not describe training, validation, or test splits.
Hardware Specification No The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific hardware used for computation or experiments.
Software Dependencies No The paper is theoretical and presents algorithms and mathematical proofs. It does not mention any specific software or library dependencies with version numbers.
Experiment Setup No The paper is theoretical and does not describe empirical experiments. Therefore, there are no specific experimental setup details like hyperparameters or training configurations.