reproducibilityindex.ai

Minimax Regret for Stochastic Shortest Path

Authors: Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this work we show that the minimax regret for this setting is e O( p (B2 + B )\|S\|\|A\|K) where B is a bound on the expected cost of the optimal policy from any state, S is the state space, and A is the action space. This matches the Ω( p B2 \|S\|\|A\|K) lower bound of Rosenberg et al. [2020] for B 1, and improves their regret bound by a factor of \|S\|. For B < 1 we prove a matching lower bound of Ω( B \|S\|\|A\|K). Our algorithm is based on a novel reduction from SSP to ﬁnite-horizon MDPs. To that end, we provide an algorithm for the ﬁnitehorizon setting whose leading term in the regret depends polynomially on the expected cost of the optimal policy and only logarithmically on the horizon.
Researcher Affiliation	Collaboration	Alon Cohen Tel-Aviv University and Google Research, Tel Aviv aloncohen@google.com Yonathan Efroni Microsoft Research, New York jonathan.efroni@gmail.com Yishay Mansour Tel-Aviv University and Google Research, Tel Aviv mansour@tau.ac.il Aviv Rosenberg Tel-Aviv University avivros007@gmail.com
Pseudocode	Yes	Algorithm 1 REDUCTION FROM SSP TO FINITE-HORIZON MDP; Algorithm 2 UPPER LOWER CONFIDENCE VALUE ITERATION (ULCVI); Algorithm 3 OPTIMISTIC-PESSIMISTIC VALUE ITERATION
Open Source Code	No	The paper does not provide any concrete access information (e.g., repository links, explicit statements of code release) for the methodology described.
Open Datasets	No	The paper is theoretical and focuses on algorithm design and proofs of regret bounds. It does not conduct empirical experiments using datasets, thus no training dataset information is provided.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments using datasets. Therefore, it does not describe training, validation, or test splits.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific hardware used for computation or experiments.
Software Dependencies	No	The paper is theoretical and presents algorithms and mathematical proofs. It does not mention any specific software or library dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and does not describe empirical experiments. Therefore, there are no specific experimental setup details like hyperparameters or training configurations.