Minimax Regret for Stochastic Shortest Path
Authors: Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work we show that the minimax regret for this setting is e O( p (B2 + B )|S||A|K) where B is a bound on the expected cost of the optimal policy from any state, S is the state space, and A is the action space. This matches the Ω( p B2 |S||A|K) lower bound of Rosenberg et al. [2020] for B 1, and improves their regret bound by a factor of |S|. For B < 1 we prove a matching lower bound of Ω( B |S||A|K). Our algorithm is based on a novel reduction from SSP to finite-horizon MDPs. To that end, we provide an algorithm for the finitehorizon setting whose leading term in the regret depends polynomially on the expected cost of the optimal policy and only logarithmically on the horizon. |
| Researcher Affiliation | Collaboration | Alon Cohen Tel-Aviv University and Google Research, Tel Aviv aloncohen@google.com Yonathan Efroni Microsoft Research, New York jonathan.efroni@gmail.com Yishay Mansour Tel-Aviv University and Google Research, Tel Aviv mansour@tau.ac.il Aviv Rosenberg Tel-Aviv University avivros007@gmail.com |
| Pseudocode | Yes | Algorithm 1 REDUCTION FROM SSP TO FINITE-HORIZON MDP; Algorithm 2 UPPER LOWER CONFIDENCE VALUE ITERATION (ULCVI); Algorithm 3 OPTIMISTIC-PESSIMISTIC VALUE ITERATION |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository links, explicit statements of code release) for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs of regret bounds. It does not conduct empirical experiments using datasets, thus no training dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments using datasets. Therefore, it does not describe training, validation, or test splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and proofs. It does not mention any specific hardware used for computation or experiments. |
| Software Dependencies | No | The paper is theoretical and presents algorithms and mathematical proofs. It does not mention any specific software or library dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments. Therefore, there are no specific experimental setup details like hyperparameters or training configurations. |