Near-optimal Regret Bounds for Stochastic Shortest Path
Authors: Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our technical contribution is as follows. To better explain our main Bernstein-based algorithm, we start by assuming that the costs are lower bounded by cmin and give an algorithm based on Hoeffding inequalities that is simple to analyze and achieves a regret bound of e O(B3/2 |S| p|A|K/cmin). Note that this bound is comparable to the one of Tarbouriech et al. (2020), yet our algorithm and its analysis are significantly simpler and more intuitive. In addition, its analysis contains many of the key ideas of the proof of the Bernsteinbased algorithm, and is much easier to follow. We subsequently present the Bernstein-based algorithm. This algorithm is simpler than our first one mainly since picking the parameters of the optimistic model is particularly easy. The analysis, however, is somewhat more delicate (as mentioned earlier) in the main body of the paper, we present only the main ideas and improvements over the Hoeffding-based algorithm, and differ the tedious technical details to the supplementary material. Eventually, we achieve our final regret bound by perturbing the instantaneous costs to be at least ϵ > 0. The additional cost due to this perturbation has a small effect since the dependency of our regret on c 1 min is additive and does not multiply any term depending on K. |
| Researcher Affiliation | Collaboration | 1 Google Research, Tel Aviv 2Tel Aviv University, Israel. Correspondence to: Alon Cohen <aloncohen@google.com>, Aviv Rosenberg <avivros007@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 HOEFFDING-TYPE CONFIDENCE BOUNDS AND KNOWN B Algorithm 2 BERNSTEIN-TYPE CONFIDENCE BOUNDS |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code is available. |
| Open Datasets | No | The paper is theoretical and focuses on regret bounds and algorithms, not on empirical training with datasets. There is no mention of any specific datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis. It does not describe any specific experimental setup details like hyperparameters or system-level training settings. |