reproducibilityindex.ai

Near-optimal Regret Bounds for Stochastic Shortest Path

Authors: Aviv Rosenberg, Alon Cohen, Yishay Mansour, Haim Kaplan

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our technical contribution is as follows. To better explain our main Bernstein-based algorithm, we start by assuming that the costs are lower bounded by cmin and give an algorithm based on Hoeffding inequalities that is simple to analyze and achieves a regret bound of e O(B3/2 \|S\| p\|A\|K/cmin). Note that this bound is comparable to the one of Tarbouriech et al. (2020), yet our algorithm and its analysis are signiﬁcantly simpler and more intuitive. In addition, its analysis contains many of the key ideas of the proof of the Bernsteinbased algorithm, and is much easier to follow. We subsequently present the Bernstein-based algorithm. This algorithm is simpler than our ﬁrst one mainly since picking the parameters of the optimistic model is particularly easy. The analysis, however, is somewhat more delicate (as mentioned earlier) in the main body of the paper, we present only the main ideas and improvements over the Hoeffding-based algorithm, and differ the tedious technical details to the supplementary material. Eventually, we achieve our ﬁnal regret bound by perturbing the instantaneous costs to be at least ϵ > 0. The additional cost due to this perturbation has a small effect since the dependency of our regret on c 1 min is additive and does not multiply any term depending on K.
Researcher Affiliation	Collaboration	1 Google Research, Tel Aviv 2Tel Aviv University, Israel. Correspondence to: Alon Cohen <aloncohen@google.com>, Aviv Rosenberg <avivros007@gmail.com>.
Pseudocode	Yes	Algorithm 1 HOEFFDING-TYPE CONFIDENCE BOUNDS AND KNOWN B Algorithm 2 BERNSTEIN-TYPE CONFIDENCE BOUNDS
Open Source Code	No	The paper does not provide any links to open-source code or explicitly state that code is available.
Open Datasets	No	The paper is theoretical and focuses on regret bounds and algorithms, not on empirical training with datasets. There is no mention of any specific datasets.
Dataset Splits	No	The paper is theoretical and does not involve dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies	No	The paper is theoretical and does not list any software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and analysis. It does not describe any specific experimental setup details like hyperparameters or system-level training settings.