reproducibilityindex.ai

Learning Stochastic Shortest Path with Linear Function Approximation

Authors: Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we present some results from numerical simulations, which corroborate our theory. We construct an SSP instance based on the example used in the proof of the lower bound. ... The experimental results are shown in Fig. 1. In Fig. 1a, we plot the average regret R_K/K versus K. It is evident that LEVIS has a sublinear regret, as opposed to the linear regret of the random policy. To further verify that the cumulative regret R_K indeed grows at an Õ(√K) rate, in Fig. 1b, we make the log-log plot of R_K/K and K. ... These results corroborate our theoretical findings.
Researcher Affiliation	Academia	(1) Department of Statistics and Data Science, Yale University, CT 06520, USA; (2) Department of Computer Science, University of California, Los Angeles, CA 90095, USA.
Pseudocode	Yes	Algorithm 1 LEVIS
Open Source Code	No	No explicit statement about providing access to source code or a link to a code repository.
Open Datasets	No	The paper uses a synthetic SSP instance constructed for numerical simulations, not a publicly available or open dataset. 'We construct an SSP instance based on the example used in the proof of the lower bound. Specifically, we have the action space A = {-1, 1}^{d-1} with |A| = 2^{d-1}. The state space is S = {s_init, g}.'
Dataset Splits	No	The paper describes numerical simulations on a synthetic instance and does not involve typical machine learning dataset splits (training, validation, testing).
Hardware Specification	No	The paper mentions numerical simulations but does not provide any specific hardware details used for running the experiments.
Software Dependencies	No	No specific software dependencies with version numbers are provided.
Experiment Setup	Yes	We set d = 5 and B = 3 in the simulation. ... We set λ = 1, ρ = 0 and failing probability 0.01.
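
The quoted setup can be partly sketched in code. The Python below is a minimal illustration, not the authors' implementation: it enumerates the action space A = {-1, 1}^{d-1} for d = 5 and shows how a log-log slope distinguishes a √K regret rate from a linear one. The transition kernel and cost function are not given in the excerpts above, so they are left out. The helper `loglog_slope` and both regret curves are hypothetical and are not data from the paper.

```python
import itertools
import math

d = 5  # feature dimension reported in the experiment setup

# Action space A = {-1, 1}^(d-1), as quoted in the instance description.
actions = list(itertools.product((-1, 1), repeat=d - 1))
states = ("s_init", "g")  # initial state and goal state


def loglog_slope(ks, values):
    """Least-squares slope of log(values) against log(ks) (hypothetical helper)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Synthetic regret curves for illustration only (NOT the paper's results).
ks = [10 ** i for i in range(1, 6)]
sqrt_regret = [2.0 * math.sqrt(k) for k in ks]  # R_K = c * sqrt(K)
linear_regret = [0.5 * k for k in ks]           # R_K = c * K

# Slope of the average regret R_K / K on log-log axes.
slope_sqrt = loglog_slope(ks, [r / k for r, k in zip(sqrt_regret, ks)])
slope_linear = loglog_slope(ks, [r / k for r, k in zip(linear_regret, ks)])

print(len(actions), slope_sqrt, slope_linear)
```

If R_K grows like √K, then R_K/K decays like K^(-1/2), so the log-log slope of the average regret is close to -1/2. A linear-regret policy gives a slope close to 0. The logarithmic factors hidden in Õ(·) would shift a measured slope slightly away from -1/2.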