Learning Stochastic Shortest Path with Linear Function Approximation

Authors: Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present some results from numerical simulations, which corroborate our theory. We construct an SSP instance based on the example used in the proof of the lower bound. ... The experimental results are shown in Fig. 1. In Fig. 1a, we plot the average regret $R_K/K$ versus $K$. It is evident that LEVIS has a sublinear regret, as opposed to the linear regret of the random policy. To further verify that the cumulative regret $R_K$ indeed grows at an $\widetilde{O}(\sqrt{K})$ rate, in Fig. 1b, we make the log-log plot of $R_K/K$ and $K$. ... These results corroborate our theoretical findings.
Researcher Affiliation | Academia | (1) Department of Statistics and Data Science, Yale University, CT 06520, USA; (2) Department of Computer Science, University of California, Los Angeles, CA 90095, USA.
Pseudocode | Yes | Algorithm 1 LEVIS
Open Source Code | No | The paper contains no explicit statement about providing access to source code and no link to a code repository.
Open Datasets | No | The paper uses a synthetic SSP instance constructed for numerical simulations, not a publicly available or open dataset. 'We construct an SSP instance based on the example used in the proof of the lower bound. Specifically, we have the action space $\mathcal{A} = \{-1, 1\}^{d-1}$ with $\lvert\mathcal{A}\rvert = 2^{d-1}$. The state space is $\mathcal{S} = \{s_{\text{init}}, g\}$.'
Dataset Splits | No | The paper describes numerical simulations on a synthetic instance and does not involve typical machine learning dataset splits (training, validation, testing).
Hardware Specification | No | The paper mentions numerical simulations but gives no details of the hardware used to run the experiments.
Software Dependencies | No | The paper lists no software dependencies or version numbers.
Experiment Setup | Yes | We set d = 5 and B = 3 in the simulation. ... We set λ = 1, ρ = 0 and failing probability 0.01.
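
The quoted setup and the log-log check from the rows above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `build_action_space` and `loglog_slope` are hypothetical, and the regret curve is a synthetic placeholder chosen only to exercise the slope fit, not a result from the paper. The transition and cost structure of the instance is not given in the excerpt and is left out.

```python
import itertools
import math

# Settings quoted in the Experiment Setup row.
D = 5              # dimension d
B = 3              # parameter B, as named in the source
LAMBDA = 1.0       # lambda
RHO = 0.0          # rho
FAIL_PROB = 0.01   # failing probability


def build_action_space(d):
    """Enumerate the action set {-1, 1}^(d-1) as a list of tuples (hypothetical helper)."""
    return list(itertools.product((-1, 1), repeat=d - 1))


def loglog_slope(ks, avg_regrets):
    """Least-squares slope of log(R_K / K) against log(K) (hypothetical helper)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(r) for r in avg_regrets]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


actions = build_action_space(D)   # 2^(d-1) actions
states = ("s_init", "g")          # the two states named in the source

# Synthetic placeholder curve R_K = 2 * sqrt(K); NOT data from the paper.
ks = [10, 100, 1000, 10000]
avg_regret = [2.0 * math.sqrt(k) / k for k in ks]
slope = loglog_slope(ks, avg_regret)
```

If the cumulative regret scales like the square root of K, the average regret $R_K/K$ scales like $K^{-1/2}$, so the fitted slope on a log-log plot should be close to -1/2. A linear-regret baseline such as the random policy would instead give a slope near 0.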