Learning Stochastic Shortest Path with Linear Function Approximation
Authors: Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present some results from numerical simulations, which corroborate our theory. We construct an SSP instance based on the example used in the proof of the lower bound. ... The experimental results are shown in Fig. 1. In Fig. 1a, we plot the average regret $R_K/K$ versus $K$. It is evident that LEVIS has a sublinear regret, as opposed to the linear regret of the random policy. To further verify that the cumulative regret $R_K$ indeed grows at an $\widetilde{O}(\sqrt{K})$ rate, in Fig. 1b, we make the log-log plot of $R_K/K$ and $K$. ... These results corroborate our theoretical findings. |
| Researcher Affiliation | Academia | 1Department of Statistics and Data Science, Yale University, CT 06520, USA 2Department of Computer Science, University of California, Los Angeles, CA 90095, USA. |
| Pseudocode | Yes | Algorithm 1 LEVIS |
| Open Source Code | No | No explicit statement about providing access to source code or a link to a code repository. |
| Open Datasets | No | The paper uses a synthetic SSP instance constructed for numerical simulations, not a publicly available or open dataset. 'We construct an SSP instance based on the example used in the proof of the lower bound. Specifically, we have the action space $\mathcal{A} = \{-1, 1\}^{d-1}$ with $\lvert\mathcal{A}\rvert = 2^{d-1}$. The state space is $\mathcal{S} = \{s_{\text{init}}, g\}$.' |
| Dataset Splits | No | The paper describes numerical simulations on a synthetic instance and does not involve typical machine learning dataset splits (training, validation, testing). |
| Hardware Specification | No | The paper mentions numerical simulations but does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers are provided. |
| Experiment Setup | Yes | We set d = 5 and B = 3 in the simulation. ... We set λ = 1, ρ = 0 and failing probability 0.01. |
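
As a quick sanity check on the quoted setup, the action space $\{-1, 1\}^{d-1}$ for $d = 5$ can be enumerated directly. This is only a minimal sketch: the variable names are illustrative and not taken from the paper, and it covers the action set alone, not the transition model, the cost function, or LEVIS itself.

```python
from itertools import product

d = 5  # feature dimension reported in the experiment setup

# Action space A = {-1, 1}^(d-1): every sign vector of length d - 1.
# `actions` is a hypothetical name used only for this illustration.
actions = list(product((-1, 1), repeat=d - 1))

print(len(actions))  # 2 ** (d - 1) = 16
```

With $d = 5$ this gives 16 actions, matching the stated size $2^{d-1}$ of the action space.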