Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Authors: Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to induce an optimistic SSP problem whose associated value iteration scheme is guaranteed to converge. We prove that EB-SSP achieves the minimax regret rate... |
| Researcher Affiliation | Collaboration | Jean Tarbouriech (Facebook AI Research & Inria Lille, jean.tarbouriech@gmail.com); Runlong Zhou (Tsinghua University, zhourunlongvector@gmail.com); Simon S. Du (University of Washington & Facebook AI Research, ssdu@cs.washington.edu); Matteo Pirotta (Facebook AI Research Paris, pirotta@fb.com); Michal Valko (DeepMind Paris, valkom@deepmind.com); Alessandro Lazaric (Facebook AI Research Paris, lazaric@fb.com) |
| Pseudocode | Yes | Algorithm 1: Algorithm EB-SSP |
| Open Source Code | No | The paper does not contain any statements about releasing open-source code or provide a link to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs; it does not describe experiments involving datasets, so no information on publicly available datasets is provided. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and proofs; it does not describe empirical experiments that would involve training, validation, and test dataset splits. |
| Hardware Specification | No | The paper is theoretical, focusing on algorithm design and analysis, and does not describe any empirical experiments that would require specific hardware for execution. Therefore, no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and proofs; it does not describe empirical experiments that would require specific software dependencies with version numbers for reproducibility. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and analysis, rather than describing an empirical experiment with specific setup details like hyperparameters or training configurations. |
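
The Research Type row quotes the paper's description of EB-SSP: skew the empirical transitions, subtract an exploration bonus from the empirical costs, and run value iteration on the resulting optimistic SSP problem. The sketch below illustrates that general recipe only; it is not the authors' Algorithm 1. It assumes the skewing mixes each empirical next-state distribution with a point mass on the goal, with weight 1/(n+1) for a state-action pair visited n times, and it swaps in a simple count-based bonus of the form `bonus_scale / sqrt(n)` as a placeholder for the paper's bonus. The names `skewed_transitions`, `optimistic_vi`, `bonus_scale`, and the data layout are hypothetical.

```python
import math


def skewed_transitions(counts_sa, goal):
    """Mix the empirical next-state distribution with a point mass on the goal.

    counts_sa maps next_state -> visit count for one (state, action) pair.
    Assumed form: p(s') = n/(n+1) * p_hat(s') + 1[s' == goal] / (n+1).
    """
    n = sum(counts_sa.values())
    # n/(n+1) * (c/n) simplifies to c/(n+1), which is also safe when n == 0.
    p = {s_next: c / (n + 1) for s_next, c in counts_sa.items() if c > 0}
    p[goal] = p.get(goal, 0.0) + 1.0 / (n + 1)
    return p, n


def optimistic_vi(counts, cost_hat, goal, bonus_scale=1.0, tol=1e-8, max_iters=100_000):
    """Value iteration on the skewed model with bonus-reduced costs.

    counts[s][a] maps next_state -> count; cost_hat[s][a] is the empirical cost.
    Every next state must be either `goal` or a key of `counts`.
    The bonus is a placeholder (assumption), not the paper's bonus.
    """
    V = {s: 0.0 for s in counts}
    V[goal] = 0.0  # the goal is absorbing and cost-free
    for _ in range(max_iters):
        V_new = {goal: 0.0}
        for s in counts:
            q_values = []
            for a, counts_sa in counts[s].items():
                p, n = skewed_transitions(counts_sa, goal)
                bonus = bonus_scale / math.sqrt(max(n, 1))
                backup = cost_hat[s][a] + sum(prob * V[s2] for s2, prob in p.items())
                # Clip at zero so optimistic values stay non-negative.
                q_values.append(max(backup - bonus, 0.0))
            V_new[s] = min(q_values)
        if max(abs(V_new[s] - V[s]) for s in V_new) <= tol:
            return V_new
        V = V_new
    return V
```

Because every skewed transition row places at least 1/(n+1) mass on the goal, the backup in this simplified version is a sup-norm contraction on the non-goal states, which is why the loop terminates. The paper's bonus is more refined, and its convergence guarantee is established by the authors' own analysis.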