Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Authors: Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | "We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus to induce an optimistic SSP problem whose associated value iteration scheme is guaranteed to converge. We prove that EB-SSP achieves the minimax regret rate..." |
| Researcher Affiliation | Collaboration | Jean Tarbouriech (Facebook AI Research & Inria Lille, jean.tarbouriech@gmail.com); Runlong Zhou (Tsinghua University, zhourunlongvector@gmail.com); Simon S. Du (University of Washington & Facebook AI Research, ssdu@cs.washington.edu); Matteo Pirotta (Facebook AI Research Paris, pirotta@fb.com); Michal Valko (DeepMind Paris, valkom@deepmind.com); Alessandro Lazaric (Facebook AI Research Paris, lazaric@fb.com) |
| Pseudocode | Yes | Algorithm 1: Algorithm EB-SSP |
| Open Source Code | No | The paper does not state that code is released, nor does it link to a repository for the described methodology. |
| Open Datasets | No | The paper is theoretical, focusing on algorithm design and proofs; it describes no experiments involving training datasets. |
| Dataset Splits | No | The paper describes no empirical experiments, so there are no training/validation/test dataset splits. |
| Hardware Specification | No | The paper describes no empirical experiments that would require specific hardware, so no hardware specifications are given. |
| Software Dependencies | No | The paper describes no empirical experiments, so no software dependencies or version numbers are given. |
| Experiment Setup | No | The paper describes no empirical experiments, so no setup details such as hyperparameters or training configurations are given. |
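The abstract quoted above centers on two ingredients: skewing the empirical transitions so that the induced SSP problem is proper (the goal is eventually reached under any policy), and solving it by value iteration. The sketch below is not the authors' EB-SSP algorithm; it is a minimal illustration, assuming a small tabular SSP with known transition tensor `P`, cost matrix `c`, and an absorbing zero-cost goal state, plus a hypothetical `skew_toward_goal` helper that mixes in a small probability of jumping to the goal, in the spirit of the skewing described in the abstract.

```python
import numpy as np

def skew_toward_goal(P_hat, n, goal):
    """Mix empirical transitions with a small jump to the goal.

    Simplified skewing: with weight 1/(n+1), where n holds per
    state-action visit counts (shape (S, A)), every pair transitions
    to the goal, so the induced SSP is proper. Returns a new tensor.
    """
    P = P_hat * (n / (n + 1.0))[..., None]
    P[:, :, goal] += 1.0 / (n + 1.0)
    return P

def ssp_value_iteration(P, c, goal, tol=1e-8, max_iter=10_000):
    """Value iteration for a stochastic shortest path (SSP) instance.

    P: shape (S, A, S), transition probabilities P[s, a, s'].
    c: shape (S, A), per-step costs (zero at the goal).
    Returns the optimal expected cost-to-goal V.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = c + P @ V          # Q[s, a] = c(s, a) + sum_{s'} P(s'|s, a) V(s')
        V_new = Q.min(axis=1)  # greedy Bellman backup
        V_new[goal] = 0.0      # goal is absorbing with zero cost
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Tiny 3-state chain 0 -> 1 -> 2 (goal), one action, deterministic
# transitions, unit cost per step.
P = np.zeros((3, 1, 3))
P[0, 0, 1] = 1.0
P[1, 0, 2] = 1.0
P[2, 0, 2] = 1.0  # goal self-loop
c = np.array([[1.0], [1.0], [0.0]])
V = ssp_value_iteration(P, c, goal=2)  # expected costs-to-goal [2, 1, 0]
```

EB-SSP additionally subtracts an exploration bonus from the empirical costs to obtain an optimistic SSP; the skewing above is what guarantees that the associated value iteration converges even when some empirical policies would otherwise never reach the goal.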