Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
Authors: Liyu Chen, Haipeng Luo
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions. We start by establishing a lower bound $\Omega\big((B_\star S A T_\star (\Delta_c + B_\star^2 \Delta_P))^{1/3} K^{2/3}\big)$ ... These algorithms combine the ideas of finite-horizon approximation [Chen et al., 2022a], special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020], adaptive confidence widening [Wei and Luo, 2021], as well as some new techniques such as properly penalizing long-horizon policies. (The lower bound is written out, and the Bernstein-style bonus shape illustrated, in the notes below the table.) |
| Researcher Affiliation | Academia | Liyu Chen, University of Southern California, liyuc@usc.edu; Haipeng Luo, University of Southern California, haipengl@usc.edu |
| Pseudocode | Yes | Algorithm 1: Finite-Horizon Approximation of SSP; Algorithm 2: Non-Stationary MVP; Algorithm 3: Non-Stationary MVP with a Doubling Trick; Algorithm 4: MVP with Non-Stationarity Tests; Algorithm 5: A Two-Phase Variant of Algorithm 1; Algorithm 6: MASTER-Base Algorithm (Phase 1) |
| Open Source Code | No | The paper does not provide source code. It is a theoretical work and does not mention a code release, repository links, or code in the supplementary material. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds. It does not mention using or training on any specific dataset. The 'Preliminaries' section defines the model but does not refer to empirical datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or data splits (training/test/validation). |
| Hardware Specification | No | The paper is theoretical and does not describe running experiments. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithmic design and analysis. It does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as hyperparameters, training configurations, or system-level settings. |
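
For readability, the lower bound quoted in the Research Type row is written out here in LaTeX, with the symbols as defined in the paper's abstract:

```latex
% Lower bound on dynamic regret stated in the abstract
\Omega\!\left( \big( B_\star S A T_\star (\Delta_c + B_\star^2 \Delta_P) \big)^{1/3} K^{2/3} \right)
% B_\star            : maximum expected cost of the optimal policy of any episode, from any state
% T_\star            : maximum hitting time of the optimal policy of any episode
% SA                 : number of state-action pairs
% \Delta_c, \Delta_P : amount of change in the cost and transition functions
% K                  : number of episodes
```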
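The abstract excerpt also refers to the "special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020]". The exact bonus is not reproduced in this summary, so the sketch below only illustrates the generic Bernstein shape (a variance term plus a lower-order term); the function name, the constants `c1` and `c2`, and the value-scale bound `B` are placeholders, not the paper's tuned quantities.

```python
import numpy as np

def bernstein_bonus(p_hat, v, n, log_term, B=1.0, c1=1.0, c2=1.0):
    """Illustrative Bernstein-style exploration bonus for a single (s, a) pair.

    p_hat    : empirical next-state distribution under (s, a) (1-D array summing to 1)
    v        : current value estimates for the next states (1-D array)
    n        : visit count of (s, a)
    log_term : confidence factor, e.g. log(S * A * K / delta)
    The sqrt(variance * log / n) + B * log / n shape is the standard
    Bernstein form; the constants and scaling here are illustrative only.
    """
    n = max(n, 1)  # guard against division by zero before the first visit
    mean_v = float(p_hat @ v)
    var_v = float(p_hat @ (v - mean_v) ** 2)  # variance of V(s') under p_hat
    return c1 * np.sqrt(var_v * log_term / n) + c2 * B * log_term / n

# Example: a two-successor state with a rarely visited (s, a) pair
# bonus = bernstein_bonus(np.array([0.5, 0.5]), np.array([0.0, 2.0]), n=10, log_term=4.0)
```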