Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

Authors: Liyu Chen, Haipeng Luo

NeurIPS 2022

Reproducibility Assessment (Variable / Result / LLM Response):
Research Type: Theoretical
LLM Response: "We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions. We start by establishing a lower bound $\Omega\big((B_\star S A T_\star(\Delta_c + B_\star^2 \Delta_P))^{1/3} K^{2/3}\big)$... These algorithms combine the ideas of finite-horizon approximation [Chen et al., 2022a], special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020], adaptive confidence widening [Wei and Luo, 2021], as well as some new techniques such as properly penalizing long-horizon policies."
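
For reference, the lower bound quoted above can be typeset in full with the symbol definitions the paper's abstract provides; this is a reconstruction based on the published abstract, not new analysis:

```latex
% Lower bound on dynamic regret, symbols as defined in the abstract:
%   B_\star : maximum expected cost of the optimal policy of any episode
%   T_\star : maximum hitting time of the optimal policy of any episode
%   S, A    : numbers of states and actions
%   \Delta_c, \Delta_P : amounts of change of the cost and transition functions
%   K       : number of episodes
\Omega\!\left( \big( B_\star S A T_\star (\Delta_c + B_\star^{2} \Delta_P) \big)^{1/3} K^{2/3} \right)
```
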
Researcher Affiliation: Academia
LLM Response: "Liyu Chen, University of Southern California, liyuc@usc.edu; Haipeng Luo, University of Southern California, haipengl@usc.edu"
Pseudocode: Yes
LLM Response: "Algorithm 1: Finite-Horizon Approximation of SSP; Algorithm 2: Non-Stationary MVP; Algorithm 3: Non-Stationary MVP with a Doubling Trick; Algorithm 4: MVP with Non-Stationarity Tests; Algorithm 5: A Two-Phase Variant of Algorithm 1; Algorithm 6: MASTER-Base Algorithm (Phase 1)"
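
To make the ideas behind Algorithms 1 and 2 concrete, here is a minimal Python sketch of the finite-horizon approximation of SSP [Chen et al., 2022a] paired with an MVP-style Bernstein bonus [Zhang et al., 2020]. The `env` and `learner` interfaces, the constants `c1`/`c2`, and all function names are hypothetical illustrations, not the paper's implementation:

```python
import numpy as np

def mvp_style_bonus(var_hat, n, B, iota, c1=1.0, c2=1.0):
    """Bernstein-style exploration bonus in the spirit of MVP
    [Zhang et al., 2020]: the larger of a variance term and a range
    term. Constants c1, c2 are placeholders, not the paper's values."""
    n = max(n, 1)
    return max(c1 * np.sqrt(var_hat * iota / n), c2 * B * iota / n)

def run_finite_horizon_approximation(env, learner, H, K, goal):
    """Sketch of the finite-horizon approximation of SSP (the idea of
    Algorithm 1, following Chen et al., 2022a): each goal-reaching
    episode is played as a concatenation of H-step intervals, and each
    interval is fed to a finite-horizon learner as one of its own
    episodes. `env` and `learner` are hypothetical interfaces."""
    for _ in range(K):
        s = env.reset()
        while s != goal:
            traj = []
            for h in range(H):
                a = learner.act(s, h)            # step-indexed policy
                s_next, cost = env.step(a)
                traj.append((s, a, cost, s_next))
                s = s_next
                if s == goal:
                    break
            learner.update(traj)
            # If the goal was not reached within H steps, the next
            # interval starts from the current state (no env reset).
```
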
Open Source Code: No
LLM Response: "The paper does not provide concrete access to source code. It is a theoretical paper and does not mention a code release, links to repositories, or code in the supplementary materials."

Open Datasets: No
LLM Response: "The paper is theoretical and focuses on algorithm design and regret bounds. It does not mention using or training on any specific dataset; the 'Preliminaries' section defines the model but does not refer to empirical datasets."

Dataset Splits: No
LLM Response: "The paper is theoretical and does not describe experimental validation or training/validation/test splits."

Hardware Specification: No
LLM Response: "The paper is theoretical and does not describe running experiments, so no hardware specifications are mentioned."

Software Dependencies: No
LLM Response: "The paper is theoretical and focuses on algorithmic design and analysis. It does not mention specific software dependencies or version numbers."

Experiment Setup: No
LLM Response: "The paper is theoretical and does not include details of an experimental setup, such as hyperparameters, training configurations, or system-level settings."