Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
Authors: Liyu Chen, Haipeng Luo
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions. We start by establishing a lower bound $\Omega\big((B_\star S A T_\star (\Delta_c + B_\star^2 \Delta_P))^{1/3} K^{2/3}\big)$ ... These algorithms combine the ideas of finite-horizon approximation [Chen et al., 2022a], special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020], adaptive confidence widening [Wei and Luo, 2021], as well as some new techniques such as properly penalizing long-horizon policies. (The lower bound is written out, and the Bernstein-style bonus shape illustrated, in the notes below the table.) |
| Researcher Affiliation | Academia | Liyu Chen, University of Southern California, liyuc@usc.edu; Haipeng Luo, University of Southern California, haipengl@usc.edu |
| Pseudocode | Yes | Algorithm 1: Finite-Horizon Approximation of SSP; Algorithm 2: Non-Stationary MVP; Algorithm 3: Non-Stationary MVP with a Doubling Trick; Algorithm 4: MVP with Non-Stationarity Tests; Algorithm 5: A Two-Phase Variant of Algorithm 1; Algorithm 6: MASTER-Base Algorithm (Phase 1) |
| Open Source Code | No | The paper does not provide source code. It is a theoretical work and does not mention a code release, repository links, or code in the supplementary material. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds. It does not mention using or training on any specific dataset. The 'Preliminaries' section defines the model but does not refer to empirical datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation or data splits (training/test/validation). |
| Hardware Specification | No | The paper is theoretical and does not describe running experiments. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithmic design and analysis. It does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as hyperparameters, training configurations, or system-level settings. |
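
For readability, the lower bound quoted in the Research Type row is written out here in LaTeX, with the symbols as defined in the paper's abstract:

```latex
% Lower bound on dynamic regret stated in the abstract
\Omega\!\left( \big( B_\star S A T_\star (\Delta_c + B_\star^2 \Delta_P) \big)^{1/3} K^{2/3} \right)
% B_\star            : maximum expected cost of the optimal policy of any episode, from any state
% T_\star            : maximum hitting time of the optimal policy of any episode
% SA                 : number of state-action pairs
% \Delta_c, \Delta_P : amount of change in the cost and transition functions
% K                  : number of episodes
```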
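The abstract excerpt also refers to the "special Bernstein-style bonuses of the MVP algorithm [Zhang et al., 2020]". The exact bonus is not reproduced in this summary, so the sketch below only illustrates the generic Bernstein shape (a variance term plus a lower-order term); the function name, the constants `c1` and `c2`, and the value-scale bound `B` are placeholders, not the paper's tuned quantities.

```python
import numpy as np

def bernstein_bonus(p_hat, v, n, log_term, B=1.0, c1=1.0, c2=1.0):
    """Illustrative Bernstein-style exploration bonus for a single (s, a) pair.

    p_hat    : empirical next-state distribution under (s, a) (1-D array summing to 1)
    v        : current value estimates for the next states (1-D array)
    n        : visit count of (s, a)
    log_term : confidence factor, e.g. log(S * A * K / delta)
    The sqrt(variance * log / n) + B * log / n shape is the standard
    Bernstein form; the constants and scaling here are illustrative only.
    """
    n = max(n, 1)  # guard against division by zero before the first visit
    mean_v = float(p_hat @ v)
    var_v = float(p_hat @ (v - mean_v) ** 2)  # variance of V(s') under p_hat
    return c1 * np.sqrt(var_v * log_term / n) + c2 * B * log_term / n

# Example: a two-successor state with a rarely visited (s, a) pair
# bonus = bernstein_bonus(np.array([0.5, 0.5]), np.array([0.0, 2.0]), n=10, log_term=4.0)
```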