Dynamic Regret of Policy Optimization in Non-Stationary Environments
Authors: Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our contributions can be summarized as follows: we propose two model-free policy optimization algorithms, POWER and POWER++, for non-stationary RL with adversarial rewards; we provide a dynamic regret analysis for both algorithms, with regret bounds that apply across all regimes of non-stationarity of the underlying model; and when the environment is nearly stationary, our dynamic regret bounds are of order O(T^{1/2}) and match the near-optimal static regret bounds, demonstrating the adaptive near-optimality of our algorithms in slow-changing environments. (A standard form of the dynamic regret quantity being bounded is sketched below the table.) |
| Researcher Affiliation | Academia | Yingjie Fei (Cornell University; yf275@cornell.edu), Zhuoran Yang (Princeton University; zy6@princeton.edu), Zhaoran Wang (Northwestern University; zhaoranwang@gmail.com), Qiaomin Xie (Cornell University; qiaomin.xie@cornell.edu) |
| Pseudocode | Yes | Algorithm 1 (POWER); Algorithm 2 (POWER++); see the illustrative policy-update sketch after this table. |
| Open Source Code | No | The paper does not include any statement about providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not mention using any specific datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not specify any dataset splits (training, validation, test) as it does not conduct empirical experiments. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers needed for reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not provide specific experimental setup details such as hyperparameters or training configurations for empirical evaluation. |
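
For context on the Research Type row: the paper bounds a dynamic regret against a per-episode comparator. The display below gives the standard episodic-MDP form of that quantity; the notation is ours and may differ from the paper's exact statement.

```latex
% Standard dynamic-regret notion for episodic RL with adversarial rewards
% (illustrative notation; the paper's exact definition may differ).
\[
  \mathrm{D\text{-}Regret}(K)
  \;=\;
  \sum_{k=1}^{K}
  \Bigl(
    V^{\pi_k^{*}}_{k,1}(s_{k,1})
    \;-\;
    V^{\pi_k}_{k,1}(s_{k,1})
  \Bigr),
\]
% K: number of episodes; \pi_k: learner's policy in episode k;
% \pi_k^{*}: optimal policy for episode k's reward and transition functions;
% V_{k,1}(s_{k,1}): value at the initial state of episode k.
```

With T denoting the total number of steps across episodes, a dynamic regret bound of order O(T^{1/2}) in nearly stationary environments matches near-optimal static regret rates, which is the adaptivity claim quoted in the Research Type row.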
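For the Pseudocode row: the paper presents Algorithms 1 and 2 (POWER and POWER++), but no code accompanies this report. As a purely illustrative, hedged sketch, the snippet below shows the kind of softmax / mirror-descent policy update used by policy-optimization methods of this family; it is not the paper's algorithm, and every name in it (num_states, eta, q_estimate, and so on) is hypothetical.

```python
# Hypothetical sketch of a mirror-descent (exponentiated-gradient) policy update,
# the generic building block behind policy-optimization methods such as POWER.
# The paper's Algorithms 1-2 additionally specify optimistic value estimation
# with exploration bonuses and restart/forecasting machinery, none of which is
# reproduced here.
import numpy as np

def mirror_descent_policy_update(policy, q_values, eta):
    """Multiplicative-weights update: pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    logits = np.log(policy) + eta * q_values           # shape: (num_states, num_actions)
    logits -= logits.max(axis=1, keepdims=True)         # subtract row max for numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True) # renormalize each state's distribution
    return new_policy

# Usage example with toy dimensions (purely illustrative).
num_states, num_actions, num_episodes, eta = 4, 3, 10, 0.1
policy = np.full((num_states, num_actions), 1.0 / num_actions)
rng = np.random.default_rng(0)

for episode in range(num_episodes):
    # In POWER-style algorithms this would be an optimistic Q estimate built from
    # data collected so far; a random placeholder stands in for it here.
    q_estimate = rng.uniform(size=(num_states, num_actions))
    policy = mirror_descent_policy_update(policy, q_estimate, eta)
```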