Optimism in Reinforcement Learning with Generalized Linear Function Approximation
Authors: Yining Wang, Ruosong Wang, Simon Shaolei Du, Akshay Krishnamurthy
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call optimistic closure, which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\tilde{O}(H\sqrt{d^3 T})$, where $H$ is the horizon, $d$ is the dimensionality of the state-action features, and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions. |
| Researcher Affiliation | Collaboration | Yining Wang (University of Florida, yining.wang@warrington.ufl.edu); Ruosong Wang (Carnegie Mellon University, ruosongw@andrew.cmu.edu); Simon S. Du (University of Washington, ssdu@cs.washington.edu); Akshay Krishnamurthy (Microsoft Research, akshaykr@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: "The LSVI-UCB algorithm with generalized linear function approximation." A hedged sketch of its core computations appears after this table. |
| Open Source Code | No | The paper does not provide any information about open-sourcing the code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use or refer to any specific publicly available datasets for training. It discusses abstract MDPs and function approximation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and defines mathematical parameters and constants for its analysis, but it does not provide concrete experimental setup details like hyperparameters (e.g., learning rate, batch size) or specific training configurations. |
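
The paper states Algorithm 1 (LSVI-UCB with generalized linear function approximation) only as pseudocode. Below is a minimal, hypothetical Python sketch of its two core computations: fitting the GLM parameters from observed targets, and forming an optimistic Q-value by adding an elliptical-potential UCB bonus. The function names (`fit_glm`, `optimistic_q`), the sigmoid link, the gradient-descent fit, and all constants are illustrative assumptions, not the paper's exact estimator or parameter choices.

```python
import numpy as np

# Illustrative constants (hypothetical; the paper sets these via its analysis)
d, beta, lam = 4, 1.0, 1.0
link = lambda z: 1.0 / (1.0 + np.exp(-z))  # a known, monotone link function f

def fit_glm(Phi, y, n_steps=200, lr=0.5):
    """Fit theta by minimizing sum_i (f(<phi_i, theta>) - y_i)^2 + lam * ||theta||^2
    with gradient descent -- a stand-in for the paper's GLM estimator.

    Phi: (n, d) matrix of state-action features; y: (n,) regression targets.
    """
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        pred = link(Phi @ theta)          # f(<phi_i, theta>) for each sample
        fprime = pred * (1.0 - pred)      # f'(z) for the sigmoid link
        grad = Phi.T @ ((pred - y) * fprime) + lam * theta
        theta -= lr * grad / max(len(y), 1)
    return theta

def optimistic_q(phi, theta, Lambda_inv):
    """Optimistic Q-value: f(<phi, theta>) + beta * ||phi||_{Lambda^{-1}},
    where Lambda = lam * I + sum_i phi_i phi_i^T is the regularized design matrix."""
    bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)
    return link(phi @ theta) + bonus
```

In the full episodic loop, one would regress backward over steps $h = H, \dots, 1$, using the observed reward plus the (clipped) optimistic value at step $h+1$ as the target `y`, and then act greedily with respect to `optimistic_q` in the next episode.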