Scalable Bilinear π Learning Using State and Action Features
Authors: Yichen Chen, Lihong Li, Mengdi Wang
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the algorithm is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space. We study the sample complexity of the π learning algorithm by analyzing the coupled primal-dual convergence process. We show that finding an ϵ-optimal policy (compared to the best approximate policy) requires a sample size that is linear in DU/ϵ², ignoring logarithmic terms. The sample complexity depends only on the numbers of state and action features; it is invariant with respect to the actual sizes of the state and action spaces. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, Princeton University, Princeton, NJ, USA; ²Google Inc., Kirkland, WA, USA; ³Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 Bilinear π Learning (Average Reward) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with specific datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, so no dataset split information for validation is provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental results, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report on experimental results, thus no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not report on experimental results, thus no specific experimental setup details like hyperparameters or training configurations are provided. |
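The key idea behind the parameter-space scaling claimed above is the bilinear parametrization: with d state features and k action features, the learned object has only d·k parameters, independent of the sizes of the state and action spaces. The sketch below illustrates this; the feature matrices, sizes, and the normalization used to recover a policy are placeholder assumptions for illustration, not the paper's Algorithm 1.

```python
import numpy as np

# Hypothetical sizes: the parameter count d*k is independent of |S| and |A|.
num_states, num_actions = 1000, 50   # actual state/action space sizes
d, k = 8, 4                          # state / action feature dimensions

rng = np.random.default_rng(0)
Phi = rng.random((num_states, d))    # state feature matrix (nonnegative here)
Psi = rng.random((num_actions, k))   # action feature matrix
M = rng.random((d, k))               # learned core matrix: only d*k parameters

# Bilinear score for every (state, action) pair.
mu = Phi @ M @ Psi.T                 # shape (num_states, num_actions)

# A policy can be recovered by normalizing the scores over actions per state.
pi = mu / mu.sum(axis=1, keepdims=True)

assert pi.shape == (num_states, num_actions)
assert np.allclose(pi.sum(axis=1), 1.0)
print(M.size)  # 32 parameters vs. 50000 state-action pairs
```

Note that learning updates only the d×k core matrix M, which is why the sample complexity in the table depends on the feature dimensions rather than on |S| or |A|.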