Scalable Bilinear Pi Learning Using State and Action Features

Authors: Yichen Chen, Lihong Li, Mengdi Wang

ICML 2018

Reproducibility assessment (variable, result, and LLM response):
Research Type (Theoretical): The paper proves that the algorithm is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space. The authors study the sample complexity of the π learning algorithm by analyzing the coupled primal-dual convergence process, and show that finding an ε-optimal policy (relative to the best approximate policy) requires a sample size linear in DU/ε², ignoring logarithmic terms. The sample complexity depends only on the numbers of state and action features and is invariant to the actual sizes of the state and action spaces.
Researcher Affiliation (Collaboration): ¹Department of Computer Science, Princeton University, Princeton, NJ, USA; ²Google Inc., Kirkland, WA, USA; ³Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA.
Pseudocode (Yes): Algorithm 1, Bilinear π Learning (Average Reward).
Open Source Code (No): The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets (No): The paper is theoretical and does not conduct experiments with specific training datasets.
Dataset Splits (No): The paper is theoretical and conducts no experiments, so no dataset split information is provided.
Hardware Specification (No): The paper is theoretical and reports no experimental results, so no hardware specifications are mentioned.
Software Dependencies (No): The paper is theoretical and reports no experimental results, so no software dependencies with version numbers are listed.
Experiment Setup (No): The paper is theoretical and reports no experimental results, so no experimental setup details such as hyperparameters or training configurations are provided.
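To make the assessed content concrete, the coupled primal-dual process the paper analyzes can be illustrated with a generic stochastic saddle-point solver. This is a minimal sketch, not the paper's exact Algorithm 1: the bilinear objective, feature matrix `A`, step sizes, and ball radius below are all illustrative assumptions, standing in for the state/action-feature formulation of the original.

```python
import numpy as np

# Illustrative sketch (not the paper's Algorithm 1): stochastic primal-dual
# iteration for a toy bilinear saddle point
#     min_{v : ||v|| <= radius}  max_{mu in simplex}  mu^T A v,
# mirroring the coupled primal-dual convergence process the paper analyzes.
rng = np.random.default_rng(0)

d_state, d_action = 5, 4            # toy numbers of state / action features (assumed)
A = rng.standard_normal((d_action, d_state))  # illustrative bilinear coupling matrix

v = np.zeros(d_state)               # primal variable (value-style parameters)
mu = np.ones(d_action) / d_action   # dual variable (policy/occupancy-style weights)
radius = 2.0                        # ball constraint on v (assumed)
eta_v, eta_mu = 0.1, 0.1            # step sizes (assumed)

T = 2000
v_avg = np.zeros(d_state)
mu_avg = np.zeros(d_action)

for t in range(T):
    # Unbiased stochastic primal gradient: sample one row of A uniformly,
    # standing in for a sampled transition; E[g_v] = A^T mu.
    i = rng.integers(d_action)
    g_v = d_action * mu[i] * A[i]
    g_mu = A @ v                    # dual gradient (computed exactly for simplicity)

    # Primal step: gradient descent, then projection onto the Euclidean ball.
    v = v - eta_v * g_v
    norm = np.linalg.norm(v)
    if norm > radius:
        v *= radius / norm

    # Dual step: exponentiated-gradient (mirror) ascent keeps mu on the simplex.
    mu = mu * np.exp(eta_mu * g_mu)
    mu /= mu.sum()

    # Average the iterates, the standard output of primal-dual saddle solvers.
    v_avg += v / T
    mu_avg += mu / T

# mu_avg remains a valid probability vector and v_avg stays inside the ball,
# since both feasible sets are convex.
```

The exponentiated-gradient dual step and projected primal step are the standard mirror-descent pairing for simplex/ball geometries; the paper's sample-complexity argument concerns how many sampled transitions such coupled updates need to reach an ε-optimal policy.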