Scalable Bilinear π Learning Using State and Action Features
Authors: Yichen Chen, Lihong Li, Mengdi Wang
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We prove that the algorithm is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space. We study the sample complexity of the π learning algorithm by analyzing the coupled primal-dual convergence process. We show that finding an ϵ-optimal policy (compared to the best approximate policy) requires a sample size that is linear in DU/ϵ², ignoring logarithmic terms. The sample complexity depends only on the numbers of state and action features; it is invariant with respect to the actual sizes of the state and action spaces. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, Princeton University, Princeton, NJ, USA; ²Google Inc., Kirkland, WA, USA; ³Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA. |
| Pseudocode | Yes | Algorithm 1 Bilinear π Learning (Average Reward) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with specific datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, so no dataset split information for validation is provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experimental results, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report on experimental results, thus no specific software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not report on experimental results, thus no specific experimental setup details like hyperparameters or training configurations are provided. |
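The key idea behind the parameter-space scaling claimed above is the bilinear parametrization: with d state features and k action features, the learned object has only d·k parameters, independent of the sizes of the state and action spaces. The sketch below illustrates this; the feature matrices, sizes, and the normalization used to recover a policy are placeholder assumptions for illustration, not the paper's Algorithm 1.

```python
import numpy as np

# Hypothetical sizes: the parameter count d*k is independent of |S| and |A|.
num_states, num_actions = 1000, 50   # actual state/action space sizes
d, k = 8, 4                          # state / action feature dimensions

rng = np.random.default_rng(0)
Phi = rng.random((num_states, d))    # state feature matrix (nonnegative here)
Psi = rng.random((num_actions, k))   # action feature matrix
M = rng.random((d, k))               # learned core matrix: only d*k parameters

# Bilinear score for every (state, action) pair.
mu = Phi @ M @ Psi.T                 # shape (num_states, num_actions)

# A policy can be recovered by normalizing the scores over actions per state.
pi = mu / mu.sum(axis=1, keepdims=True)

assert pi.shape == (num_states, num_actions)
assert np.allclose(pi.sum(axis=1), 1.0)
print(M.size)  # 32 parameters vs. 50000 state-action pairs
```

Note that learning updates only the d×k core matrix M, which is why the sample complexity in the table depends on the feature dimensions rather than on |S| or |A|.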