Optimal Estimation of Policy Gradient via Double Fitted Iteration

Authors: Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance-reduction techniques. (A hedged sketch of these policy parameterizations appears after this table.)
Researcher Affiliation | Academia | 1) Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ, USA; 2) School of Mathematical Science, Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Fitted PG Algorithm.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper mentions using the OpenAI Gym FrozenLake and CliffWalking environments to generate datasets, but does not provide specific access information (URL, DOI, repository, or formal citation) for the datasets generated or used. (A hedged data-collection sketch using these environments appears after this table.)
Dataset Splits | No | The paper discusses the use of 'off-policy data' and 'offline logged data' but does not specify train/validation/test splits, percentages, or a methodology for partitioning the data.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or cloud computing instance types.
Software Dependencies | No | The paper mentions software components such as OpenAI Gym and softmax tabular or ReLU policy networks, but does not specify version numbers for any libraries, frameworks, or other software dependencies.
Experiment Setup | No | The paper describes some aspects of the experimental setup, such as policy parameterization and environment modifications (e.g., 'adding artificial randomness for stochastic transitions... with probability 0.1'), but does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings. (A hedged sketch of such an environment modification appears after this table.)
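
The paper states only that experiments use "softmax tabular or ReLU policy networks"; it does not describe the implementation framework or layer sizes. Below is a minimal sketch of what such parameterizations typically look like, assuming PyTorch and an arbitrary hidden width; none of the class names or sizes come from the paper.

```python
import torch
import torch.nn as nn

class SoftmaxTabularPolicy(nn.Module):
    """One logit per (state, action) pair; pi(a|s) is the softmax over the row of state s."""
    def __init__(self, n_states, n_actions):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_states, n_actions))

    def forward(self, state):  # state: integer index
        return torch.softmax(self.logits[state], dim=-1)

class ReLUPolicy(nn.Module):
    """A small ReLU network mapping a state feature vector to action probabilities."""
    def __init__(self, state_dim, n_actions, hidden=64):  # hidden width is a placeholder guess
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):  # state: float tensor of shape (state_dim,)
        return torch.softmax(self.net(state), dim=-1)
```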
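The "Open Datasets" row notes that the offline data are generated from the OpenAI Gym FrozenLake and CliffWalking environments rather than downloaded from a public repository. The sketch below illustrates one way such off-policy logged data could be collected; the uniform behavior policy, episode count, horizon, and environment IDs are assumptions, not details taken from the paper.

```python
import numpy as np
import gym  # OpenAI Gym; environment IDs may differ across Gym versions

def collect_offline_data(env_name="FrozenLake-v1", n_episodes=100, horizon=100, seed=0):
    """Log (state, action, reward, next_state) transitions from a uniform behavior policy."""
    rng = np.random.default_rng(seed)
    env = gym.make(env_name)
    n_actions = env.action_space.n
    dataset = []
    for _ in range(n_episodes):
        out = env.reset()
        state = out[0] if isinstance(out, tuple) else out  # newer Gym returns (obs, info)
        for _ in range(horizon):
            action = int(rng.integers(n_actions))  # uniform behavior policy
            out = env.step(action)
            next_state, reward, done = out[0], out[1], out[2]
            dataset.append((state, action, reward, next_state))
            state = next_state
            if done:
                break
    return dataset

offline_data = collect_offline_data("CliffWalking-v0", n_episodes=50)
```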
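The "Experiment Setup" row quotes the paper's environment modification of "adding artificial randomness for stochastic transitions... with probability 0.1." The paper does not say how this randomness is injected; one plausible mechanism is an action wrapper that replaces the chosen action with a random one with that probability, sketched below. Only the 0.1 probability comes from the paper; the wrapper itself is an assumption.

```python
import numpy as np
import gym

class RandomTransitionWrapper(gym.ActionWrapper):
    """With probability `noise_prob`, substitute a uniformly random action,
    turning an otherwise deterministic environment into a stochastic one."""
    def __init__(self, env, noise_prob=0.1, seed=0):
        super().__init__(env)
        self.noise_prob = noise_prob
        self.rng = np.random.default_rng(seed)

    def action(self, action):
        if self.rng.random() < self.noise_prob:
            return self.env.action_space.sample()
        return action

stochastic_env = RandomTransitionWrapper(gym.make("CliffWalking-v0"), noise_prob=0.1)
```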