A Unifying Framework of Off-Policy General Value Function Evaluation

Authors: Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

NeurIPS 2022

Reproducibility assessment — each variable below lists the result and the supporting LLM response (quoted from the paper where available):

Research Type: Experimental
  "We conduct empirical experiments to answer the following two questions: (a) can GenTD evaluate both the forward and backward GVFs efficiently? (b) how does GenTD compare with GTD in terms of the convergence speed and the quality of the estimation results?" (Figure 1 of the paper compares GenTD and GTD on the tasks of evaluating w_{Q^π} and w_{log μ^π}.)

Researcher Affiliation: Collaboration
  Tengyu Xu, Meta Platforms, Inc., Menlo Park, CA 94025; Zhuoran Yang, Yale University, New Haven, CT 06520; Zhaoran Wang, Northwestern University, Evanston, IL 60208; Yingbin Liang, Ohio State University, Columbus, OH 43210.

Pseudocode: Yes
  "Algorithm 1: Generalized TD Learning (GenTD)"

Open Source Code: Yes
  "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"

Open Datasets: Yes
  "We consider a variant of Baird's counterexample [1, 44] with 7 states and 2 actions (see Figure 2 in Appendix A)."

Dataset Splits: No
  The paper uses Baird's counterexample in Section 5 but specifies no explicit train/validation/test splits and does not mention cross-validation; it instead evaluates estimation error against ground truth in a simulated environment.

Hardware Specification: No
  "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]"

Software Dependencies: No
  The paper does not state the specific software or library versions used for the experiments.

Experiment Setup: Yes
  "The discount factor γ is set to be 0.99 in all tasks, and all curves in the plots are averaged over 20 independent runs. The detailed experimental setting is provided in Appendix A."
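To make the reported setup concrete, here is a minimal sketch of an off-policy TD-style evaluation loop on a toy 7-state, 2-action environment, using the stated discount factor (γ = 0.99) and averaging over 20 independent runs. The transition dynamics, policies, step size, and the plain importance-weighted TD(0) update are illustrative assumptions; this is not the paper's GenTD algorithm or Baird's exact counterexample.

```python
import numpy as np

GAMMA = 0.99     # discount factor, as stated in the paper
N_STATES = 7     # matches the 7-state variant of Baird's counterexample
N_ACTIONS = 2

def run_td_evaluation(seed, n_steps=5000, alpha=0.05):
    """One independent run of importance-weighted tabular TD(0).

    The environment and policies are hypothetical stand-ins:
    action 0 teleports uniformly at random, action 1 steps forward.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(N_STATES)              # tabular value estimates
    pi = np.array([0.9, 0.1])           # target policy (assumed)
    mu = np.array([0.5, 0.5])           # behavior policy (assumed)
    s = 0
    for _ in range(n_steps):
        a = rng.choice(N_ACTIONS, p=mu)         # act with behavior policy
        rho = pi[a] / mu[a]                     # importance-sampling ratio
        s_next = rng.integers(N_STATES) if a == 0 else (s + 1) % N_STATES
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Off-policy TD(0) update, weighted by rho.
        v[s] += alpha * rho * (r + GAMMA * v[s_next] - v[s])
        s = s_next
    return v

# Average the estimates over 20 independent runs, as in the reported setup.
values = np.mean([run_td_evaluation(seed) for seed in range(20)], axis=0)
```

Averaging over seeded independent runs, as above, is what produces the smoothed curves the paper describes; each run differs only in its random stream.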