A Unifying Framework of Off-Policy General Value Function Evaluation

Authors: Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

NeurIPS 2022

Reproducibility assessment — each variable below lists the result and the supporting LLM response (quoted from the paper where available):

Research Type: Experimental
  "We conduct empirical experiments to answer the following two questions: (a) can GenTD evaluate both the forward and backward GVFs efficiently? (b) how does GenTD compare with GTD in terms of the convergence speed and the quality of the estimation results?" (Figure 1 of the paper compares GenTD and GTD on the tasks of evaluating w_{Q^π} and w_{log μ^π}.)

Researcher Affiliation: Collaboration
  Tengyu Xu, Meta Platforms, Inc., Menlo Park, CA 94025; Zhuoran Yang, Yale University, New Haven, CT 06520; Zhaoran Wang, Northwestern University, Evanston, IL 60208; Yingbin Liang, Ohio State University, Columbus, OH 43210.

Pseudocode: Yes
  "Algorithm 1: Generalized TD Learning (GenTD)"

Open Source Code: Yes
  "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"

Open Datasets: Yes
  "We consider a variant of Baird's counterexample [1, 44] with 7 states and 2 actions (see Figure 2 in Appendix A)."

Dataset Splits: No
  The paper uses Baird's counterexample in Section 5 but specifies no explicit train/validation/test splits and does not mention cross-validation; it instead evaluates estimation error against ground truth in a simulated environment.

Hardware Specification: No
  "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]"

Software Dependencies: No
  The paper does not state the specific software or library versions used for the experiments.

Experiment Setup: Yes
  "The discount factor γ is set to be 0.99 in all tasks, and all curves in the plots are averaged over 20 independent runs. The detailed experimental setting is provided in Appendix A."
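To make the reported setup concrete, here is a minimal sketch of an off-policy TD-style evaluation loop on a toy 7-state, 2-action environment, using the stated discount factor (γ = 0.99) and averaging over 20 independent runs. The transition dynamics, policies, step size, and the plain importance-weighted TD(0) update are illustrative assumptions; this is not the paper's GenTD algorithm or Baird's exact counterexample.

```python
import numpy as np

GAMMA = 0.99     # discount factor, as stated in the paper
N_STATES = 7     # matches the 7-state variant of Baird's counterexample
N_ACTIONS = 2

def run_td_evaluation(seed, n_steps=5000, alpha=0.05):
    """One independent run of importance-weighted tabular TD(0).

    The environment and policies are hypothetical stand-ins:
    action 0 teleports uniformly at random, action 1 steps forward.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(N_STATES)              # tabular value estimates
    pi = np.array([0.9, 0.1])           # target policy (assumed)
    mu = np.array([0.5, 0.5])           # behavior policy (assumed)
    s = 0
    for _ in range(n_steps):
        a = rng.choice(N_ACTIONS, p=mu)         # act with behavior policy
        rho = pi[a] / mu[a]                     # importance-sampling ratio
        s_next = rng.integers(N_STATES) if a == 0 else (s + 1) % N_STATES
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        # Off-policy TD(0) update, weighted by rho.
        v[s] += alpha * rho * (r + GAMMA * v[s_next] - v[s])
        s = s_next
    return v

# Average the estimates over 20 independent runs, as in the reported setup.
values = np.mean([run_td_evaluation(seed) for seed in range(20)], axis=0)
```

Averaging over seeded independent runs, as above, is what produces the smoothed curves the paper describes; each run differs only in its random stream.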