GFlowNet Training by Policy Gradients
Authors: Puhua Niu, Shili Wu, Mingzhou Fan, Xiaoning Qian
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation to improve GFlowNet performance. |
| Researcher Affiliation | Academia | 1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; 2 Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA; 3 Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA. |
| Pseudocode | Yes | Algorithm 1: GFlowNet Training Workflow |
| Open Source Code | Yes | Our code is available at: github.com/niupuhua1234/GFN-PG. |
| Open Datasets | Yes | We use nucleotide string datasets, SIX6 and PH04, and molecular graph datasets, QM9 and sEH, from Shen et al. (2023). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or sample counts. The experiments involve learning generative policies in simulated environments or for specific datasets, where evaluation is done by comparing generated distributions to target distributions or ground truth, rather than through traditional data splits. |
| Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing. |
| Software Dependencies | No | Our implementation is built upon the torchgfn package (Lahlou et al., 2023). |
| Experiment Setup | Yes | For our policy-based methods, we set the hyper-parameter λ to 0.99 for the forward policy gradients based on the results of the ablation study. The trust-region hyper-parameter ζ_F is set to 0.01, selected from {0.01, 0.02, 0.03, 0.04, 0.05}. We use the Adam optimizer for model optimization. The learning rates of the forward and backward policies are 1×10⁻³, selected from {5×10⁻³, 1×10⁻³, 5×10⁻⁴, 1×10⁻⁴} by TB-U. The learning rates of the value functions are set to 5×10⁻³, selected from {1×10⁻², 5×10⁻³, 1×10⁻³} by RL-U. The learning rate of the total flow estimator is 1×10⁻¹, selected from {1×10⁻¹, 5×10⁻², 1×10⁻², 5×10⁻³} by TB-U. The sample batch size is set to 128 for each optimization iteration. For all experiments, we report the performance with five different random seeds. |
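
The reported optimizer settings can be summarized in a short PyTorch sketch. The module names below (`forward_policy`, `backward_policy`, `value_function`, `log_Z`) are hypothetical placeholders, not the actual classes from the authors' GFN-PG / torchgfn code; only the learning rates, batch size, λ, and ζ_F values are taken from the table above.

```python
import torch
from torch import nn

# Hyper-parameters reported in the paper
LAMBDA = 0.99   # λ for the forward policy gradients (chosen via ablation study)
ZETA_F = 0.01   # trust-region hyper-parameter ζ_F, selected from {0.01, ..., 0.05}
BATCH_SIZE = 128  # trajectories sampled per optimization iteration

# Placeholder networks standing in for the paper's models (shapes are illustrative only)
forward_policy = nn.Linear(64, 8)      # forward-policy network
backward_policy = nn.Linear(64, 8)     # backward-policy network
value_function = nn.Linear(64, 1)      # value (critic) network
log_Z = nn.Parameter(torch.zeros(1))   # total flow estimator log Z

# Adam with per-group learning rates: 1e-3 for the policies (selected by TB-U),
# 5e-3 for the value functions (selected by RL-U), 1e-1 for the total flow estimator.
optimizer = torch.optim.Adam([
    {"params": forward_policy.parameters(), "lr": 1e-3},
    {"params": backward_policy.parameters(), "lr": 1e-3},
    {"params": value_function.parameters(), "lr": 5e-3},
    {"params": [log_Z], "lr": 1e-1},
])
```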