GFlowNet Training by Policy Gradients

Authors: Puhua Niu, Shili Wu, Mingzhou Fan, Xiaoning Qian

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation to improve GFlowNet performance.
Researcher Affiliation | Academia | 1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; 2 Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA; 3 Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA.
Pseudocode | Yes | Algorithm 1: GFlowNet Training Workflow
Open Source Code | Yes | Our code is available at: github.com/niupuhua1234/GFN-PG.
Open Datasets | Yes | We use nucleotide string datasets, SIX6 and PHO4, and molecular graph datasets, QM9 and sEH, from Shen et al. (2023).
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or sample counts. The experiments involve learning generative policies in simulated environments or for specific datasets, where evaluation compares generated distributions to target distributions or ground truth rather than relying on traditional data splits.
Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.
Software Dependencies | No | Our implementation is built upon the torchgfn package (Lahlou et al., 2023).
Experiment Setup | Yes | For our policy-based methods, we set the hyper-parameter λ to 0.99 for the forward policy gradients, based on the results of the ablation study. The trust-region hyper-parameter ζF is set to 0.01, selected from {0.01, 0.02, 0.03, 0.04, 0.05}. We use the Adam optimizer for model optimization. The learning rates of the forward and backward policies are 1 × 10⁻³, selected from {5 × 10⁻³, 1 × 10⁻³, 5 × 10⁻⁴, 1 × 10⁻⁴} by TB-U. The learning rates of the value functions are set to 5 × 10⁻³, selected from {1 × 10⁻², 5 × 10⁻³, 1 × 10⁻³} by RL-U. The learning rate of the total-flow estimator is 1 × 10⁻¹, selected from {1 × 10⁻¹, 5 × 10⁻², 1 × 10⁻², 5 × 10⁻³} by TB-U. The sample batch size is set to 128 for each optimization iteration. For all experiments, we report the performance with five different random seeds.
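
The experiment-setup row above amounts to an optimizer configuration; the sketch below shows one way those reported settings could be wired up in plain PyTorch. It is a minimal illustration under stated assumptions: the network shapes, the random-state "sampler", the iteration count, and the trajectory-balance-style placeholder loss are hypothetical stand-ins, it does not reproduce the paper's Algorithm 1 or its policy-gradient objectives, and it does not use the torchgfn API on which the actual implementation is built.

```python
import torch
from torch import nn

# Hyper-parameters reported in the experiment-setup row above.
LAMBDA_F = 0.99   # forward policy-gradient hyper-parameter λ (enters the paper's objective, unused in this placeholder loss)
ZETA_F = 0.01     # trust-region hyper-parameter ζF (likewise unused here)
BATCH_SIZE = 128
LR_POLICY = 1e-3  # forward and backward policy networks
LR_VALUE = 5e-3   # value-function networks
LR_LOGZ = 1e-1    # total-flow (log Z) estimator
NUM_ITERATIONS = 1000           # assumed; not stated in the row above
STATE_DIM, NUM_ACTIONS = 64, 8  # hypothetical dimensions for illustration

# Hypothetical placeholder networks; the paper's implementation builds on torchgfn.
forward_policy = nn.Linear(STATE_DIM, NUM_ACTIONS)   # state -> forward action logits
backward_policy = nn.Linear(STATE_DIM, NUM_ACTIONS)  # state -> backward action logits
value_net = nn.Linear(STATE_DIM, 1)                  # shown only for its separate learning rate
log_Z = nn.Parameter(torch.zeros(1))                 # total-flow estimator log Z

# One Adam optimizer with per-module learning rates, as described in the setup.
optimizer = torch.optim.Adam([
    {"params": forward_policy.parameters(), "lr": LR_POLICY},
    {"params": backward_policy.parameters(), "lr": LR_POLICY},
    {"params": value_net.parameters(), "lr": LR_VALUE},
    {"params": [log_Z], "lr": LR_LOGZ},
])

for step in range(NUM_ITERATIONS):
    # In the real workflow a batch of trajectories is sampled with the current
    # forward policy and scored by the reward; random states stand in for that
    # sampler here so the loop stays self-contained.
    states = torch.randn(BATCH_SIZE, STATE_DIM)
    log_pf = torch.log_softmax(forward_policy(states), dim=-1).max(dim=-1).values
    log_pb = torch.log_softmax(backward_policy(states), dim=-1).max(dim=-1).values
    log_reward = torch.zeros(BATCH_SIZE)  # placeholder log-reward

    # Trajectory-balance-style squared residual as a stand-in objective; the
    # paper's policy-gradient losses replace this term.
    loss = ((log_Z + log_pf - log_pb - log_reward) ** 2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```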