GFlowNet Training by Policy Gradients
Authors: Puhua Niu, Shili Wu, Mingzhou Fan, Xiaoning Qian
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation to improve GFlowNet performance. |
| Researcher Affiliation | Academia | 1 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA; 2 Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA; 3 Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA. |
| Pseudocode | Yes | Algorithm 1: GFlowNet Training Workflow |
| Open Source Code | Yes | Our code is available at: github.com/niupuhua1234/GFN-PG. |
| Open Datasets | Yes | We use nucleotide string datasets, SIX6 and PH04, and molecular graph datasets, QM9 and sEH, from Shen et al. (2023). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or sample counts. The experiments involve learning generative policies in simulated environments or for specific datasets, where evaluation is done by comparing generated distributions to target distributions or ground truth, rather than through traditional data splits. |
| Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing. |
| Software Dependencies | No | Our implementation is built upon the torchgfn package (Lahlou et al., 2023). |
| Experiment Setup | Yes | For our policy-based methods, we set the hyper-parameter λ to 0.99 for the forward policy gradients based on the results of the ablation study. The trust-region hyper-parameter ζ_F is set to 0.01, selected from {0.01, 0.02, 0.03, 0.04, 0.05}. We use the Adam optimizer for model optimization. The learning rates of the forward and backward policies are 1×10⁻³, selected from {5×10⁻³, 1×10⁻³, 5×10⁻⁴, 1×10⁻⁴} by TB-U. The learning rates of the value functions are set to 5×10⁻³, selected from {1×10⁻², 5×10⁻³, 1×10⁻³} by RL-U. The learning rate of the total flow estimator is 1×10⁻¹, selected from {1×10⁻¹, 5×10⁻², 1×10⁻², 5×10⁻³} by TB-U. The sample batch size is set to 128 for each optimization iteration. For all experiments, we report the performance with five different random seeds. |
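
The reported optimizer settings can be summarized in a short PyTorch sketch. The module names below (`forward_policy`, `backward_policy`, `value_function`, `log_Z`) are hypothetical placeholders, not the actual classes from the authors' GFN-PG / torchgfn code; only the learning rates, batch size, λ, and ζ_F values are taken from the table above.

```python
import torch
from torch import nn

# Hyper-parameters reported in the paper
LAMBDA = 0.99   # λ for the forward policy gradients (chosen via ablation study)
ZETA_F = 0.01   # trust-region hyper-parameter ζ_F, selected from {0.01, ..., 0.05}
BATCH_SIZE = 128  # trajectories sampled per optimization iteration

# Placeholder networks standing in for the paper's models (shapes are illustrative only)
forward_policy = nn.Linear(64, 8)      # forward-policy network
backward_policy = nn.Linear(64, 8)     # backward-policy network
value_function = nn.Linear(64, 1)      # value (critic) network
log_Z = nn.Parameter(torch.zeros(1))   # total flow estimator log Z

# Adam with per-group learning rates: 1e-3 for the policies (selected by TB-U),
# 5e-3 for the value functions (selected by RL-U), 1e-1 for the total flow estimator.
optimizer = torch.optim.Adam([
    {"params": forward_policy.parameters(), "lr": 1e-3},
    {"params": backward_policy.parameters(), "lr": 1e-3},
    {"params": value_function.parameters(), "lr": 5e-3},
    {"params": [log_Z], "lr": 1e-1},
])
```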