Pessimistic Backward Policy for GFlowNets

Authors: Hyosoon Jang, Yunhui Jang, Minsu Kim, Jinkyoo Park, Sungsoo Ahn

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate PBP-GFN across eight benchmarks, including the hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In these experiments, we observe that PBP-GFN (1) improves the learning of the target Boltzmann distribution and (2) enhances the discovery of high-reward objects, while (3) maintaining the diversity of the sampled high-reward objects.
Researcher Affiliation | Academia | Hyosoon Jang (POSTECH), Yunhui Jang (POSTECH), Minsu Kim (KAIST), Jinkyoo Park (KAIST), Sungsoo Ahn (POSTECH)
Pseudocode | Yes | Algorithm 1: Learning pessimistic backward policy for GFlowNets (see the sketch after this table)
Open Source Code | Yes | Code: https://github.com/hsjang0/Pessimistic-Backward-Policy-for-GFlowNets
Open Datasets | Yes | We extensively validate PBP-GFN on various benchmarks: the hyper-grid benchmark [1], bag generation [13], the maximum independent set problem [5], fragment-based molecule generation [1], and four RNA sequence generation tasks [4].
Dataset Splits | Yes | We also consider solving maximum independent set problems, where the action is selecting a node and the reward is the size of the independent set. At each epoch, the GFlowNets train with the set of training graphs, and sample 20 solutions for each validation graph and measure the average reward and the maximum reward following Zhang et al. [5]. (See the evaluation sketch after this table.)
Hardware Specification | Yes | We use a single NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions implementing models with neural networks (e.g., 'feed-forward neural network', 'graph isomorphism neural network') but does not specify software dependencies such as libraries (e.g., PyTorch, TensorFlow) with version numbers.
Experiment Setup | Yes | The detailed experimental settings are described in Appendix B. For instance, for the hyper-grid environment: 'The forward policy is implemented with the feed-forward neural network that consists of two layers with 256 hidden dimensions and is trained with a learning rate of 1e-3. The learning rate for Zθ is 0.1.' (See the configuration sketch after this table.)
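
The Pseudocode row cites Algorithm 1 (learning a pessimistic backward policy) without reproducing it. The following is a minimal, hypothetical sketch of what one such training step could look like, assuming a trajectory-balance-style update for the forward policy and, for the backward policy, an update that concentrates backward probability on the observed trajectory; every name here (env, forward_policy, backward_policy, log_Z, the optimizers) is illustrative and not taken from the authors' released code.

```python
# Hypothetical sketch of a PBP-GFN-style training step; NOT the authors' implementation.
# Assumptions: forward_policy(s) / backward_policy(s) return action logits, log_Z is a
# learnable scalar included in opt_forward's parameter group, env.step(s, a) returns
# (next_state, done), and env.log_reward(s) gives the terminal log-reward.
import torch

def training_step(env, forward_policy, backward_policy, log_Z, opt_forward, opt_backward):
    # 1. Roll out one trajectory tau = (s_0, a_0, ..., s_T) with the forward policy.
    state = env.reset()
    log_pf = torch.zeros(())  # sum of forward log-probabilities along tau
    log_pb = torch.zeros(())  # sum of backward log-probabilities along tau
    done = False
    while not done:
        dist = torch.distributions.Categorical(logits=forward_policy(state))
        action = dist.sample()
        log_pf = log_pf + dist.log_prob(action)
        next_state, done = env.step(state, action)
        # Backward log-probability of undoing this transition from next_state.
        back_dist = torch.distributions.Categorical(logits=backward_policy(next_state))
        log_pb = log_pb + back_dist.log_prob(action)
        state = next_state
    log_reward = env.log_reward(state)

    # 2. Forward policy and log_Z: trajectory-balance objective (backward policy held fixed).
    tb_loss = (log_Z + log_pf - log_reward - log_pb.detach()) ** 2
    opt_forward.zero_grad()
    tb_loss.backward()
    opt_forward.step()

    # 3. Backward policy: "pessimistic" update, sketched here as raising the backward
    #    likelihood (and hence the flow) assigned to the trajectory actually observed.
    pb_loss = -log_pb
    opt_backward.zero_grad()
    pb_loss.backward()
    opt_backward.step()
    return tb_loss.item(), pb_loss.item()
```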
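
The Dataset Splits row quotes the maximum-independent-set validation protocol: 20 sampled solutions per validation graph, reporting the average and maximum reward. A small sketch of that bookkeeping follows, assuming a hypothetical sampler.sample_independent_set(graph) call that returns one sampled independent set.

```python
# Hypothetical sketch of the quoted MIS validation protocol; `sampler` and its
# `sample_independent_set` method are illustrative stand-ins for a trained GFlowNet.
def evaluate_mis(sampler, validation_graphs, num_samples=20):
    avg_rewards, max_rewards = [], []
    for graph in validation_graphs:
        # Reward of a solution is the size of the sampled independent set.
        rewards = [len(sampler.sample_independent_set(graph)) for _ in range(num_samples)]
        avg_rewards.append(sum(rewards) / num_samples)
        max_rewards.append(max(rewards))
    # Report both metrics averaged over the validation graphs.
    return (sum(avg_rewards) / len(avg_rewards),
            sum(max_rewards) / len(max_rewards))
```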
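
The Experiment Setup row quotes the hyper-grid configuration: a feed-forward forward policy with 256-dimensional hidden layers, a learning rate of 1e-3, and a learning rate of 0.1 for Zθ. A minimal PyTorch sketch of that configuration follows; the input/output sizes, the ReLU activations, and the choice of Adam are illustrative assumptions, and 'two layers with 256 hidden dimensions' is read here as two hidden layers.

```python
# Hypothetical instantiation of the quoted hyper-grid setup; dimensions and the
# optimizer choice (Adam) are placeholders, not details stated in the table above.
import torch
import torch.nn as nn

state_dim, num_actions = 2, 3  # assumed sizes for illustration only

# Forward policy: two 256-unit hidden layers, as quoted from Appendix B.
forward_policy = nn.Sequential(
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, num_actions),
)

# Learnable scalar standing in for Z_theta (commonly parameterized as log Z).
log_Z = nn.Parameter(torch.zeros(1))

# Separate learning rates: 1e-3 for the policy network, 0.1 for Z_theta.
optimizer = torch.optim.Adam([
    {"params": forward_policy.parameters(), "lr": 1e-3},
    {"params": [log_Z], "lr": 0.1},
])
```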