Pessimistic Backward Policy for GFlowNets

Authors: Hyosoon Jang, Yunhui Jang, Minsu Kim, Jinkyoo Park, Sungsoo Ahn

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate PBP-GFN across eight benchmarks, including the hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In these experiments, we observe that PBP-GFN (1) improves the learning of the target Boltzmann distribution and (2) enhances the discovery of high-reward objects, while (3) maintaining the diversity of the sampled high-reward objects.
Researcher Affiliation | Academia | Hyosoon Jang (POSTECH), Yunhui Jang (POSTECH), Minsu Kim (KAIST), Jinkyoo Park (KAIST), Sungsoo Ahn (POSTECH)
Pseudocode | Yes | Algorithm 1: Learning pessimistic backward policy for GFlowNets (see the sketch after this table)
Open Source Code | Yes | Code: https://github.com/hsjang0/Pessimistic-Backward-Policy-for-GFlowNets
Open Datasets | Yes | We extensively validate PBP-GFN on various benchmarks: the hyper-grid benchmark [1], bag generation [13], the maximum independent set problem [5], fragment-based molecule generation [1], and four RNA sequence generation tasks [4].
Dataset Splits | Yes | We also consider solving maximum independent set problems, where the action is selecting a node and the reward is the size of the independent set. At each epoch, the GFlowNets train with the set of training graphs, and sample 20 solutions for each validation graph and measure the average reward and the maximum reward following Zhang et al. [5]. (See the evaluation sketch after this table.)
Hardware Specification | Yes | We use a single NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions implementing models with neural networks (e.g., 'feed-forward neural network', 'graph isomorphism neural network') but does not specify software dependencies such as libraries (e.g., PyTorch, TensorFlow) with version numbers.
Experiment Setup | Yes | The detailed experimental settings are described in Appendix B. For instance, for the hyper-grid environment: 'The forward policy is implemented with the feed-forward neural network that consists of two layers with 256 hidden dimensions and is trained with a learning rate of 1e-3. The learning rate for Zθ is 0.1.' (See the configuration sketch after this table.)
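
The Pseudocode row cites Algorithm 1 (learning a pessimistic backward policy) without reproducing it. The following is a minimal, hypothetical sketch of what one such training step could look like, assuming a trajectory-balance-style update for the forward policy and, for the backward policy, an update that concentrates backward probability on the observed trajectory; every name here (env, forward_policy, backward_policy, log_Z, the optimizers) is illustrative and not taken from the authors' released code.

```python
# Hypothetical sketch of a PBP-GFN-style training step; NOT the authors' implementation.
# Assumptions: forward_policy(s) / backward_policy(s) return action logits, log_Z is a
# learnable scalar included in opt_forward's parameter group, env.step(s, a) returns
# (next_state, done), and env.log_reward(s) gives the terminal log-reward.
import torch

def training_step(env, forward_policy, backward_policy, log_Z, opt_forward, opt_backward):
    # 1. Roll out one trajectory tau = (s_0, a_0, ..., s_T) with the forward policy.
    state = env.reset()
    log_pf = torch.zeros(())  # sum of forward log-probabilities along tau
    log_pb = torch.zeros(())  # sum of backward log-probabilities along tau
    done = False
    while not done:
        dist = torch.distributions.Categorical(logits=forward_policy(state))
        action = dist.sample()
        log_pf = log_pf + dist.log_prob(action)
        next_state, done = env.step(state, action)
        # Backward log-probability of undoing this transition from next_state.
        back_dist = torch.distributions.Categorical(logits=backward_policy(next_state))
        log_pb = log_pb + back_dist.log_prob(action)
        state = next_state
    log_reward = env.log_reward(state)

    # 2. Forward policy and log_Z: trajectory-balance objective (backward policy held fixed).
    tb_loss = (log_Z + log_pf - log_reward - log_pb.detach()) ** 2
    opt_forward.zero_grad()
    tb_loss.backward()
    opt_forward.step()

    # 3. Backward policy: "pessimistic" update, sketched here as raising the backward
    #    likelihood (and hence the flow) assigned to the trajectory actually observed.
    pb_loss = -log_pb
    opt_backward.zero_grad()
    pb_loss.backward()
    opt_backward.step()
    return tb_loss.item(), pb_loss.item()
```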
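
The Dataset Splits row quotes the maximum-independent-set validation protocol: 20 sampled solutions per validation graph, reporting the average and maximum reward. A small sketch of that bookkeeping follows, assuming a hypothetical sampler.sample_independent_set(graph) call that returns one sampled independent set.

```python
# Hypothetical sketch of the quoted MIS validation protocol; `sampler` and its
# `sample_independent_set` method are illustrative stand-ins for a trained GFlowNet.
def evaluate_mis(sampler, validation_graphs, num_samples=20):
    avg_rewards, max_rewards = [], []
    for graph in validation_graphs:
        # Reward of a solution is the size of the sampled independent set.
        rewards = [len(sampler.sample_independent_set(graph)) for _ in range(num_samples)]
        avg_rewards.append(sum(rewards) / num_samples)
        max_rewards.append(max(rewards))
    # Report both metrics averaged over the validation graphs.
    return (sum(avg_rewards) / len(avg_rewards),
            sum(max_rewards) / len(max_rewards))
```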
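
The Experiment Setup row quotes the hyper-grid configuration: a feed-forward forward policy with 256-dimensional hidden layers, a learning rate of 1e-3, and a learning rate of 0.1 for Zθ. A minimal PyTorch sketch of that configuration follows; the input/output sizes, the ReLU activations, and the choice of Adam are illustrative assumptions, and 'two layers with 256 hidden dimensions' is read here as two hidden layers.

```python
# Hypothetical instantiation of the quoted hyper-grid setup; dimensions and the
# optimizer choice (Adam) are placeholders, not details stated in the table above.
import torch
import torch.nn as nn

state_dim, num_actions = 2, 3  # assumed sizes for illustration only

# Forward policy: two 256-unit hidden layers, as quoted from Appendix B.
forward_policy = nn.Sequential(
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, num_actions),
)

# Learnable scalar standing in for Z_theta (commonly parameterized as log Z).
log_Z = nn.Parameter(torch.zeros(1))

# Separate learning rates: 1e-3 for the policy network, 0.1 for Z_theta.
optimizer = torch.optim.Adam([
    {"params": forward_policy.parameters(), "lr": 1e-3},
    {"params": [log_Z], "lr": 0.1},
])
```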