Order-Preserving GFlowNets

Authors: Yihang Chen, Lukas Mauch

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically prove that the training process of OP-GFNs gradually sparsifies the learned reward landscape in single-objective maximization tasks. The sparsification concentrates on candidates of a higher hierarchy in the ordering, ensuring exploration at the beginning and exploitation towards the end of the training. We demonstrate OP-GFNs' state-of-the-art performance in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search.
Researcher Affiliation | Collaboration | Yihang Chen, Section of Communication Systems, EPFL, Switzerland (yihang.chen@epfl.ch); Lukas Mauch, Sony Europe B.V., Stuttgart Laboratory 1, Germany (lukas.mauch@sony.com)
Pseudocode | Yes | The full pseudo-algorithm is summarized in Algorithm 1.
Open Source Code | Yes | Our codes are available at https://github.com/yhangchen/OP-GFN.
Open Datasets | Yes | We empirically evaluate our method on the synthetic environment HyperGrid (Bengio et al., 2021a), and two real-world applications: NATS-Bench (Dong et al., 2021) and molecular designs (Shen et al., 2023; Jain et al., 2023), to demonstrate its advantages in the diversity and the top reward (or the closeness to the Pareto front) of the generated candidates.
Dataset Splits | Yes | We study the neural architecture search environment NATS-Bench (Dong et al., 2021), which includes three datasets: CIFAR-10, CIFAR-100, and ImageNet-16-120. ... Following Dong et al. (2021), when training the GFlowNet, we use the test accuracy at epoch 12 (u12(·)) as the objective function in training; when evaluating the candidates, we use the test accuracy at epoch 200 (u200(·)) as the objective function in testing. (See the NATS-Bench lookup sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning 'Estimated wall-clock time' in some contexts.
Software Dependencies | No | The paper mentions using 'torchgfn (Lahlou et al., 2023)' and the 'Adam optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for these or any other software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We use the Adam optimizer with a learning rate of 0.1 for Zθ's parameters and a learning rate of 0.001 for the neural network's parameters. ... We use an exploration epsilon εF = 0.10. ... clip the gradient norm to a maximum of 10.0, and the policy logit to a maximum absolute value of 50.0. ... We initialize log Zθ to be 5.0.
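The quoted experiment-setup values can be collected into a short PyTorch sketch. This is a minimal illustration of the reported hyperparameters only; the names policy_net and log_Z and the toy network architecture are hypothetical placeholders, not the authors' implementation.

```python
import torch

# Placeholder forward-policy network (architecture is illustrative only).
policy_net = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64)
)
log_Z = torch.nn.Parameter(torch.tensor(5.0))  # "We initialize log Zθ to be 5.0."

# Reported learning rates: 0.1 for Zθ's parameters, 0.001 for the network's parameters.
optimizer = torch.optim.Adam([
    {"params": [log_Z], "lr": 0.1},
    {"params": policy_net.parameters(), "lr": 0.001},
])

EPS_F = 0.10          # exploration epsilon εF used when sampling forward actions
LOGIT_CLIP = 50.0     # policy logits clipped to a maximum absolute value of 50.0
MAX_GRAD_NORM = 10.0  # gradient norm clipped to a maximum of 10.0

def optimize_step(loss: torch.Tensor) -> None:
    """One optimizer update with the reported gradient-norm clipping."""
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        list(policy_net.parameters()) + [log_Z], max_norm=MAX_GRAD_NORM
    )
    optimizer.step()
```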
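For the Dataset Splits row, the epoch-12 training objective and epoch-200 evaluation objective can be read off the benchmark directly. The sketch below assumes the public nats_bench Python package; the benchmark file path and architecture index are placeholders, not values from the paper.

```python
from nats_bench import create

# Placeholder path to the downloaded NATS-Bench topology-search-space file.
api = create("NATS-tss-v1_0-3ffb9-simple", "tss", fast_mode=True, verbose=False)

def objective(arch_index: int, dataset: str, hp: str) -> float:
    """Benchmarked test accuracy of one architecture at the given training budget."""
    info = api.get_more_info(arch_index, dataset, hp=hp, is_random=False)
    return info["test-accuracy"]

# Training objective u12: test accuracy at epoch 12.
u12 = objective(arch_index=1234, dataset="cifar10", hp="12")
# Evaluation objective u200: test accuracy at epoch 200.
u200 = objective(arch_index=1234, dataset="cifar10", hp="200")
```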