Order-Preserving GFlowNets
Authors: Yihang Chen, Lukas Mauch
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically prove that the training process of OP-GFNs gradually sparsifies the learned reward landscape in single-objective maximization tasks. The sparsification concentrates on candidates of a higher hierarchy in the ordering, ensuring exploration at the beginning and exploitation towards the end of the training. We demonstrate OP-GFNs' state-of-the-art performance in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search. |
| Researcher Affiliation | Collaboration | Yihang Chen, Section of Communication Systems, EPFL, Switzerland, yihang.chen@epfl.ch; Lukas Mauch, Sony Europe B.V., Stuttgart Laboratory 1, Germany, lukas.mauch@sony.com |
| Pseudocode | Yes | The full pseudo algorithm is summarized in Algorithm 1. |
| Open Source Code | Yes | Our codes are available at https://github.com/yhangchen/OP-GFN. |
| Open Datasets | Yes | We empirically evaluate our method on the synthetic environment HyperGrid (Bengio et al., 2021a), and two real-world applications: NATS-Bench (Dong et al., 2021), and molecular designs (Shen et al., 2023; Jain et al., 2023) to demonstrate its advantages in the diversity and the top reward (or the closeness to the Pareto front) of the generated candidates. |
| Dataset Splits | Yes | We study the neural architecture search environment NATS-Bench (Dong et al., 2021), which includes three datasets: CIFAR-10, CIFAR-100, and ImageNet-16-120. ... Following Dong et al. (2021), when training the GFlowNet, we use the test accuracy at epoch 12 (u12(·)) as the objective function in training; when evaluating the candidates, we use the test accuracy at epoch 200 (u200(·)) as the objective function in testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning 'Estimated wall-clock time' in some contexts. |
| Software Dependencies | No | The paper mentions using 'torchgfn (Lahlou et al., 2023)' and 'Adam optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for these or any other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use Adam optimizer with a learning rate of 0.1 for Zθ's parameters and a learning rate of 0.001 for the neural network's parameters. ... We use an exploration epsilon εF = 0.10. ... clip the gradient norm to a maximum of 10.0, and the policy logit to a maximum of absolute value of 50.0. ... We initialize log Zθ to be 5.0. |
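
The "Experiment Setup" row can be read as a concrete training configuration. The sketch below is not the authors' code: the policy network and the loss function are stand-in placeholders, and only the hyperparameters (the two learning rates, the exploration epsilon, the clipping thresholds, and the log Zθ initialization) are taken from the quoted excerpt.

```python
# Minimal PyTorch sketch of the reported optimizer/clipping configuration.
import torch
import torch.nn as nn

state_dim, n_actions = 16, 8                       # placeholder sizes, not from the paper
policy = nn.Sequential(nn.Linear(state_dim, 128),  # stand-in forward-policy network
                       nn.ReLU(),
                       nn.Linear(128, n_actions))
log_Z = nn.Parameter(torch.tensor(5.0))            # log Z_theta initialized to 5.0

optimizer = torch.optim.Adam([
    {"params": [log_Z], "lr": 0.1},                # learning rate 0.1 for Z_theta
    {"params": policy.parameters(), "lr": 0.001},  # learning rate 0.001 for the network
])
EPS_F = 0.10                                       # exploration epsilon for the forward policy

def training_step(states, loss_fn):
    """One gradient step with the clipping described in the paper.

    `loss_fn` is any GFlowNet training objective (e.g. a trajectory-balance-style
    loss) mapping (logits, log_Z) to a scalar; it is left abstract here.
    """
    logits = policy(states).clamp(-50.0, 50.0)     # clip policy logits to |.| <= 50.0
    loss = loss_fn(logits, log_Z)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(                # clip gradient norm to a max of 10.0
        list(policy.parameters()) + [log_Z], max_norm=10.0)
    optimizer.step()
    return loss.item()
```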
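
The "Dataset Splits" row describes a train/evaluation objective split on NATS-Bench rather than a conventional data split. The following hedged sketch illustrates that split under the assumption of a hypothetical `accuracy_table` lookup of NATS-Bench test accuracies; the paper itself queries NATS-Bench (Dong et al., 2021) directly.

```python
# Hypothetical lookup: accuracy_table[arch] -> {"test_acc_epoch_12": ..., "test_acc_epoch_200": ...}

def training_reward(arch: str, accuracy_table: dict) -> float:
    # During GFlowNet training, the cheap 12-epoch test accuracy u12(arch) serves as the reward.
    return accuracy_table[arch]["test_acc_epoch_12"]

def evaluation_score(arch: str, accuracy_table: dict) -> float:
    # Generated candidates are judged by the full 200-epoch test accuracy u200(arch).
    return accuracy_table[arch]["test_acc_epoch_200"]
```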