Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Distributional GFlowNets with Quantile Flows
Authors: Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Benchmarking Experiments): The proposed method has been demonstrated to be able to capture the uncertainty in stochastic environments. On the other hand, in this section we evaluate its performance on deterministic structured generation benchmarks. [...] Figure 4: Experiment results on the hypergrid tasks for different scale levels. [...] Figure 5: The number of modes reached by each algorithm across the whole training process for the sequence generation task. [...] Figure 6: Molecule synthesis experiment. (b) The number of modes captured by algorithms. (c) Tanimoto similarity (lower is better). (d) Average reward across the top-100 molecules. |
| Researcher Affiliation | Collaboration | Dinghuai Zhang: Mila, University of Montreal; Ling Pan: Hong Kong University of Science and Technology; Ricky T. Q. Chen: Meta AI, Fundamental AI Research; Aaron Courville: Mila, University of Montreal; Yoshua Bengio: Mila, University of Montreal |
| Pseudocode | Yes | Algorithm 1: GFlowNet quantile matching (QM) algorithm |
| Open Source Code | Yes | Our code is openly available at https://github.com/zdhNarsil/Distributional-GFlowNets. |
| Open Datasets | Yes | We investigate the hypergrid task from Bengio et al. (2021a). |
| Dataset Splits | No | The paper mentions several tasks (Hypergrid, Sequence generation, Molecule optimization) and describes evaluation metrics for them (e.g., L1 error, number of modes discovered, Tanimoto similarity, average reward). However, it does not specify explicit training/validation/test dataset splits, percentages, or predefined splits with citations for reproducibility. It describes the problem setups but not how the data is partitioned for evaluation. |
| Hardware Specification | Yes | In this work, all experimental results are run on NVIDIA Tesla V100 Volta GPUs, and are averaged across 4 random seeds. |
| Software Dependencies | No | The paper mentions using PyTorch (Paszke et al., 2019) and Adam optimizer, along with architectural details like MLPs and transformers. However, it does not provide specific version numbers for any of these software components or libraries, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Regarding hyperparameters, we do not do much sweeping: QM uses the same learning rate as FM, which is 1 × 10⁻⁴; what's more, QM uses N = Ñ = 8 and a 256-dimensional Fourier feature. Other baselines like TB, MCMC, PPO use the same configuration as in Malkin et al. (2022a). [...] We use a transformer with 3 hidden layers and 8 attention heads. [...] All methods are optimized with the Adam optimizer for 50,000 training steps, with a minibatch size of 16. We use a fixed random action probability of 0.005. [...] For quantile matching we use a two-layer MLP to process the Fourier feature of β, and then compute its element-wise product with the transformer encoding feature; about hyperparameters, we use the same learning rate (5 × 10⁻⁴) as FM, N = Ñ = 16, and a 256-dimensional Fourier feature. |
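The quoted setup describes a quantile-conditioning module: the quantile level β is mapped to a 256-dimensional Fourier (cosine) feature, passed through a two-layer MLP, and combined with the network's state encoding by an element-wise product. The following is a minimal numpy sketch of that pattern, not the authors' implementation; the cosine featurization follows the standard IQN-style construction, and all weight shapes and names here are illustrative assumptions.

```python
import numpy as np

def fourier_quantile_embedding(beta, dim=256):
    """Cosine (Fourier) features of a quantile level beta in [0, 1]:
    phi_i(beta) = cos(pi * i * beta), i = 0, ..., dim - 1."""
    i = np.arange(dim)
    return np.cos(np.pi * i * beta)

def quantile_conditioned_encoding(state_feat, beta, w1, b1, w2, b2):
    """Two-layer MLP over the Fourier features of beta, followed by an
    element-wise product with the state encoding (hypothetical shapes:
    w1 is (256, hidden), w2 is (hidden, state_dim))."""
    phi = fourier_quantile_embedding(beta, dim=w1.shape[0])
    h = np.maximum(phi @ w1 + b1, 0.0)   # first MLP layer, ReLU
    g = np.maximum(h @ w2 + b2, 0.0)     # second MLP layer, ReLU
    return state_feat * g                # element-wise modulation
```

In the paper's training loop this modulated encoding would feed the quantile-flow head, with N = Ñ quantile levels sampled per update; here only the conditioning step is sketched.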