Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Distributional GFlowNets with Quantile Flows
Authors: Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Benchmarking Experiments): The proposed method has been demonstrated to be able to capture the uncertainty in stochastic environments. On the other hand, in this section we evaluate its performance on deterministic structured generation benchmarks. [...] Figure 4: Experiment results on the hypergrid tasks for different scale levels. [...] Figure 5: The number of modes reached by each algorithm across the whole training process for the sequence generation task. [...] Figure 6: Molecule synthesis experiment. (b) The number of modes captured by algorithms. (c) Tanimoto similarity (lower is better). (d) Average reward across the top-100 molecules. |
| Researcher Affiliation | Collaboration | Dinghuai Zhang: Mila, University of Montreal; Ling Pan: Hong Kong University of Science and Technology; Ricky T. Q. Chen: Meta AI, Fundamental AI Research; Aaron Courville: Mila, University of Montreal; Yoshua Bengio: Mila, University of Montreal |
| Pseudocode | Yes | Algorithm 1: GFlowNet quantile matching (QM) algorithm |
| Open Source Code | Yes | Our code is openly available at https://github.com/zdhNarsil/Distributional-GFlowNets. |
| Open Datasets | Yes | We investigate the hypergrid task from Bengio et al. (2021a). |
| Dataset Splits | No | The paper mentions several tasks (Hypergrid, Sequence generation, Molecule optimization) and describes evaluation metrics for them (e.g., L1 error, number of modes discovered, Tanimoto similarity, average reward). However, it does not specify explicit training/validation/test dataset splits, percentages, or predefined splits with citations for reproducibility. It describes the problem setups but not how the data is partitioned for evaluation. |
| Hardware Specification | Yes | In this work, all experimental results are run on NVIDIA Tesla V100 Volta GPUs, and are averaged across 4 random seeds. |
| Software Dependencies | No | The paper mentions using PyTorch (Paszke et al., 2019) and Adam optimizer, along with architectural details like MLPs and transformers. However, it does not provide specific version numbers for any of these software components or libraries, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Regarding hyperparameters, we do not do much sweeping: QM uses the same learning rate as FM, which is 1 × 10⁻⁴; what's more, QM uses N = Ñ = 8 and a 256-dimensional Fourier feature. Other baselines like TB, MCMC, PPO use the same configuration as in Malkin et al. (2022a). [...] We use a transformer with 3 hidden layers and 8 attention heads. [...] All methods are optimized with the Adam optimizer for 50,000 training steps, with a minibatch size of 16. We use a fixed random action probability of 0.005. [...] For quantile matching we use a two-layer MLP to process the Fourier feature of β, and then compute its element-wise product with the transformer encoding feature; about hyperparameters, we use the same learning rate (5 × 10⁻⁴) as FM, N = Ñ = 16, and a 256-dimensional Fourier feature. |
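The quoted setup describes a quantile-conditioning module: the quantile level β is mapped to a 256-dimensional Fourier (cosine) feature, passed through a two-layer MLP, and combined with the network's state encoding by an element-wise product. The following is a minimal numpy sketch of that pattern, not the authors' implementation; the cosine featurization follows the standard IQN-style construction, and all weight shapes and names here are illustrative assumptions.

```python
import numpy as np

def fourier_quantile_embedding(beta, dim=256):
    """Cosine (Fourier) features of a quantile level beta in [0, 1]:
    phi_i(beta) = cos(pi * i * beta), i = 0, ..., dim - 1."""
    i = np.arange(dim)
    return np.cos(np.pi * i * beta)

def quantile_conditioned_encoding(state_feat, beta, w1, b1, w2, b2):
    """Two-layer MLP over the Fourier features of beta, followed by an
    element-wise product with the state encoding (hypothetical shapes:
    w1 is (256, hidden), w2 is (hidden, state_dim))."""
    phi = fourier_quantile_embedding(beta, dim=w1.shape[0])
    h = np.maximum(phi @ w1 + b1, 0.0)   # first MLP layer, ReLU
    g = np.maximum(h @ w2 + b2, 0.0)     # second MLP layer, ReLU
    return state_feat * g                # element-wise modulation
```

In the paper's training loop this modulated encoding would feed the quantile-flow head, with N = Ñ quantile levels sampled per update; here only the conditioning step is sketched.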