Learning to Scale Logits for Temperature-Conditional GFlowNets
Authors: Minsu Kim, Joohwan Ko, Taeyoung Yun, Dinghuai Zhang, Ling Pan, Woo Chang Kim, Jinkyoo Park, Emmanuel Bengio, Yoshua Bengio
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our code is available at https://github.com/dbsxodud-11/logit-gfn. Our online learning with the Logit-GFN stands out with superior performance compared to GFN and alternative benchmarks, including well-established techniques in Reinforcement Learning (RL) (Schulman et al., 2017; Haarnoja et al., 2017) and Markov Chain Monte Carlo (MCMC) methods (Xie et al., 2020). We present experimental results on 4 biochemical tasks: QM9, sEH, TFBind8, and RNA-binding. |
| Researcher Affiliation | Collaboration | 1Work performed while the author was at the Mila Québec AI Institute 2Korea Advanced Institute of Science and Technology 3Mila Québec AI Institute 4Université de Montréal 5Hong Kong University of Science and Technology 6Recursion 7CIFAR. |
| Pseudocode | Yes | Algorithm 1 Scientific Discovery with Temperature-Conditional GFlowNets |
| Open Source Code | Yes | Our code is available at https://github.com/dbsxodud-11/logit-gfn |
| Open Datasets | Yes | QM9: In the QM9 task, we build an offline dataset D using under-50th-percentile data, which consists of 29,382 samples. TFBind8: In the TFBind8 task, we follow the method suggested in Design-Bench (Trabucco et al., 2022). We build an offline dataset D using under-50th-percentile data, which consists of 32,898 samples. RNA-Binding: In the RNA-binding task, we follow the method suggested in BootGen (Kim et al., 2023). We prepare an offline dataset consisting of 5,000 randomly generated RNA sequences. |
| Dataset Splits | No | The paper mentions using an 'offline dataset' and querying with different β values, but does not explicitly state the training, validation, and test dataset splits (e.g., 80/10/10 percentages or specific sample counts for each split) or any cross-validation strategy. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and refers to prior work for GFlowNet implementations but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For QM9 and sEH tasks, we employ a two-layer architecture with 1024 hidden units, while for the other tasks, we choose to use a two-layer architecture with 128 hidden units. ... we employ the Adam optimizer ... with the following learning rates: 1 × 10⁻² for Zθ and 1 × 10⁻⁴ for both the forward and backward policy. ... Table 1 summarizes the reward exponent and normalization constants for different task settings. For each active round, we generate 32 samples for evaluating loss. ... We perform 1 gradient step per active round and use 32 samples from PRT to compute loss. For temperature-conditional GFlowNets, we introduce a two-layer MLP with a 32-dimensional hidden layer and a LeakyReLU activation function for embedding inverse temperature β. Table 2. Temperature Distributions of Temperature-conditioned GFlowNets for various tasks. (A minimal configuration sketch of this setup is given below the table.) |
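
The sketch below is not the authors' released code; it is a hypothetical PyTorch reconstruction of the reported configuration (two-layer policy MLP with 1024 hidden units for QM9/sEH or 128 for the other tasks, a two-layer 32-dimensional LeakyReLU MLP embedding the inverse temperature β, and Adam with learning rate 1e-2 for Zθ and 1e-4 for the policies). The state/action dimensions and module names are placeholders.

```python
# Hypothetical sketch of the reported Logit-GFN training setup (not the released code).
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 12   # placeholder task dimensions
HIDDEN = 1024                     # 1024 for QM9 / sEH, 128 for the other tasks

# Two-layer MLP with a 32-dimensional hidden layer and LeakyReLU that embeds
# the scalar inverse temperature beta.
beta_embedding = nn.Sequential(
    nn.Linear(1, 32),
    nn.LeakyReLU(),
    nn.Linear(32, 32),
)

# Two-layer forward policy conditioned on the state and the beta embedding.
forward_policy = nn.Sequential(
    nn.Linear(STATE_DIM + 32, HIDDEN),
    nn.ReLU(),
    nn.Linear(HIDDEN, NUM_ACTIONS),
)

# Z_theta, the learned (log) partition function used in GFlowNet training.
log_Z = nn.Parameter(torch.zeros(1))

# Adam with the reported per-parameter-group learning rates:
# 1e-2 for Z_theta and 1e-4 for the policy and embedding networks.
optimizer = torch.optim.Adam([
    {"params": [log_Z], "lr": 1e-2},
    {"params": list(forward_policy.parameters()) + list(beta_embedding.parameters()), "lr": 1e-4},
])
```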