Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning to Flow from Generative Pretext Tasks for Neural Architecture Encoding

Authors: Sunwoo Kim, Hyunjin Hwang, Kijung Shin

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that FGP boosts encoder performance by up to 106% in Precision@1%, compared to the same encoder trained solely with supervised learning. We demonstrate the effectiveness of our pre-training method compared to baseline pre-training methods across multiple downstream tasks, including performance prediction and neural architecture search.
Researcher Affiliation	Academia	(1) Kim Jaechul Graduate School of AI and (2) School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST); {kswoo97, julia510, kijungs}@kaist.ac.kr
Pseudocode	No	The paper describes the proposed method and steps for obtaining the flow surrogate in Section 3 and its subsections, along with a visual representation in Figure 3. However, there are no explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our code and datasets are available at https://github.com/kswoo97/FGPAnom.
Open Datasets	Yes	We leverage three computer vision neural architecture datasets, which are NAS-Bench-101 (NB-101) [51], NAS-Bench-201 (NB-201) [8], and NAS-Bench-301 (NB-301) [39] datasets. Our code and datasets are available at https://github.com/kswoo97/FGPAnom.
Dataset Splits	Yes	For NB-101 and NB-201 datasets, we follow the training and test splits provided in [34, 15]. For the NB-301 dataset, since the baseline method ZC-Proxy [56] requires certain numerical properties of architectures, we use a subset of the original NB-301 dataset where these properties are available. We sample 40 architectures from the test set to create a validation set, following the approach in [15].
Hardware Specification	Yes	We conducted our experiments on a machine with NVIDIA RTX 8000 D6 GPUs (48GB memory) and two Intel Xeon Silver 4214R processors.
Software Dependencies	Yes	FGP is primarily implemented using the PyTorch (v1.12.1) and PyTorch Geometric (v2.2.0) libraries.
Experiment Setup	Yes	We use AdamW [30] as the optimizer. We set the batch size and number of pre-training epochs to 256 and 200, respectively. Appendix A.4 details hyperparameter tuning over: learning rate in {10^-3, 5×10^-4, 10^-4}; projection head dimension in {32, 64, 128, 256}; number of projection head layers in {1, 2, 3}; weight decay in {10^-6, 0.0}; and, for FGP, (λ1, λ2) in {(1/2, 1/2), (1/3, 2/3), (2/3, 1/3)}.
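The Research Type row reports gains in Precision@1%. A common definition of Precision@K% in architecture-encoding work is the fraction of the predicted top-K% architectures that also belong to the ground-truth top-K%; the sketch below assumes that definition, higher-is-better scores, and rounding K% of the pool up to at least one architecture. The function name `precision_at_top_percent` is hypothetical and not taken from the FGP code.

```python
import math
from typing import Sequence


def precision_at_top_percent(pred: Sequence[float], true: Sequence[float],
                             percent: float = 1.0) -> float:
    """Share of the predicted top-K% architectures that are also in the true top-K%.

    Assumptions: higher scores are better, and K% of the pool is rounded up
    to at least one architecture.
    """
    if len(pred) != len(true) or not pred:
        raise ValueError("pred and true must be non-empty and of equal length")
    n = len(pred)
    k = max(1, math.ceil(n * percent / 100.0))
    # Indices of the k highest predicted scores and the k highest true scores.
    top_pred = set(sorted(range(n), key=lambda i: pred[i], reverse=True)[:k])
    top_true = set(sorted(range(n), key=lambda i: true[i], reverse=True)[:k])
    return len(top_pred & top_true) / k


# Example: 200 architectures, so the top 1% is 2 architectures.
true_acc = [float(i) for i in range(200)]
pred_acc = list(true_acc)
pred_acc[0] = 1000.0  # the encoder wrongly ranks architecture 0 first
print(precision_at_top_percent(pred_acc, true_acc, percent=1.0))
```

Because Precision@1% is bounded by 1, the reported "up to 106%" boost must be a relative gain: in the best case, the pre-trained encoder's score is roughly 2.06 times that of the supervised-only encoder.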
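The Dataset Splits row states that 40 architectures are sampled from the test set to form a validation set. The sketch below shows one reproducible way to do this under stated assumptions: the sampled architectures are removed from the test pool so the two sets are disjoint, and a fixed seed is used. Neither detail is confirmed by the row, and `carve_validation_set` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def carve_validation_set(test_ids: Sequence[int], n_val: int = 40,
                         seed: int = 0) -> Tuple[List[int], List[int]]:
    """Sample n_val architecture IDs from the test pool as a validation set.

    Assumption: sampled IDs are removed from the test set (disjoint split).
    """
    if n_val > len(test_ids):
        raise ValueError("n_val exceeds the size of the test pool")
    rng = random.Random(seed)  # fixed seed so the split can be regenerated
    val = rng.sample(list(test_ids), n_val)
    val_set = set(val)
    remaining_test = [i for i in test_ids if i not in val_set]
    return val, remaining_test


val_ids, test_ids = carve_validation_set(range(1000), n_val=40, seed=0)
print(len(val_ids), len(test_ids))
```

Fixing the seed matters for reproducibility: rerunning the function with the same pool and seed yields the same validation set.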
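To make the Experiment Setup row concrete, the sketch below transcribes its search space into a grid and expands it. It assumes a full Cartesian-product search, which the row does not confirm, and the dictionary keys are hypothetical names rather than the FGP repository's configuration fields.

```python
from itertools import product
from typing import Dict, List

# Search space transcribed from the Experiment Setup row; key names are hypothetical.
SEARCH_SPACE = {
    "lr": [1e-3, 5e-4, 1e-4],
    "proj_dim": [32, 64, 128, 256],
    "proj_layers": [1, 2, 3],
    "weight_decay": [1e-6, 0.0],
    "lambdas": [(1 / 2, 1 / 2), (1 / 3, 2 / 3), (2 / 3, 1 / 3)],  # (λ1, λ2), FGP only
}
# Settings reported as fixed rather than tuned.
FIXED = {"optimizer": "AdamW", "batch_size": 256, "pretrain_epochs": 200}


def expand_grid(space: Dict[str, list]) -> List[dict]:
    """Return one config dict per combination of the tuned values, plus FIXED."""
    keys = list(space)
    return [dict(zip(keys, values), **FIXED)
            for values in product(*(space[k] for k in keys))]


fgp_configs = expand_grid(SEARCH_SPACE)
baseline_space = {k: v for k, v in SEARCH_SPACE.items() if k != "lambdas"}
baseline_configs = expand_grid(baseline_space)
print(len(fgp_configs), len(baseline_configs))
```

Under the full-grid assumption, FGP has 3 × 4 × 3 × 2 × 3 = 216 candidate configurations, while a baseline without the (λ1, λ2) weights has 72. Every listed (λ1, λ2) pair sums to 1.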