Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ProfiX: Improving Profile-Guided Optimization in Compilers with Graph Neural Networks

Authors: Huiri Tan, Juyong Jiang, Jiasi Shen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on the SPEC 2017 benchmarks demonstrate that PROFIX achieves up to a 9.15% performance improvement compared to the state-of-the-art traditional algorithm and an average 6.26% improvement over the baseline machine learning models. These results highlight the effectiveness of PROFIX in optimizing real-world application profiles.
Researcher Affiliation	Academia	(1) The Hong Kong University of Science and Technology; (2) The Hong Kong University of Science and Technology (Guangzhou)
Pseudocode	No	The paper describes the model architecture and mathematical formulas (Eqs. 1-11) and gives an overview of the model structure in Figure 2, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block, nor any structured steps formatted like code or an algorithm.
Open Source Code	No	Answer: [No] Justification: We will share the code upon request.
Open Datasets	Yes	We evaluate PROFIX with a diverse dataset covering compiler toolchains (Clang [29], GCC [50]), database systems (MySQL [11], SQLite [13]), and performance benchmarks (SPEC CPU 2017 [footnote 2]). ... Footnote 2: https://www.spec.org/cpu2017/
Dataset Splits	Yes	The processed data is split into training, validation, and test sets with a ratio of 80%/10%/10%.
Hardware Specification	Yes	We conduct all training and testing experiments on a server with 2 Intel(R) Xeon(R) Gold 6444Y CPU (16 Cores), 256 GB RAM, and 2 RTX 5880 GPU (48 GB Memory).
Software Dependencies	No	We use the PyTorch framework when implementing our model and baselines (cited in Sections 1 and 4). No version number is specified.
Experiment Setup	Yes	Table 9: Key hyperparameters for model training. Learning rate: 0.001; train batch size: 128; validate/test batch size: 1; optimizer: Adam; weight decay: 0 (no weight decay); learning-rate scheduler: StepLR (step size 5, gamma 0.97); epochs: 300; LSTM hidden size: 256; SAGE attention layers: 3; dropout rate: 0.1; early-stopping patience: 10; loss function: RMSE loss; train/validate/test split ratio: 80%/10%/10%.
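To make the reported setup concrete, the following is a minimal, dependency-free Python sketch of the training-loop bookkeeping implied by the Dataset Splits and Experiment Setup rows: an 80%/10%/10% split, the StepLR learning-rate schedule (step size 5, gamma 0.97), an RMSE loss, and early stopping with patience 10. It is not the authors' code, which is not released. The shuffling, the seed, the helper names (split_80_10_10, step_lr, rmse, stop_epoch), and the "strict improvement, no minimum delta" early-stopping rule are all assumptions for illustration. In PyTorch, the optimizer and schedule would normally be torch.optim.Adam(params, lr=0.001, weight_decay=0) combined with torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.97); step_lr below reproduces that scheduler's closed form when it is stepped once per epoch.

```python
import math
import random

# Values reported in Table 9; everything else here is an illustrative assumption.
BASE_LR = 0.001
STEP_SIZE = 5      # StepLR step size, in epochs
GAMMA = 0.97       # StepLR multiplicative decay
MAX_EPOCHS = 300
PATIENCE = 10      # early-stopping patience, in epochs


def split_80_10_10(samples, seed=0):
    """Shuffle, then split into train/validation/test at 80%/10%/10%.

    Hypothetical helper: the paper states only the ratio, not how the
    data is shuffled or seeded.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def step_lr(epoch):
    """Learning rate after `epoch` completed epochs under StepLR:
    the base rate is multiplied by GAMMA once every STEP_SIZE epochs."""
    return BASE_LR * GAMMA ** (epoch // STEP_SIZE)


def rmse(preds, targets):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(squared) / len(squared))


def stop_epoch(val_losses, patience=PATIENCE):
    """Return the epoch index at which early stopping fires, i.e. after
    `patience` consecutive epochs without a strictly lower validation loss,
    or None if it never fires within the given losses (assumed rule)."""
    best = math.inf
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


if __name__ == "__main__":
    train, val, test = split_80_10_10(range(1000))
    print(len(train), len(val), len(test))  # 800 100 100
    print(step_lr(0), step_lr(MAX_EPOCHS - 1))
```

Under these assumptions, the learning rate decays 59 times over 300 epochs, ending near 0.001 × 0.97^59 (about 1.7e-4). Because patience is 10, most runs would likely stop well before the 300-epoch cap.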