Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

ProfiX: Improving Profile-Guided Optimization in Compilers with Graph Neural Networks

Authors: Huiri Tan, Juyong Jiang, Jiasi Shen

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the SPEC 2017 benchmarks demonstrate that PROFIX achieves up to a 9.15% performance improvement compared to the state-of-the-art traditional algorithm and an average 6.26% improvement over the baseline machine learning models. These results highlight the effectiveness of PROFIX in optimizing real-world application profiles.
Researcher Affiliation | Academia | 1 The Hong Kong University of Science and Technology; 2 The Hong Kong University of Science and Technology (Guangzhou); EMAIL, EMAIL
Pseudocode | No | The paper describes the model architecture and mathematical formulas (Eq. 1-11) and provides an overview of the model structure in Figure 2, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code or an algorithm.
Open Source Code | No | Answer: [No] Justification: We will share the code upon request.
Open Datasets | Yes | We evaluate PROFIX with a diverse dataset covering compiler toolchains (Clang [29], GCC [50]), database systems (MySQL [11], SQLite [13]), and performance benchmarks (SPEC CPU 2017, footnote 2). ... Footnote 2: https://www.spec.org/cpu2017/
Dataset Splits | Yes | The processed data is split into training, validation, and test sets with a ratio of 80%/10%/10%.
Hardware Specification | Yes | We conduct all training and testing experiments on a server with 2 Intel(R) Xeon(R) Gold 6444Y CPU (16 Cores), 256 GB RAM, and 2 RTX 5880 GPU (48 GB Memory).
Software Dependencies | No | We use the PyTorch framework when implementing our model and baselines, cited in Section 1 and Section 4. (Does not specify version number.)
Experiment Setup | Yes | Table 9: Key hyperparameters for model training. Learning Rate: 0.001; Train Batch Size: 128; Validate/Test Batch Size: 1; Optimizer: Adam; Weight Decay: 0 (no weight decay); Learning Rate Scheduler: StepLR (Step Size: 5, Gamma: 0.97); Epochs: 300; LSTM Hidden Size: 256; SAGE Attention Layers: 3; Dropout Rate: 0.1; Early Stopping Patience: 10; Loss Function: RMSE Loss; Train-Validate-Test Split Ratio: 80%, 10%, 10%
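
The Dataset Splits and Experiment Setup rows give enough detail to sketch what the reported settings imply numerically. The example below is a minimal, dependency-free illustration and is not the authors' code, which is not public. The function names (`split_indices`, `step_lr`, `rmse`, `should_stop`), the shuffling seed, and the exact early-stopping rule are assumptions made for illustration. The learning-rate formula follows the usual step-decay rule (multiply by gamma once every `step_size` epochs), which is assumed to match the scheduler named in Table 9.

```python
import math
import random

# Values copied from the Experiment Setup row (Table 9 of the paper).
BASE_LR = 0.001
STEP_SIZE = 5
GAMMA = 0.97
PATIENCE = 10
SPLIT_PERCENT = (80, 10, 10)  # train / validation / test


def split_indices(n, percent=SPLIT_PERCENT, seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    The seed and the shuffle itself are assumptions; the paper only
    reports the 80%/10%/10% ratio.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n * percent[0] // 100
    n_val = n * percent[1] // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def step_lr(epoch, base_lr=BASE_LR, step_size=STEP_SIZE, gamma=GAMMA):
    """Step decay: the rate is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def should_stop(val_losses, patience=PATIENCE):
    """Assumed rule: stop once the best validation loss is `patience` epochs old."""
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience
```

Under these assumptions the learning rate stays at 0.001 for epochs 0 to 4 and drops to 0.00097 at epoch 5, and training halts well before the 300-epoch limit whenever the validation RMSE fails to improve for 10 consecutive epochs.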