Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Probing Neural Combinatorial Optimization Models

Authors: Zhiqin Zhang, Yining Ma, Zhiguang Cao, Hoong Chuin Lau

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments and analysis reveal that NCO models encode low-level information essential for solution construction, while capturing high-level knowledge to facilitate better decisions. Using CS-Probing, we find that prevalent NCO models impose varying inductive biases on their learned representations, uncover direct evidence related to model generalization, and identify key embedding dimensions associated with specific knowledge.
Researcher Affiliation	Academia	Zhiqin Zhang, Singapore Management University; Yining Ma, Massachusetts Institute of Technology; Zhiguang Cao, Singapore Management University; Hoong Chuin Lau, Singapore Management University
Pseudocode	No	The paper describes methods in prose and with architectural diagrams (e.g., Figure 1) and data flow illustrations (e.g., Figure 7), but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The source code is publicly available. Source Code: https://github.com/123zhangzq/NeurIPS2025_probing/.
Open Datasets	Yes	We provide a GitHub repository containing all codes required to construct the probing datasets. The repository includes: (1) instance generation with theoretical and greedy solutions... the codes provided in our link can directly generate the required datasets and facilitate the experiments presented in this paper.
Dataset Splits	No	Each probing dataset is split into training and test sets, with all reported results based on the test set, i.e., out-of-sample data.
Hardware Specification	Yes	In this study, we use NVIDIA A100-40G GPU with AMD EPYC Milan 7713 CPU.
Software Dependencies	No	The paper mentions using 'Gurobi [44] solver' and references the 'Gurobi Optimizer Reference Manual, 2024.', but does not provide a specific version number for the solver or any other software dependencies used in the experiments.
Experiment Setup	No	The paper describes the design of probing tasks, the nature of linear probing models, and evaluation metrics. However, it does not provide specific hyperparameters such as learning rates, batch sizes, number of epochs, or optimizer settings for training the linear probing models.
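The Experiment Setup row notes that the training details of the linear probing models (learning rate, epochs, optimizer) are not reported. As a hedged illustration of what a complete specification could look like, the sketch below trains a linear regression probe on synthetic "embeddings" with an explicit held-out test split and records every hyperparameter in one config object. All names (`ProbeConfig`, `fit_linear_probe`, `r2_score`) and all values are hypothetical and not taken from the paper or its repository; the paper's probes may use a different loss, optimizer, or evaluation metric.

```python
import random
from dataclasses import dataclass, asdict


@dataclass
class ProbeConfig:
    # Hypothetical values for illustration only; the paper does not report these.
    learning_rate: float = 0.1
    epochs: int = 300
    test_fraction: float = 0.2
    seed: int = 0


def train_test_split(X, y, test_fraction, seed):
    """Shuffle indices with a fixed seed and hold out a test fraction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_linear_probe(X, y, cfg):
    """Fit y ~ w.x + b by full-batch gradient descent on mean squared error."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(cfg.epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient of (1/n) * sum(err^2) is (2/n) * sum(err * x).
        w = [wj - cfg.learning_rate * 2 * gj / n for wj, gj in zip(w, grad_w)]
        b -= cfg.learning_rate * 2 * grad_b / n
    return w, b


def r2_score(X, y, w, b):
    """Coefficient of determination of the probe on (X, y)."""
    preds = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((p - t) ** 2 for p, t in zip(preds, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Synthetic 4-dimensional "embeddings" whose target depends linearly on dims 0 and 2.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(500)]
y = [1.5 * x[0] - 2.0 * x[2] + rng.gauss(0, 0.05) for x in X]

cfg = ProbeConfig()
X_tr, y_tr, X_te, y_te = train_test_split(X, y, cfg.test_fraction, cfg.seed)
w, b = fit_linear_probe(X_tr, y_tr, cfg)
test_r2 = r2_score(X_te, y_te, w, b)

print(asdict(cfg))        # the full training setup, reportable alongside results
print(round(test_r2, 3))  # out-of-sample score on the held-out split
```

Reporting `asdict(cfg)` next to the test-set score is one simple way to make the probe training reproducible: a reader can rerun the exact split (fixed `seed`) and optimizer settings rather than guessing them.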