Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Authors: Lowell Weissman, Michael Krumdick, A. Abbott

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We analyze two fundamental complexity measures: solution space size and representation space size. Using the Traveling Salesman Problem (TSP) as a case study, we show that combinatorial optimization promotes smooth cost trends, and therefore meaningful scaling laws can be obtained even in the absence of an interpretable loss. We then show that suboptimality grows predictably for fixed-size models when scaling the number of TSP nodes or spatial dimensions, independent of whether the model was trained with reinforcement learning or supervised fine-tuning on a static dataset.
Researcher Affiliation	Collaboration	Lowell Weissman Virginia Tech Michael Krumdick Kensho Technologies A. Lynn Abbott Virginia Tech
Pseudocode	No	The paper describes algorithms like Proximal Policy Optimization (PPO) and supervised fine-tuning, but does not present them in a structured pseudocode or algorithm block format.
Open Source Code	Yes	Project code, along with our new dataset of 128 million optimal TSP solutions, is provided at: https://github.com/lowellw6/complexity-scaling-laws
Open Datasets	Yes	Project code, along with our new dataset of 128 million optimal TSP solutions, is provided at: https://github.com/lowellw6/complexity-scaling-laws ... We generate optimal solutions to 128 million TSP problems, 12.8 million for each node scale we study, and share this dataset for academic use.
Dataset Splits	No	Parameter and node scaling evaluations use the first 1.28 million problems of their corresponding Concorde-solved datasets. Compute scaling evaluations, performed throughout training every 4000 updates, use the first 12,800 problems. Spatial dimension scaling evaluations use their full approximate dataset for each scale. We sample one model tour for each optimal tour.
Hardware Specification	Yes	Each model was trained on an HPC node using one Nvidia Tesla V100 (16GB), 24 Intel Xeon Gold 6136 cores, and 384GB of memory.
Software Dependencies	Yes	We use BOHB [72] implemented with Optuna [73] to optimize update efficiency for 50-node TSP. ... We implemented BOHB using Optuna 3.5’s multivariate Tree-structured Parzen Estimator (TPE) [75]...
Experiment Setup	Yes	Here we provide the hyperparameter (HP) settings used for model training (Table 7), along with how we determined them through hyperparameter optimization (HPO). ...Table 7: Hyperparameters used for RL node-scaling experiments.