Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement
Authors: Jonathan Pirnay, Dominik G. Grimm
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data. Additionally, we apply our method to the Job Shop Scheduling Problem using a transformer-based architecture and outperform existing state-of-the-art methods by a wide margin. |
| Researcher Affiliation | Academia | Jonathan Pirnay (EMAIL) — Technical University of Munich, TUM Campus Straubing; University of Applied Sciences Weihenstephan-Triesdorf. Dominik G. Grimm (dominik.grimm@{hswt,tum}.de) — Technical University of Munich, TUM Campus Straubing; University of Applied Sciences Weihenstephan-Triesdorf. |
| Pseudocode | Yes | Algorithm 1: Self-improvement training for neural CO. Input: X: distribution over problem instances; f: objective function; n: number of instances to sample in each epoch; m: number of sequences to sample for each instance; Validation: validation dataset. 1: Randomly initialize policy π_θ. 2: π_best ← π_θ (current best policy). 3: Dataset ← ∅. 4: foreach epoch do 5: Sample set of n problem instances Instances ∼ X. 6: foreach x ∈ Instances do 7: Sample set of m feasible solutions A := {a⁽¹⁾_{1:T}, …, a⁽ᵐ⁾_{1:T}} ∼ π_best. 8: Dataset ← Dataset ∪ {(x, argmax_{a_{1:T} ∈ A} f_x(a_{1:T}))} (add best solution). 9: foreach batch do 10: Sample B instances and partial solutions a⁽ʲ⁾_{1:d_j} = (a⁽ʲ⁾_1, …, a⁽ʲ⁾_{d_j}) with d_j < T from Dataset, for j ∈ {1, …, B}. 11: Minimize batch-wise loss L_θ = −(1/B) Σ_{j=1}^{B} log π_θ(a⁽ʲ⁾_{d_j+1} \| a⁽ʲ⁾_{1:d_j}) (next-token prediction). 12: if greedy performance of π_θ on Validation is better than π_best then 13: π_best ← π_θ (update best policy). 14: Dataset ← ∅ (reset dataset). |
| Open Source Code | Yes | Our code for the experiments, data, and trained network weights are available at https://github.com/grimmlab/gumbeldore. |
| Open Datasets | Yes | Datasets and supervised training: Training (for supervised learning), validation, and test data are generated in the standard way of previous work (Kool et al., 2019b) by uniformly sampling N points from the two-dimensional unit square. The training dataset consists of one million randomly generated instances with N = 100 nodes for TSP. The validation and test sets (also N = 100) include 10,000 instances each. The test set is the same widely used dataset generated by Kool et al. (2019b). For N ∈ {200, 500, 1000}, we use a test set of 128 instances identical to Drakulic et al. (2023). For supervised training and to compute optimality gaps, we obtain optimal solutions for the generated instances with the Concorde solver (Applegate et al., 2006). For the CVRP, the datasets have the same number of instances, and we use the same sets as in Luo et al. (2023), where solutions come from HGS (Vidal, 2022). The vehicle capacities for 100, 200, 500, and 1000 nodes are 50, 80, 100, and 250, respectively. ... All random instances are generated in the standard manner of Taillard (1993) with integer processing times in [1, 99]. |
| Dataset Splits | Yes | The training dataset consists of one million randomly generated instances with N = 100 nodes for TSP. The validation and test sets (also N = 100) include 10,000 instances each. The test set is the same widely used dataset generated by Kool et al. (2019b). For N ∈ {200, 500, 1000}, we use a test set of 128 instances identical to Drakulic et al. (2023). ... we pre-generate a validation set of 100 instances of size 20×20. |
| Hardware Specification | Yes | Training and inference are performed for all experiments using two NVIDIA RTX A5000 with 24GB memory. |
| Software Dependencies | No | Our code is developed in PyTorch (Paszke et al., 2019). |
| Experiment Setup | Yes | Supervised training is performed with the best sampled solutions as in its supervised counterpart; however, we use an initial learning rate of 2e-4. ... In each epoch, we train on 1,000 batches consisting of 1,024 sampled subtours. We use Adam (Kingma & Ba, 2014) as an optimizer, with an initial learning rate of 1e-4 and no decay. Gradients are clipped to unit L2-norm. ... In each epoch, we randomly pick a J×M size in {10×15, 15×15, 15×20}. We generate 512 random instances of the chosen size, for which we sample 128 solutions with GD using a beam width of k = 32 in n = 4 rounds. We use a fixed advantage step size of σ = 0.05. We start with p_min = 1 and set p_min = 0.95 after 50 epochs. |
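The self-improvement loop quoted in the Pseudocode row reduces to two steps: sample solutions from the current best policy and keep only the best one per instance, then train by next-token prediction on the retained solutions. The sketch below is a minimal, framework-free illustration of the data-collection step (Algorithm 1, lines 5–8) and the batch loss (line 11). `policy_sample` and `objective` are hypothetical placeholders, not the authors' implementation; their code lives in the linked gumbeldore repository.

```python
import math
import random

def collect_best_solutions(policy_sample, objective, instances, m):
    """Data-collection step (Algorithm 1, lines 5-8): for each instance x,
    sample m feasible solutions from the current best policy and keep
    only the one with the highest objective value f_x."""
    dataset = []
    for x in instances:
        candidates = [policy_sample(x) for _ in range(m)]
        best = max(candidates, key=lambda a: objective(x, a))
        dataset.append((x, best))
    return dataset

def next_token_loss(log_probs):
    """Batch loss (Algorithm 1, line 11):
    L = -(1/B) * sum_j log pi_theta(a_{d_j+1} | a_{1:d_j})."""
    return -sum(log_probs) / len(log_probs)

# Toy usage: "solutions" are integers and the objective prefers larger ones,
# so the dataset keeps the best of m=5 random samples per instance.
random.seed(0)
data = collect_best_solutions(
    policy_sample=lambda x: random.randint(0, 9),
    objective=lambda x, a: a,
    instances=["inst1", "inst2"],
    m=5,
)
loss = next_token_loss([math.log(0.5), math.log(0.25)])
```

Note the key design choice the paper describes: no reward model or expert labels are needed, since the filtering by `objective` alone turns the policy's own samples into a supervised training signal.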