Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Single-Swap Local Search Algorithm for k-Means of Lines

Authors: Ting Liang, Xiaoliang Wu, Junyu Huang, Jianxin Wang, Qilong Feng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments. In this section, we give empirical evaluations on the performances of our proposed algorithms. All algorithms are implemented and executed in Python. The experiments were done on a machine with i7-14700KF processor and 256GB RAM. Datasets. We evaluate the performance of our algorithm on both synthetic datasets (SYN 1 with n = 5000, d = 10, and SYN2 with n = 10000, d = 5) and real-world Open Street Map datasets (RE 1 with n = 476, d = 2, and RE 2 with n = 418, d = 2) as used in [20]. For each dataset, we run both algorithms 10 times and report the minimum cost (Min_Cost), maximum cost (Max_Cost), average cost (Avg_Cost), standard deviation (Std), and runtime (Time(s)). Table 1: Experimental results of our SLS-k-ML algorithm and the coreset-based method.
Researcher Affiliation	Academia	Ting Liang¹, Xiaoliang Wu¹, Junyu Huang¹, Jianxin Wang¹, Qilong Feng¹. ¹School of Computer Science and Engineering, Central South University; ²The Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha 410083, China. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: SLS-k-ML. Input: An instance (L, d, k) of the k-ML problem and a parameter T. Output: A set C ⊆ R^d of at most k centers. 1 P ← CENTROID-SET(L); 2 C ← Sample k points from P randomly; 3 for i ∈ {1, 2, . . . , T} do 4 Sample two lines ℓ1, ℓ2 from L with probability b_ℓ = cost({ℓ}, C) / Σ_{ℓ′ ∈ L} cost({ℓ′}, C) (ℓ = {ℓ1, ℓ2}); 5 M ← the point set returned by Cross Line of (ℓ1, ℓ2); 6 for each point p ∈ M do 7 if ∃ q ∈ C, s.t. cost(L, C \ {q} ∪ {p}) < cost(L, C) then 8 C ← C \ {q} ∪ {p}; 9 return C. Algorithm 2: CENTROID-SET. Input: A finite set L of n lines in R^d. Output: A set P of points. 2 for each line ℓ ∈ L do 4 for each line ℓ′ ∈ L \ {ℓ} do 5 π_ℓ′(ℓ) ← the closest point on ℓ to ℓ′; 6 P_ℓ ← P_ℓ ∪ {π_ℓ′(ℓ)}; 8 return P.
Open Source Code	No	The codes (including dataset generation and experimental code) are available upon request via email.
Open Datasets	Yes	Datasets. We evaluate the performance of our algorithm on both synthetic datasets (SYN 1 with n = 5000, d = 10, and SYN2 with n = 10000, d = 5) and real-world Open Street Map datasets (RE 1 with n = 476, d = 2, and RE 2 with n = 418, d = 2) as used in [20].
Dataset Splits	No	The paper mentions evaluating on synthetic and real-world datasets but does not explicitly provide details about training/test/validation splits, proportions, or methodologies for partitioning the data. It only states running algorithms 10 times for each dataset.
Hardware Specification	Yes	The experiments were done on a machine with i7-14700KF processor and 256GB RAM.
Software Dependencies	No	All algorithms are implemented and executed in Python. No specific version of Python or of any other library is mentioned.
Experiment Setup	Yes	For our SLS-k-ML algorithm, we design a sampling strategy that selects a subset of 100 points from r-Cross Line to improve computational efficiency. Following the settings in [14], we set the number of sampling rounds to T = 400, and the number of clusters to k = 3 and k = 10.
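
Since the authors' code is available only on request, the following is a minimal Python sketch of a single-swap local search over lines, written from the pseudocode excerpt above and not taken from the paper's implementation. Several parts are assumptions: each line is represented as an (anchor point, unit direction) pair; the cost of a line is its squared distance to the nearest center; the candidate pool is built from pairwise closest points; and the Cross Line step, whose details are not given in the excerpt, is replaced by a stand-in that proposes the two mutually closest points on the sampled pair. The two lines are sampled with replacement, which is also an assumption. All function names (`point_line_sqdist`, `line_costs`, `closest_point`, `random_lines`, `single_swap_local_search`) are hypothetical.

```python
import numpy as np

def point_line_sqdist(points, a, u):
    """Squared distance from each row of `points` to the line a + t*u (u is unit)."""
    w = points - a
    proj = w @ u
    return np.maximum(np.sum(w * w, axis=1) - proj ** 2, 0.0)

def line_costs(lines, centers):
    """Per-line cost (assumed): squared distance from the line to its nearest center."""
    return np.array([point_line_sqdist(centers, a, u).min() for a, u in lines])

def closest_point(l1, l2):
    """Point on line l1 that is closest to line l2."""
    (a1, u1), (a2, u2) = l1, l2
    w = a1 - a2
    b, d, e = u1 @ u2, u1 @ w, u2 @ w
    denom = 1.0 - b * b
    # Parallel (or identical) lines: every point is equally close, so keep the anchor.
    t = 0.0 if denom < 1e-12 else (b * e - d) / denom
    return a1 + t * u1

def random_lines(n, d, rng):
    """Hypothetical helper: n random lines in R^d with unit directions."""
    anchors = rng.normal(size=(n, d))
    dirs = rng.normal(size=(n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return list(zip(anchors, dirs))

def single_swap_local_search(lines, k, T, rng):
    n = len(lines)
    # Candidate pool of pairwise closest points (quadratic in n, for small demos only).
    pool = np.array([closest_point(lines[i], lines[j])
                     for i in range(n) for j in range(n) if i != j])
    centers = pool[rng.choice(len(pool), size=k, replace=False)]
    best = line_costs(lines, centers).sum()
    for _ in range(T):
        costs = line_costs(lines, centers)
        total = costs.sum()
        # Sample lines proportionally to their current cost; uniform if the cost is zero.
        probs = costs / total if total > 0 else np.full(n, 1.0 / n)
        i, j = rng.choice(n, size=2, p=probs)
        # Stand-in for the Cross Line step (assumption, not the paper's procedure).
        candidates = [closest_point(lines[i], lines[j]),
                      closest_point(lines[j], lines[i])]
        for p in candidates:
            for q in range(k):
                trial = centers.copy()
                trial[q] = p  # swap center q for candidate p
                c = line_costs(lines, trial).sum()
                if c < best:  # accept the first improving swap
                    centers, best = trial, c
                    break
    return centers, best
```

A call such as `single_swap_local_search(random_lines(50, 2, np.random.default_rng(0)), k=3, T=400, rng=np.random.default_rng(1))` mirrors the reported settings of T = 400 and k = 3 on toy data. The 100-point subsample from r-Cross Line mentioned in the setup is not modeled here, and the quadratic candidate pool would need to be replaced before running at the dataset sizes listed above.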