Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
A Single-Swap Local Search Algorithm for k-Means of Lines
Authors: Ting Liang, Xiaoliang Wu, Junyu Huang, Jianxin Wang, Qilong Feng
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments In this section, we give empirical evaluations on the performances of our proposed algorithms. All algorithms are implemented and executed in Python. The experiments were done on a machine with i7-14700KF processor and 256GB RAM. Datasets. We evaluate the performance of our algorithm on both synthetic datasets (SYN 1 with n = 5000, d = 10, and SYN2 with n = 10000, d = 5) and real-world Open Street Map datasets (RE 1 with n = 476, d = 2, and RE 2 with n = 418, d = 2) as used in [20]. For each dataset, we run both algorithms 10 times and report the minimum cost (Min_Cost), maximum cost (Max_Cost), average cost (Avg_Cost), standard deviation (Std), and runtime (Time(s)). Table 1: Experimental results of our SLS-k-ML algorithm and the coreset-based method. |
| Researcher Affiliation | Academia | Ting Liang1, Xiaoliang Wu1, Junyu Huang1, Jianxin Wang1, Qilong Feng1, 1School of Computer Science and Engineering, Central South University 2The Hunan Provincial Key Lab of Bioinformatics, Central South University, Changsha 410083, China EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: SLS-k-ML. Input: An instance (L, d, k) of the k-ML problem and a parameter T. Output: A set C ⊆ R^d of at most k centers. 1 P ← CENTROID-SET(L); 2 C ← Sample k points from P randomly; 3 for i ∈ {1, 2, ..., T} do 4 Sample two lines β1, β2 from L with probability p_β = cost({β}, C) / Σ_{β′ ∈ L} cost({β′}, C) (β ∈ {β1, β2}); 5 M ← the point set returned by Cross Line of (β1, β2); 6 for each point p ∈ M do 7 if ∃ q ∈ C, s.t. cost(L, C \ {q} ∪ {p}) < cost(L, C) then 8 C ← C \ {q} ∪ {p}; 9 return C. Algorithm 2: CENTROID-SET. Input: A finite set L of n lines in R^d. Output: A set P of points. 2 for each line β ∈ L do 4 for each line β′ ∈ L \ {β} do 5 Ο_β(β′) ← the closest point on β to β′; 6 P_β ← P_β ∪ {Ο_β(β′)}; 8 return P. |
| Open Source Code | No | The codes (including dataset generation and experimental code) are available upon request via email. |
| Open Datasets | Yes | Datasets. We evaluate the performance of our algorithm on both synthetic datasets (SYN 1 with n = 5000, d = 10, and SYN2 with n = 10000, d = 5) and real-world Open Street Map datasets (RE 1 with n = 476, d = 2, and RE 2 with n = 418, d = 2) as used in [20]. |
| Dataset Splits | No | The paper mentions evaluating on synthetic and real-world datasets but does not explicitly provide details about training/test/validation splits, proportions, or methodologies for partitioning the data. It only states running algorithms 10 times for each dataset. |
| Hardware Specification | Yes | The experiments were done on a machine with i7-14700KF processor and 256GB RAM. |
| Software Dependencies | No | All algorithms are implemented and executed in Python. No specific Python version or library versions are mentioned. |
| Experiment Setup | Yes | For our SLS-k-ML algorithm, we design a sampling strategy that selects a subset of 100 points from r-Cross Line to improve computational efficiency. Following the settings in [14], we set the number of sampling rounds to T = 400, and the number of clusters to k = 3 and k = 10. |
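
The pseudocode quoted in the table can be illustrated with a small sketch. This is not the authors' implementation (their code is available only on request), and several pieces are assumptions made for illustration: lines are stored as a point plus a unit direction, the cost is the sum of squared distances from each line to its nearest center, and the candidate set `M` is a hypothetical stand-in for the paper's Cross Line routine, here just the two mutually closest points of the sampled lines. The helper names (`make_line`, `closest_point_on`, `make_demo_lines`, and so on) are likewise invented for this example.

```python
import math
import random

Line = tuple[list[float], list[float]]  # (point on the line, unit direction)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def make_line(point, direction) -> Line:
    """Store a line with a normalized direction vector."""
    norm = math.sqrt(dot(direction, direction))
    return (list(point), [v / norm for v in direction])

def sq_dist_point_line(c, line: Line) -> float:
    """Squared Euclidean distance from point c to a line."""
    a, u = line
    w = [ci - ai for ci, ai in zip(c, a)]
    t = dot(w, u)
    return max(0.0, dot(w, w) - t * t)  # clamp tiny negative round-off

def cost(lines, centers) -> float:
    """Assumed objective: each line pays its squared distance to the nearest center."""
    return sum(min(sq_dist_point_line(c, l) for c in centers) for l in lines)

def closest_point_on(line: Line, other: Line) -> list[float]:
    """Point on `line` that is closest to `other`."""
    (a1, u1), (a2, u2) = line, other
    w = [x - y for x, y in zip(a1, a2)]
    b, d, e = dot(u1, u2), dot(u1, w), dot(u2, w)
    denom = 1.0 - b * b
    # Parallel lines: fall back to projecting other's base point onto `line`.
    t = -d if denom < 1e-12 else (b * e - d) / denom
    return [x + t * ux for x, ux in zip(a1, u1)]

def centroid_set(lines):
    """For every ordered pair of distinct lines, the point on the first closest to the second."""
    return [closest_point_on(l, m)
            for i, l in enumerate(lines) for j, m in enumerate(lines) if i != j]

def single_swap_local_search(lines, k, T, rng):
    centers = rng.sample(centroid_set(lines), k)
    current = cost(lines, centers)
    for _ in range(T):
        # Sample two lines with probability proportional to their current cost.
        weights = [min(sq_dist_point_line(c, l) for c in centers) for l in lines]
        if sum(weights) == 0.0:
            break  # every line already passes through a center
        l1, l2 = rng.choices(lines, weights=weights, k=2)
        # ASSUMPTION: stand-in for Cross Line, not the paper's routine.
        candidates = [closest_point_on(l1, l2), closest_point_on(l2, l1)]
        for p in candidates:
            # Try swapping p with each center; keep the best swap if it improves the cost.
            best, q = min((cost(lines, centers[:i] + [p] + centers[i + 1:]), i)
                          for i in range(k))
            if best < current:
                centers[q], current = p, best
    return centers, current

def make_demo_lines(rng, per_cluster=15):
    """Toy data: lines in the plane passing near (0, 0) or (10, 10)."""
    lines = []
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]:
        for _ in range(per_cluster):
            angle = rng.uniform(0.0, math.pi)
            point = [cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)]
            lines.append(make_line(point, [math.cos(angle), math.sin(angle)]))
    return lines

if __name__ == "__main__":
    demo = make_demo_lines(random.Random(0))
    # Ten independent runs, mirroring the reporting protocol (min / max / average cost).
    costs = [single_swap_local_search(demo, 2, 50, random.Random(s))[1] for s in range(10)]
    print(min(costs), max(costs), sum(costs) / len(costs))
```

The sketch accepts a swap only when it strictly lowers the cost, so the objective never increases across rounds. The toy data size and `T = 50` are chosen for speed; the settings reported above are T = 400 with k = 3 and k = 10.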