Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Coreset for Line-Sets Clustering

Authors: Sagi Lotan, Ernesto Evgeniy Sanches Shayda, Dan Feldman

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implemented our coreset construction algorithms for both Colored-sets and Line-sets. In this section we test their empirical performance on synthetic and real data. Open source code and experiments are also provided.
Researcher Affiliation	Academia	Sagi Lotan EMAIL Ernesto Evgeniy Sanches Shayda EMAIL Dan Feldman EMAIL Robotics & Big Data Labs, Computer Science Department, University of Haifa
Pseudocode	Yes	Algorithm 1: CS-DENSE(P, k), Algorithm 2: GROUPED-SENSITIVITY(L, B, k), Algorithm 3: LS-DENSE(L, k), Algorithm 4: CORESET(L, k, η)
Open Source Code	Yes	Open source code and experiments are also provided. Open source is available in [32]. [32] S. Lotan, E. E. S. Shayda, and D. Feldman. Coreset for lines sets clustering open source code. https://github.com/ernestosanches/Sets-of-lines-clustering, 2022.
Open Datasets	Yes	We used California housing prices data-set [37] in which we introduced uncertainty by removing two of the 9 dimensions from each point. [37] R. K. Pace and R. Barry. California housing dataset. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html, 1997. The Reuters-21578 data-set from [30], which results in sets of points corresponding to each paragraph in each document. [30] D. Lewis. Reuters-21578 text categorization test collection, distribution 1.0. http://www.research.att.com, 1997.
Dataset Splits	No	The paper describes the datasets used for experiments but does not provide specific details on train/validation/test splits, such as percentages or sample counts, nor does it refer to standard splits with citations.
Hardware Specification	No	The paper does not provide any specific details about the hardware specifications (e.g., GPU/CPU models, memory) used for conducting the experiments.
Software Dependencies	No	The paper states 'We implemented our coreset construction algorithms' and mentions open-source code availability, but it does not specify any software dependencies or library versions within the text itself.
Experiment Setup	No	The paper describes the general experimental procedure, including varying 'm' and 'k' and generating queries, but it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, epochs) or training configurations.
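The Experiment Setup entry mentions varying k and generating queries. The sketch below illustrates that kind of evaluation under stated assumptions; it is not the authors' code. It assumes a query is a set of k center points, and that the cost of a query is the (optionally weighted) sum, over all input sets, of the smallest distance between any line in the set and any center. The names `point_line_dist`, `line_sets_cost`, and `random_line_sets` are hypothetical. The "coreset" here is a naive uniform sample with weights n_sets/s, used only as a stand-in. It is NOT the paper's CORESET(L, k, η) algorithm.

```python
import numpy as np

def point_line_dist(c, p, u):
    """Euclidean distance from point c to the line {p + t*u : t real}."""
    u = u / np.linalg.norm(u)  # normalize the line direction
    v = c - p
    # Remove the component of v along the line; what remains is orthogonal to it.
    return float(np.linalg.norm(v - np.dot(v, u) * u))

def line_sets_cost(line_sets, centers, weights=None):
    """Assumed cost: weighted sum over sets of the min line-to-center distance."""
    if weights is None:
        weights = np.ones(len(line_sets))
    total = 0.0
    for lines, w in zip(line_sets, weights):
        total += w * min(point_line_dist(c, p, u) for (p, u) in lines for c in centers)
    return total

def random_line_sets(rng, n_sets, lines_per_set, d):
    """Hypothetical synthetic data: each line is a (point, direction) pair in R^d."""
    return [[(rng.normal(size=d), rng.normal(size=d)) for _ in range(lines_per_set)]
            for _ in range(n_sets)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sets, lines_per_set, d, k, s = 200, 2, 3, 3, 50
    data = random_line_sets(rng, n_sets, lines_per_set, d)
    # Naive uniform sample as a stand-in for a coreset (NOT the paper's algorithm).
    idx = rng.choice(n_sets, size=s, replace=False)
    sample = [data[i] for i in idx]
    weights = np.full(s, n_sets / s)
    for _ in range(5):
        query = rng.normal(size=(k, d))  # random query: k centers in R^d
        full = line_sets_cost(data, query)
        approx = line_sets_cost(sample, query, weights)
        print(f"relative error: {abs(approx - full) / full:.3f}")
```

A real coreset would replace the uniform sample with sensitivity-based sampling, such as the GROUPED-SENSITIVITY routine listed in the pseudocode. The evaluation loop, which compares weighted subset cost against full cost over many random queries, would remain the same.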