reproducibilityindex.ai

k-Means Clustering of Lines for Big Data

Authors: Yair Marom, Dan Feldman

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on 10 machines on Amazon EC2 cloud show that the algorithm performs well in practice. Open source code for all the algorithms and experiments is also provided. Section 4: Experimental Results.
Researcher Affiliation	Academia	Yair Marom, Department of Computer Science, University of Haifa, Haifa, Israel (yairmrm@gmail.com); Dan Feldman, Department of Computer Science, University of Haifa, Haifa, Israel (dannyf.post@gmail.com).
Pseudocode	Yes	Algorithm 1: CENTROID-SET(L); Algorithm 2: BI-CRITERIA-APPROXIMATION(L, m); Algorithm 3: LINES-SENSITIVITY(L, b, k); Algorithm 4: CORESET(L, k, m).
Open Source Code	Yes	Open source code for all the algorithms and experiments is also provided. [1] GitHub. https://github.com/YairMarom/k_lines_means, 2019.
Open Datasets	Yes	We evaluate our system on two types of data sets: synthetic data generated with carefully controlled parameters, and real data: a road map from the "Open Street Map" Dataset [13] and "Simple Home XCS7 1002 WHT Security Camera" from the "UCI Machine Learning Repository" Dataset [4].
Dataset Splits	No	The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts). It mentions using a 'sample size' for coreset construction, which is different from standard dataset splits for model training/evaluation.
Hardware Specification	No	The paper mentions running experiments "on 10 machines on Amazon EC2 cloud" but does not specify any particular hardware models (e.g., CPU, GPU, or instance types) used for the experiments.
Software Dependencies	No	We implemented our coreset construction from Algorithm 4 and its sub-procedures in Python V. 3.6. We make use of the MKL package [26] to improve its performance. The paper names Python 3.6 (a language version) and the MKL package (a library), but it gives no version number for MKL or for any other library it depends on. Under the criteria, this is insufficient, because a versioned list of key libraries or solvers is required.
Experiment Setup	No	The paper discusses the sample size 'm' used for coreset construction (e.g., "m = 700 lines"), which is a parameter of their method. However, it does not provide typical experimental setup details such as learning rates, batch sizes, optimizers, or other system-level training configurations common in machine learning papers.
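
For readers checking the Pseudocode row, the objective that these algorithms target can be sketched in a few lines. In the standard k-means form of the problem named in the title, the input is a set of lines in R^d and the goal is to choose k center points minimizing the sum of squared distances from each line to its nearest center. The following is a minimal sketch in plain Python (no NumPy, so it runs anywhere); the helper names `sq_dist_point_to_line` and `k_means_lines_cost` and the (anchor, direction) line representation are hypothetical and are not taken from the authors' repository.

```python
from typing import Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical representation: a line is (anchor point a, direction vector u).
Line = Tuple[Vector, Vector]


def _dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def sq_dist_point_to_line(p: Vector, line: Line) -> float:
    """Squared Euclidean distance from point p to the line {a + t*u : t real}."""
    a, u = line
    uu = _dot(u, u)
    if uu == 0:
        raise ValueError("direction vector must be non-zero")
    v = [pi - ai for pi, ai in zip(p, a)]  # vector from the anchor to p
    along_sq = _dot(v, u) ** 2 / uu        # squared length of v's component along u
    # Pythagoras: what remains is the squared perpendicular distance.
    # max() guards against tiny negative values caused by rounding.
    return max(0.0, _dot(v, v) - along_sq)


def k_means_lines_cost(lines: Sequence[Line], centers: Sequence[Vector],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Sum over lines of (weight x squared distance to the nearest center)."""
    if not centers:
        raise ValueError("need at least one center")
    if weights is None:
        weights = [1.0] * len(lines)
    return sum(w * min(sq_dist_point_to_line(c, ln) for c in centers)
               for ln, w in zip(lines, weights))


# Example in R^2: the x-axis and the vertical line x = 0, one center at (2, 1).
lines = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 3.0), (0.0, 1.0))]
print(k_means_lines_cost(lines, [(2.0, 1.0)]))  # 1 + 4 -> 5.0
```

The optional `weights` argument lets the same function score a weighted subset of lines, which is the form a coreset takes.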
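
The Experiment Setup row notes that the coreset sample size m (for example, m = 700 lines) is the method's main tunable parameter. As a hedged illustration of what such a parameter controls, the sketch below shows the generic sensitivity-sampling step from the coreset literature: draw m items i.i.d. with probability proportional to an upper bound on their sensitivity, and give each draw the weight 1/(m * p_i). It assumes the sensitivity bounds are already computed (for instance by a routine playing the role of LINES-SENSITIVITY). It is not a transcription of the paper's Algorithm 4, and the function name `sensitivity_sample` is hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


def sensitivity_sample(sensitivities: Sequence[float], m: int,
                       seed: Optional[int] = None) -> Dict[int, float]:
    """Draw m i.i.d. indices with probability proportional to sensitivity.

    Returns {index: weight}. An index drawn with probability p_i = s_i / S
    gains weight 1 / (m * p_i) per draw, so for any fixed set of centers the
    weighted cost of the sample is an unbiased estimate of the full cost.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    if not sensitivities or any(s <= 0 for s in sensitivities):
        raise ValueError("need a non-empty list of positive sensitivities")
    total = float(sum(sensitivities))
    rng = random.Random(seed)  # fixed seed -> repeatable sample
    draws = rng.choices(range(len(sensitivities)), weights=sensitivities, k=m)
    coreset: Dict[int, float] = {}
    for i in draws:
        # Repeated draws of the same line accumulate weight.
        coreset[i] = coreset.get(i, 0.0) + total / (m * sensitivities[i])
    return coreset


bounds = [1.0] * 10  # hypothetical uniform sensitivity bounds
coreset = sensitivity_sample(bounds, m=5, seed=0)
print(sum(coreset.values()))  # each draw weighs 10 / (5 * 1) = 2.0, so 10.0
```

With real bounds, a call such as `sensitivity_sample(bounds, m=700, seed=0)` would return at most 700 distinct line indices with their weights; fixing `seed` makes the draw repeatable, a detail worth reporting alongside m.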