k-Means Clustering of Lines for Big Data
Authors: Yair Marom, Dan Feldman
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on 10 machines on Amazon EC2 cloud show that the algorithm performs well in practice. Open source code for all the algorithms and experiments is also provided. Section 4: Experimental Results. |
| Researcher Affiliation | Academia | Yair Marom, Department of Computer Science, University of Haifa, Haifa, Israel, yairmrm@gmail.com; Dan Feldman, Department of Computer Science, University of Haifa, Haifa, Israel, dannyf.post@gmail.com |
| Pseudocode | Yes | Algorithm 1: CENTROID-SET(L); Algorithm 2: BI-CRITERIA-APPROXIMATION(L, m); Algorithm 3: LINES-SENSITIVITY(L, b, k); Algorithm 4: CORESET(L, k, m) |
| Open Source Code | Yes | Open source code for all the algorithms and experiments is also provided. [1] Github. https://github.com/YairMarom/k_lines_means, 2019. |
| Open Datasets | Yes | We evaluate our system on two types of data sets: synthetic data generated with carefully controlled parameters, and a real data of roads map from the "Open Street Map" Dataset [13] and "Simple Home XCS7 1002 WHT Security Camera" from the "UCI Machine Learning Repository" Dataset [4]. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages or sample counts). It mentions using a 'sample size' for coreset construction, which is different from standard dataset splits for model training/evaluation. |
| Hardware Specification | No | The paper mentions running experiments "on 10 machines on Amazon EC2 cloud" but does not specify any particular hardware models (e.g., CPU, GPU, or instance types) used for the experiments. |
| Software Dependencies | No | "We implemented our coreset construction from Algorithm 4 and its sub-procedures in Python V. 3.6. We make use of the MKL package [26] to improve its performance." The paper names "Python V. 3.6" (a language version) and the "MKL package" (a library), but gives no version numbers for any key software library or dependency other than Python itself. This does not meet the criterion, which requires versioned libraries or solvers. |
| Experiment Setup | No | The paper discusses the sample size 'm' used for coreset construction (e.g., "m = 700 lines"), which is a parameter of their method. However, it does not provide typical experimental setup details such as learning rates, batch sizes, optimizers, or other system-level training configurations common in machine learning papers. |
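
For readers unfamiliar with the quantity that the sample size m controls, the following is a minimal sketch of a k-means cost for lines, together with a weighted sample of m lines. It is not the paper's Algorithm 4: the sampling here is plain uniform sampling, used only as a stand-in for the sensitivity-based coreset, and the function names (`sq_dist_point_to_line`, `lines_kmeans_cost`, `uniform_sample`) and the (point, direction) representation of a line are assumptions made for illustration.

```python
import math
import random


def sq_dist_point_to_line(c, p, u):
    """Squared Euclidean distance from point c to the line {p + t*u}."""
    # Normalise the direction so callers may pass any non-zero vector.
    norm = math.sqrt(sum(x * x for x in u))
    u = [x / norm for x in u]
    diff = [ci - pi for ci, pi in zip(c, p)]
    along = sum(d * x for d, x in zip(diff, u))
    # Pythagoras: remove the component of diff that lies along the line.
    return max(sum(d * d for d in diff) - along * along, 0.0)


def lines_kmeans_cost(lines, centers, weights=None):
    """Weighted sum, over lines, of the squared distance to the nearest center."""
    if weights is None:
        weights = [1.0] * len(lines)
    return sum(
        w * min(sq_dist_point_to_line(c, p, u) for c in centers)
        for w, (p, u) in zip(weights, lines)
    )


def uniform_sample(lines, m, seed=0):
    """Hypothetical baseline: m lines drawn uniformly, each weighted n/m."""
    rng = random.Random(seed)
    sample = rng.sample(lines, m)
    weights = [len(lines) / m] * m
    return sample, weights


# Two lines in the plane: the x-axis and the y-axis.
lines = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))]
centers = [(3.0, 4.0)]
full_cost = lines_kmeans_cost(lines, centers)
sample, weights = uniform_sample(lines, m=2)
sample_cost = lines_kmeans_cost(sample, centers, weights)
```

Under these assumptions, evaluating `lines_kmeans_cost` on the weighted sample gives an estimate of the cost on the full set; a coreset in the paper's sense would additionally bound the estimation error for every choice of k centers, which uniform sampling does not guarantee.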