reproducibilityindex.ai

Coresets for k-Segmentation of Streaming Data

Authors: Guy Rosman, Mikhail Volkov, Dan Feldman, John W. Fisher III, Daniela Rus

NeurIPS 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in Amazon cloud. Section 3 (Experimental Results): We now demonstrate the results of our algorithm on four data types of varying length and dimensionality. We compare our algorithms against several other segmentation algorithms. We also show that the coreset effectively improves the performance of several segmentation algorithms by running the algorithms on our coreset instead of the full data.
Researcher Affiliation	Academia	Guy Rosman, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, rosman@csail.mit.edu; Mikhail Volkov, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, mikhail@csail.mit.edu; Danny Feldman, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, dannyf@csail.mit.edu; John W. Fisher III, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, fisher@csail.mit.edu; Daniela Rus, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, rus@csail.mit.edu
Pseudocode	Yes	Algorithm 1: BICRITERIA(P, k) and Algorithm 2: BALANCEDPARTITION(P, ε, σ)
Open Source Code	No	The paper does not provide concrete access to source code for the methodology described. It mentions 'Our implementation summarizes the video in less than 20 minutes', but gives no repository link, explicit code release statement, or indication of code in supplementary materials.
Open Datasets	Yes	We used color-augmented SURF features, quantized into 5000 visual words, trained on the ImageNet 2013 dataset [7].
Dataset Splits	No	The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification	Yes	We demonstrate scalability by conducting very large scale experiments on both real and synthetic data, running our algorithm on a network of 255 Amazon EC2 vCPU nodes. A parallel version achieves 30 Hz on a single i7 machine.
Software Dependencies	No	The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). It mentions concepts like 'color-augmented SURF features' but no software dependencies with versions.
Experiment Setup	Yes	Algorithm 2: BALANCEDPARTITION(P, ε, σ) Input: A set P = {(1, p_1), …, (n, p_n)} in ℝ^{d+1}, error parameters ε ∈ (0, 1/10) and σ > 0. We generate synthetic test data by drawing a discrete k-segment P with k = 20, and then add Gaussian and salt-and-pepper noise. For a fair comparison between the (k, ε)-coreset D and the corresponding approximations U, R we allow the same number of coefficients for each approximation.
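
The synthetic-data step quoted under Experiment Setup (a discrete k-segment with k = 20, plus Gaussian and salt-and-pepper noise) can be illustrated with a minimal sketch. This is not the authors' code: the paper excerpt gives only k = 20, so the signal length `n`, the dimension `d`, the noise level `sigma`, the corruption probability `sp_prob`, the extreme values `sp_low`/`sp_high`, and the function names `make_k_segment` and `add_noise` are all assumptions chosen for illustration. It also assumes a k-segment means a signal that is linear in time on each of k contiguous intervals.

```python
import random


def make_k_segment(n, k, d, rng):
    """Return n points in R^d that follow k contiguous linear pieces in time.

    Hypothetical helper: the slope and offset ranges are illustrative choices.
    """
    # Pick k-1 distinct interior breakpoints so every segment is non-empty.
    cuts = sorted(rng.sample(range(1, n), k - 1))
    bounds = [0] + cuts + [n]
    points = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Each segment is a line p(t) = a + b * (t - start).
        a = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        b = [rng.uniform(-0.01, 0.01) for _ in range(d)]
        for t in range(start, end):
            points.append([a[j] + b[j] * (t - start) for j in range(d)])
    return points, bounds


def add_noise(points, sigma, sp_prob, sp_low, sp_high, rng):
    """Add Gaussian noise, then salt-and-pepper noise, to every coordinate."""
    noisy = []
    for p in points:
        # Gaussian noise with standard deviation sigma (assumed value).
        q = [x + rng.gauss(0.0, sigma) for x in p]
        # Salt-and-pepper: with probability sp_prob, overwrite the coordinate
        # with one of two extreme values (assumed to be sp_low / sp_high).
        q = [rng.choice((sp_low, sp_high)) if rng.random() < sp_prob else x
             for x in q]
        noisy.append(q)
    return noisy


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean, bounds = make_k_segment(n=1000, k=20, d=2, rng=rng)
noisy = add_noise(clean, sigma=0.05, sp_prob=0.01,
                  sp_low=-2.0, sp_high=2.0, rng=rng)
```

Here `bounds` holds the k + 1 segment boundaries, which can serve as ground truth when comparing segmentation algorithms on `noisy`.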