Coresets for k-Segmentation of Streaming Data
Authors: Guy Rosman, Mikhail Volkov, Dan Feldman, John W. Fisher III, Daniela Rus
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in Amazon cloud. (Section 3, Experimental Results:) We now demonstrate the results of our algorithm on four data types of varying length and dimensionality. We compare our algorithms against several other segmentation algorithms. We also show that the coreset effectively improves the performance of several segmentation algorithms by running the algorithms on our coreset instead of the full data. |
| Researcher Affiliation | Academia | Guy Rosman, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, rosman@csail.mit.edu; Mikhail Volkov, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, mikhail@csail.mit.edu; Danny Feldman, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, dannyf@csail.mit.edu; John W. Fisher III, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, fisher@csail.mit.edu; Daniela Rus, CSAIL, MIT, 32 Vassar St., 02139, Cambridge, MA USA, rus@csail.mit.edu |
| Pseudocode | Yes | Algorithm 1: BICRITERIA(P, k) and Algorithm 2: BALANCEDPARTITION(P, ε, σ) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions 'Our implementation summarizes the video in less than 20 minutes', but it gives no specific repository link, explicit code release statement, or indication of code in supplementary materials. |
| Open Datasets | Yes | We used color-augmented SURF features, quantized into 5000 visual words, trained on the ImageNet 2013 dataset [7]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We demonstrate scalability by conducting very large scale experiments on both real and synthetic data, running our algorithm on a network of 255 Amazon EC2 vCPU nodes. A parallel version achieves 30Hz on a single i7 machine. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). It mentions concepts like 'color-augmented SURF features' but no software dependencies with versions. |
| Experiment Setup | Yes | Algorithm 2: BALANCEDPARTITION(P, ε, σ). Input: a set $P = \{(1, p_1), \ldots, (n, p_n)\}$ in $\mathbb{R}^{d+1}$ and error parameters $\varepsilon \in (0, 1/10)$ and $\sigma > 0$. We generate synthetic test data by drawing a discrete k-segment P with k = 20, and then add Gaussian and salt-and-pepper noise. For a fair comparison between the (k, ε)-coreset D and the corresponding approximations U, R we allow the same number of coefficients for each approximation. |
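
The synthetic-data setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes a k-segment is a piecewise-linear signal over time with k pieces. The function name `make_noisy_k_segment` and the parameters `n`, `d`, `gauss_sigma`, `sp_fraction`, and `sp_amplitude` are hypothetical, since the quoted text gives only k = 20 and the two noise types.

```python
import numpy as np


def make_noisy_k_segment(n=2000, d=2, k=20, gauss_sigma=0.1,
                         sp_fraction=0.02, sp_amplitude=5.0, seed=0):
    """Draw a piecewise-linear signal with k segments and add noise.

    All defaults except k = 20 are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Choose k - 1 distinct interior breakpoints, giving k non-empty segments.
    breaks = np.sort(rng.choice(np.arange(1, n), size=k - 1, replace=False))
    bounds = np.concatenate(([0], breaks, [n]))

    t = np.arange(1, n + 1)  # time stamps 1..n, as in P = {(1, p_1), ..., (n, p_n)}
    clean = np.empty((n, d))
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Each segment is an independent random line in R^d.
        slope = rng.normal(size=d)
        intercept = rng.normal(size=d)
        local_t = (t[start:end] - t[start]) / n
        clean[start:end] = intercept + np.outer(local_t, slope)

    # Additive Gaussian noise on every coordinate.
    noisy = clean + rng.normal(scale=gauss_sigma, size=clean.shape)

    # Salt-and-pepper noise: overwrite a small fraction of entries
    # with large positive or negative values.
    mask = rng.random(clean.shape) < sp_fraction
    signs = rng.choice([-1.0, 1.0], size=clean.shape)
    noisy[mask] = sp_amplitude * signs[mask]

    return t, clean, noisy, bounds


t, clean, noisy, bounds = make_noisy_k_segment()
```

Here `bounds` holds the k + 1 segment boundaries as array indices, so a segmentation recovered from `noisy` (or from a coreset of it) can be compared against the ground-truth breakpoints.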