Fast Algorithms for Segmented Regression
Authors: Jayadev Acharya, Ilias Diakonikolas, Jerry Li, Ludwig Schmidt
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude. ... Our algorithms run in time that is nearly-linear in the number of data points n and the number of intervals k. Their computational efficiency is established both theoretically and experimentally. ... This gap also manifests itself strongly in our experiments (see Section 4). ... When given the same number of samples, our MSE is a factor of 2-4 times worse than that achieved by the DP, but our algorithm is about 1,000 times faster already for 10^4 data points. ... In addition to our theoretical analysis above, we also study the empirical performance of our new estimator for segmented regression on both real and synthetic data. As baseline, we compare our estimator (GREEDYMERGING) to the dynamic programming approach. ... Figure 1 shows the results of the merging algorithm and the exact dynamic program for sample size n ranging from 10^2 to 10^4. |
| Researcher Affiliation | Academia | Jayadev Acharya JAYADEV@MIT.EDU Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ilias Diakonikolas DIAKONIK@USC.EDU University of Southern California, Los Angeles, CA 90089, USA; Jerry Li JERRYZLI@MIT.EDU, Ludwig Schmidt LUDWIGS@MIT.EDU Massachusetts Institute of Technology, Cambridge, MA 02139, USA |
| Pseudocode | Yes | In this document, we only give the formal pseudo code and proofs for GREEDYMERGE. We refer the reader to the full version for more details. ... Algorithm 1 Piecewise linear regression by greedy merging. |
| Open Source Code | No | The paper mentions using the Julia programming language for experiments but does not provide a link to the source code for the algorithms or methodology described. |
| Open Datasets | No | The paper describes generating synthetic data and using 'a time series of the Dow Jones index as input' but does not provide concrete access information (e.g., URL, DOI, specific citation with authors/year) for any publicly available or open dataset. |
| Dataset Splits | No | The paper describes data generation and sample sizes but does not specify any training, validation, or test dataset splits, percentages, or methodology for data partitioning needed for reproduction. |
| Hardware Specification | Yes | All experiments were conducted on a laptop computer with a 2.8 GHz Intel Core i7 CPU and 16 GB of RAM. |
| Software Dependencies | Yes | we use the Julia programming language (version 0.4.2) |
| Experiment Setup | Yes | We generate the piecewise constant function f by randomly choosing k = 10 integers from the set {1, . . . , 10} as function value in each of the k segments. Then we draw n/k samples from each segment by adding an i.i.d. Gaussian noise term with variance 1 to each sample. ... For the piecewise linear case, we generate an n × d data matrix X with i.i.d. Gaussian entries (d = 10). In each segment I, we choose the parameter values βI independently and uniformly at random from the interval [−1, 1]. ... Since the merging algorithm can produce a variable number of output segments, we run the merging algorithm with three different parameter settings corresponding to k, 2k, and 4k output segments, respectively. |
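The experiment setup quoted above (piecewise-constant signal, k = 10 segments with values in {1, ..., 10}, unit-variance Gaussian noise, greedy merging down to a target segment count) can be sketched as below. This is a hedged illustration, not the paper's implementation: the paper's GREEDYMERGING merges many candidate pairs per round to achieve near-linear time, whereas this simplified variant merges one cheapest adjacent pair at a time, which is easier to read but quadratic. All names (`greedy_merge`, `sse`) and the reduced sample size `n = 200` are our own choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic piecewise-constant data mirroring the quoted setup:
# k = 10 segments, values drawn from {1, ..., 10}, i.i.d. N(0, 1) noise.
# (n is reduced from the paper's 10^2..10^4 range to keep the sketch fast.)
k, n = 10, 200
seg_len = n // k
true_vals = rng.integers(1, 11, size=k)
signal = np.repeat(true_vals, seg_len).astype(float)
y = signal + rng.normal(0.0, 1.0, size=n)

def sse(a):
    """Sum of squared errors of fitting a constant (the mean) to a."""
    return float(np.sum((a - a.mean()) ** 2))

def greedy_merge(y, target_segments):
    """Simplified greedy merging: start from singleton intervals and
    repeatedly delete the breakpoint whose removal (i.e., merging the
    two intervals it separates) increases the squared error the least.
    Stops when `target_segments` intervals remain."""
    bounds = list(range(len(y) + 1))  # interval j is y[bounds[j]:bounds[j+1]]
    while len(bounds) - 1 > target_segments:
        costs = []
        for j in range(1, len(bounds) - 1):
            left = y[bounds[j - 1]:bounds[j]]
            right = y[bounds[j]:bounds[j + 1]]
            merged = y[bounds[j - 1]:bounds[j + 1]]
            costs.append(sse(merged) - sse(left) - sse(right))
        del bounds[1 + int(np.argmin(costs))]
    return bounds

bounds = greedy_merge(y, target_segments=k)
# Piecewise-constant fit: the mean of y on each recovered segment.
fit = np.concatenate([np.full(b - a, y[a:b].mean())
                      for a, b in zip(bounds[:-1], bounds[1:])])
mse = float(np.mean((fit - signal) ** 2))
```

Running the merging algorithm with `target_segments` set to `2 * k` or `4 * k` reproduces the paper's three parameter settings; allowing more output segments trades statistical efficiency for robustness to mis-merged breakpoints.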