Coresets for Near-Convex Functions
Authors: Murad Tukan, Alaa Maalouf, Dan Feldman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 7, Experimental Results: In what follows we evaluate our coreset against uniform sampling on real-world datasets, with respect to the SVM problem, logistic regression problem and ℓ_z-regression problem for z ∈ (0, 1). |
| Researcher Affiliation | Academia | Murad Tukan muradtuk@gmail.com Alaa Maalouf alaamalouf12@gmail.com Dan Feldman dannyf.post@gmail.com The Robotics and Big Data Lab, Department of Computer Science, University of Haifa, Haifa, Israel |
| Pseudocode | Yes | Algorithm 1: CORESET(P, f, m). Input: A set P ⊆ ℝ^d of n points, a near-convex loss function f : P × ℝ^d → [0, ∞), and a sample size m ≥ 1. Output: A pair (S, v) that satisfies Theorem 6. |
| Open Source Code | Yes | (v) An open source code implementation of our algorithm, for reproducing our results and future research [61]. |
| Open Datasets | Yes | Datasets. The following datasets were used for our experiments, mostly from the UCI machine learning repository [22]: (i) HTRU [22]: 17,898 radio emissions of the Pulsar star, each consisting of 9 features. (ii) Skin [22]: 245,057 random samples of R,G,B from face images, consisting of 4 dimensions. (iii) Cod-rna [62]: consists of 59,535 samples with 8 features and two classes (i.e. labels), describing RNAs. (iv) Web dataset [9]: 49,749 web page records, where each record consists of 300 features. (v) 3D spatial networks [22]: a 3D road network with highly accurate elevation information (±20 cm) from Denmark, used in eco-routing and fuel/CO2-estimation routing algorithms, consisting of 434,874 records where each record has 4 features. |
| Dataset Splits | No | The paper mentions 'sample sizes' for coreset generation and '40 trials' for averaging results, but does not explicitly describe the train/validation/test splits for the input datasets themselves. |
| Hardware Specification | Yes | Software/Hardware. Our algorithms were implemented in Python 3.6 [63] using Numpy [48], Scipy [64] and Scikit-learn [49]. Tests were performed on 2.59GHz i7-6500U (2 cores total) machine with 16GB RAM. |
| Software Dependencies | Yes | Software/Hardware. Our algorithms were implemented in Python 3.6 [63] using Numpy [48], Scipy [64] and Scikit-learn [49]. |
| Experiment Setup | Yes | At Fig. 2a–2f, we have chosen 20 sample sizes, starting from 50 till 500; at Figures 2g–2h, we have chosen 20 sample sizes starting from 4,000 till 16,000. At each sample size, we generate two coresets, where the first is using uniform sampling and the latter is using Algorithm 1. For each coreset (S, v), we find x* ∈ arg min_{x ∈ ℝ^d} ∑_{p ∈ S} v(p)·f(p, x), and the approximation error ε is set to be (∑_{p ∈ P} f(p, x*)) / (min_{x ∈ ℝ^d} ∑_{p ∈ P} f(p, x)) − 1. The results were averaged across 40 trials, while the shaded regions correspond to the standard deviation. |
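The evaluation protocol above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's Algorithm 1: it uses synthetic least-squares data as a stand-in convex loss and only the uniform-sampling baseline (with weights n/m), showing how the approximation error ε = (loss of the sample's solution on the full set) / (optimal full-set loss) − 1 is computed and averaged over 40 trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (the paper uses UCI datasets such as HTRU and Skin).
n, d = 5000, 4
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def solve(Aw, bw):
    # Least squares as a stand-in convex loss f(p, x) = (a^T x - b)^2.
    return np.linalg.lstsq(Aw, bw, rcond=None)[0]

def total_loss(x):
    # Full-dataset loss, sum over all p in P of f(p, x).
    return float(np.sum((A @ x - b) ** 2))

x_opt = solve(A, b)          # min over x of the full-set loss
opt = total_loss(x_opt)

def approx_error(idx, weights):
    # Solve the weighted problem on the sample (S, v), then evaluate on P:
    # minimizing sum_i w_i (a_i^T x - b_i)^2 == scaling rows by sqrt(w_i).
    w = np.sqrt(weights)[:, None]
    x_s = solve(A[idx] * w, b[idx] * w.ravel())
    return total_loss(x_s) / opt - 1.0

m, trials = 200, 40
errs = []
for _ in range(trials):
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)   # uniform-sampling weights
    errs.append(approx_error(idx, weights))

print("mean eps:", np.mean(errs), "std:", np.std(errs))
```

Plugging the paper's sensitivity-based sampler in place of `rng.choice` (with the corresponding importance weights) would reproduce the coreset curve alongside this uniform baseline.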