Coresets for Near-Convex Functions
Authors: Murad Tukan, Alaa Maalouf, Dan Feldman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 7, Experimental Results: In what follows we evaluate our coreset against uniform sampling on real-world datasets, with respect to the SVM problem, logistic regression problem and ℓ_z-regression problem for z ∈ (0, 1). |
| Researcher Affiliation | Academia | Murad Tukan muradtuk@gmail.com Alaa Maalouf alaamalouf12@gmail.com Dan Feldman dannyf.post@gmail.com The Robotics and Big Data Lab, Department of Computer Science, University of Haifa, Haifa, Israel |
| Pseudocode | Yes | Algorithm 1: CORESET(P, f, m). Input: A set P ⊆ ℝ^d of n points, a near-convex loss function f : P × ℝ^d → [0, ∞), and a sample size m ≥ 1. Output: A pair (S, v) that satisfies Theorem 6. |
| Open Source Code | Yes | (v) An open source code implementation of our algorithm, for reproducing our results and future research [61]. |
| Open Datasets | Yes | Datasets. The following datasets were used for our experiments, mostly from the UCI machine learning repository [22]: (i) HTRU [22]: 17,898 radio emissions of the Pulsar star, each consisting of 9 features. (ii) Skin [22]: 245,057 random samples of R,G,B from face images, consisting of 4 dimensions. (iii) Cod-rna [62]: consists of 59,535 samples with 8 features and two classes (i.e. labels), describing RNAs. (iv) Web dataset [9]: 49,749 web page records, where each record consists of 300 features. (v) 3D spatial networks [22]: a 3D road network with highly accurate elevation information (±20 cm) from Denmark, used in eco-routing and fuel/CO2-estimation routing algorithms, consisting of 434,874 records where each record has 4 features. |
| Dataset Splits | No | The paper mentions 'sample sizes' for coreset generation and '40 trials' for averaging results, but does not explicitly describe the train/validation/test splits for the input datasets themselves. |
| Hardware Specification | Yes | Software/Hardware. Our algorithms were implemented in Python 3.6 [63] using Numpy [48], Scipy [64] and Scikit-learn [49]. Tests were performed on 2.59GHz i7-6500U (2 cores total) machine with 16GB RAM. |
| Software Dependencies | Yes | Software/Hardware. Our algorithms were implemented in Python 3.6 [63] using Numpy [48], Scipy [64] and Scikit-learn [49]. |
| Experiment Setup | Yes | At Fig. 2a–2f, we have chosen 20 sample sizes, starting from 50 till 500; at Figures 2g–2h, we have chosen 20 sample sizes starting from 4,000 till 16,000. At each sample size, we generate two coresets, where the first is using uniform sampling and the latter is using Algorithm 1. For each coreset (S, v), we find x* ∈ arg min_{x ∈ ℝ^d} ∑_{p ∈ S} v(p)·f(p, x), and the approximation error ε is set to be (∑_{p ∈ P} f(p, x*)) / (min_{x ∈ ℝ^d} ∑_{p ∈ P} f(p, x)) − 1. The results were averaged across 40 trials, while the shaded regions correspond to the standard deviation. |
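The evaluation protocol above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's Algorithm 1: it uses synthetic least-squares data as a stand-in convex loss and only the uniform-sampling baseline (with weights n/m), showing how the approximation error ε = (loss of the sample's solution on the full set) / (optimal full-set loss) − 1 is computed and averaged over 40 trials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (the paper uses UCI datasets such as HTRU and Skin).
n, d = 5000, 4
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def solve(Aw, bw):
    # Least squares as a stand-in convex loss f(p, x) = (a^T x - b)^2.
    return np.linalg.lstsq(Aw, bw, rcond=None)[0]

def total_loss(x):
    # Full-dataset loss, sum over all p in P of f(p, x).
    return float(np.sum((A @ x - b) ** 2))

x_opt = solve(A, b)          # min over x of the full-set loss
opt = total_loss(x_opt)

def approx_error(idx, weights):
    # Solve the weighted problem on the sample (S, v), then evaluate on P:
    # minimizing sum_i w_i (a_i^T x - b_i)^2 == scaling rows by sqrt(w_i).
    w = np.sqrt(weights)[:, None]
    x_s = solve(A[idx] * w, b[idx] * w.ravel())
    return total_loss(x_s) / opt - 1.0

m, trials = 200, 40
errs = []
for _ in range(trials):
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)   # uniform-sampling weights
    errs.append(approx_error(idx, weights))

print("mean eps:", np.mean(errs), "std:", np.std(errs))
```

Plugging the paper's sensitivity-based sampler in place of `rng.choice` (with the corresponding importance weights) would reproduce the coreset curve alongside this uniform baseline.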