The Space Complexity of Approximating Logistic Loss

Authors: Gregory Dexter, Petros Drineas, Rajiv Khanna

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evidence verifying that the algorithm of Section 3 exactly computes the classification complexity measure µy(X) of Definition 1. We compare our results with the approximate sketching-based calculation of Munteanu et al. [10]. In order to estimate µy(X) for a dataset using the sketching-based approach of Munteanu et al. [10], we create several smaller sketched datasets of a given full dataset of size n × d (n rows and d columns). (see the subsampling sketch after the table)
Researcher Affiliation | Collaboration | Gregory Dexter, LinkedIn Corporation, gdexter@linkedin.com; Petros Drineas, Department of Computer Science, Purdue University, pdrineas@purdue.edu; Rajiv Khanna, Department of Computer Science, Purdue University, rajivak@purdue.edu
Pseudocode | No | The paper discusses linear programming formulations but does not present any pseudocode or algorithm blocks.
Open Source Code | Yes | Code and instructions to reproduce our results are provided in the supplementary material.
Open Datasets | Yes | Real data experiments: To test our setup on real data, we make use of the sklearn KDDcup dataset (footnote 2). The dataset consists of 100654 data points and over 50 features. ... Footnote 2: sklearn.datasets.fetch_kddcup99() provides an API to access this dataset. (see the loading sketch after the table)
Dataset Splits | No | We do not provide a learning algorithm, so there are no data splits or training.
Hardware Specification | No | Our focus is primarily theoretical, and our experiments do not require significant compute resources.
Software Dependencies | No | In order to solve the modified linear programs, we make use of the OR-Tools library (footnote 1). ... We set the upper bound by multiplying the lower bound by d log(d/δ). Note that this upper bound is conservative, and the actual upper bound could be a constant factor higher, since the guarantees of Munteanu et al. [10] are only up to a constant factor. The presented results are an average over 20 runs of different sketches ... Footnote 1: https://developers.google.com/optimization (see the solver sketch after the table)
Experiment Setup | Yes | Synthetic data: We create the synthetic dataset as follows. First, we construct the full data matrix X ∈ R^(n×d) by drawing n = 10,000 samples from the d-dimensional Gaussian distribution N(0, I_d) with d = 100. We generate an arbitrary β ∈ R^d with ||β||_2 = 1 and generate the posterior p(y_i|x_i) = 1/(1 + exp(−β^T x_i + N(0, 1))). Finally, we create the labels y_i for all i = 1, ..., n by setting y_i to one if p(y_i|x_i) > 0.5 and to −1 otherwise. ... We choose δ so that n ∈ {512, 1024, 2048, 4096, 8192}. ... Real data experiments: ... We only use continuous features, which reduces the feature size to 33. The dataset contains 3377 positive data points, while the rest are negative. To create various sized subsets for exact calculation, we subsample from positives and negatives so that they are in about equal proportion. (see the data-generation sketch after the table)
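
The "Research Type" row mentions building several smaller sketched versions of a full n × d dataset in order to estimate µy(X) approximately. The snippet below is only a shape-level illustration: it uses plain uniform row subsampling as a stand-in, not the actual sketching construction of Munteanu et al. [10], and the sizes follow the {512, ..., 8192} grid quoted in the "Experiment Setup" row.

```python
# Stand-in illustration only: uniform row subsampling instead of the actual
# sketch of Munteanu et al. [10]; sizes follow the grid quoted in the table.
import numpy as np

def make_subsampled_datasets(X, y, sizes=(512, 1024, 2048, 4096, 8192), seed=0):
    """Return a list of (X_small, y_small) pairs, one per requested size."""
    rng = np.random.default_rng(seed)
    smaller = []
    for m in sizes:
        idx = rng.choice(X.shape[0], size=m, replace=False)
        smaller.append((X[idx], y[idx]))
    return smaller
```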
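For the "Open Datasets" row, the following is a minimal loading sketch built on the sklearn.datasets.fetch_kddcup99() API named in the footnote. The continuous-feature filter, the positive/negative label mapping, and the class-balanced subsampling are assumptions standing in for the authors' code in the supplementary material.

```python
# Hedged loading sketch for the KDDcup data; the feature filter, label mapping,
# and balanced subsampling below are assumptions, not the authors' exact code.
import numpy as np
from sklearn.datasets import fetch_kddcup99

bunch = fetch_kddcup99()
X_raw, target = bunch.data, bunch.target

# Keep columns that are not byte-string categoricals (assumed "continuous features").
continuous = [j for j in range(X_raw.shape[1]) if not isinstance(X_raw[0, j], bytes)]
X = X_raw[:, continuous].astype(float)

# Assumed binary label mapping for illustration; the paper's positive/negative
# assignment may differ.
y = np.where(target == b"normal.", 1, -1)

# Subsample positives and negatives in about equal proportion, as in the quoted setup.
rng = np.random.default_rng(0)
pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == -1)
m = min(pos.size, neg.size)
idx = np.concatenate([rng.choice(pos, m, replace=False), rng.choice(neg, m, replace=False)])
X_bal, y_bal = X[idx], y[idx]
```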
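The "Software Dependencies" row names OR-Tools as the LP solver. The snippet below is a minimal, generic OR-Tools linear program showing the pywraplp interface; it is not the paper's modified linear program for computing µy(X).

```python
# Minimal, generic OR-Tools LP via the pywraplp wrapper; not the paper's
# modified linear program, only a demonstration of the solver interface.
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("GLOP")   # GLOP is OR-Tools' LP solver
x = solver.NumVar(0.0, solver.infinity(), "x")
y = solver.NumVar(0.0, solver.infinity(), "y")

solver.Add(x + 2 * y <= 14.0)
solver.Add(3 * x - y >= 0.0)
solver.Maximize(3 * x + 4 * y)

if solver.Solve() == pywraplp.Solver.OPTIMAL:
    print("objective:", solver.Objective().Value())
    print("x =", x.solution_value(), ", y =", y.solution_value())
```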
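The synthetic-data recipe quoted in the "Experiment Setup" row translates directly into a few lines of NumPy. The constants n = 10,000 and d = 100 come from the quote; the random seed is arbitrary.

```python
# Synthetic data as described in the "Experiment Setup" row:
# Gaussian features, unit-norm beta, noisy logistic posterior, labels in {+1, -1}.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 100

X = rng.standard_normal((n, d))                # rows drawn from N(0, I_d)
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)                   # ||beta||_2 = 1

noise = rng.standard_normal(n)                 # N(0, 1) noise inside the logistic link
p = 1.0 / (1.0 + np.exp(-(X @ beta) + noise))  # p(y_i | x_i)

y = np.where(p > 0.5, 1, -1)                   # y_i = 1 if p > 0.5, and -1 otherwise
```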