Learning from Label Proportions: A Mutual Contamination Framework

Authors: Clayton Scott, Jianxin Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Academia | Clayton Scott and Jianxin Zhang, Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, {clayscot,jianxinz}@umich.edu
Pseudocode | Yes | Algorithm 1: Plug-in approach to LLP via LMMCM (outline)
Open Source Code | Yes | https://github.com/Z-Jianxin/Learning-from-Label-Proportions-A-Mutual-Contamination-Framework
Open Datasets | Yes | We consider the Adult (T = 8192) and MAGIC Gamma Ray Telescope (T = 6144) datasets (both available from the UCI repository)
Dataset Splits | Yes | the parameter λ ∈ {1, 10⁻¹, 10⁻², ..., 10⁻⁵} is chosen by 5-fold cross validation.
Hardware Specification | No | For each dataset, our implementation runs all 8 settings in roughly 50 minutes using 48 cores.
Software Dependencies | No | Our Python implementation uses SciPy's L-BFGS routine to find the optimal α_i.
Experiment Setup | Yes | We implement a method based on our general approach (see Algorithm 1) by taking ℓ to be the logistic loss, F to be the RKHS associated to a Gaussian kernel k, and selecting f ∈ F by minimizing Ê_w(f) + λ‖f‖²_F. ... The kernel parameter is computed by 1/(d·Var(X)), where d is the number of features and Var(X) is the variance of the data matrix, and the parameter λ ∈ {1, 10⁻¹, 10⁻², ..., 10⁻⁵} is chosen by 5-fold cross validation.
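The Experiment Setup row can be illustrated with a minimal sketch of its core components: L2-regularized kernelized logistic regression in a Gaussian-kernel RKHS, with the kernel parameter set to 1/(d·Var(X)) and the coefficients α found by SciPy's L-BFGS. This is an illustrative reconstruction, not the authors' code: the bag-level weighting in the LMMCM objective Ê_w(f) is omitted, and the function names below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def gaussian_kernel(X, Y, gamma):
    """K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq)

def fit_kernel_logistic(X, y, lam):
    """Sketch of the paper's setup (without the LLP bag weighting):
    represent f(x) = sum_i alpha_i k(x_i, x) and minimize
    logistic loss + lam * ||f||_F^2 over alpha with L-BFGS.
    Labels y are in {-1, +1}."""
    # Kernel parameter 1 / (d * Var(X)), as described in the paper.
    gamma = 1.0 / (X.shape[1] * X.var())
    K = gaussian_kernel(X, X, gamma)

    def objective(alpha):
        f = K @ alpha
        loss = np.mean(np.logaddexp(0.0, -y * f))  # logistic loss
        reg = lam * alpha @ K @ alpha              # ||f||_F^2 = alpha' K alpha
        p = -y * expit(-y * f)                     # d(loss)/d(f), overflow-safe
        grad = K @ p / len(y) + 2.0 * lam * (K @ alpha)
        return loss + reg, grad

    res = minimize(objective, np.zeros(len(y)), jac=True, method="L-BFGS-B")
    return res.x, gamma
```

In the paper's protocol, λ would then be chosen from {1, 10⁻¹, ..., 10⁻⁵} by 5-fold cross validation rather than fixed in advance.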