Learnability of Linear Thresholds from Label Proportions

Authors: Rishi Saket

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental 2.2 Experimental Evaluation We evaluate Algorithm A (Fig. 1) on synthetically generated satisfiable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. For each (d, m) the algorithm solves 100 independently generated instances each by choosing a random satisfying linear form f (with iid N(0, 1) coefficients) and m bags independently, for each bag sampling two points iid from N(0, 1)d with their label proportions given by pos(f(x)). Note that the bags and their constituent points are not correlated with f. For each instance the SDP in (2)-(4) is solved and the best h chosen using 100 independent Gaussians g. With m0 and m1 denoting the number of monochromatic and non-monochromatic bags respectively, the theoretical performance threshold is given by t := (m0/4 + m1/2) (Lemma 2.2). We measure s as the number of bags satisfied by pos(h ) and the average and minimum values of (s/t) over the 100 instances per (d, m), both of which are quite a bit larger than 1, showing that the algorithm performs much better than than the theoretical guarantee on such instances. We also compute r as the number of bags satisfied by the best {pos(h ), pos( h )} out of 100 random linear forms h , and the average value of (r/s). The latter is below 1 in all cases indicating that our algorithm performs better on average than the random linear threshold with the gap being higher when d is smaller. The results are presented in Table 1.
Researcher Affiliation Industry Rishi Saket Google Research, India rishisaket@google.com
Pseudocode Yes Figure 1: Algorithm A for LLP-LTF.
Open Source Code No The paper does not provide any statement or link indicating that the source code for their methodology is publicly available.
Open Datasets No We evaluate Algorithm A (Fig. 1) on synthetically generated satisfiable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. The paper does not provide access information (link, citation, or repository) for this synthetic data.
Dataset Splits No The paper describes experiments on 'synthetically generated satisfiable LLP-LTF instances' and runs '100 independently generated instances' for each parameter set, but it does not specify explicit training, validation, and test splits for a shared dataset.
Hardware Specification No The paper describes experimental evaluation in Section 2.2 but does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies No The paper mentions solving a Semi-Definite Program (SDP) and sampling from Gaussian distributions, but it does not provide specific software dependencies or their version numbers, such as programming languages, libraries, or solvers used for implementation or experiments.
Experiment Setup Yes We evaluate Algorithm A (Fig. 1) on synthetically generated satisfiable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. For each (d, m) the algorithm solves 100 independently generated instances each by choosing a random satisfying linear form f (with iid N(0, 1) coefficients) and m bags independently, for each bag sampling two points iid from N(0, 1)d with their label proportions given by pos(f(x)). ... For each instance the SDP in (2)-(4) is solved and the best h chosen using 100 independent Gaussians g.