Learnability of Linear Thresholds from Label Proportions
Authors: Rishi Saket
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 2.2 Experimental Evaluation We evaluate Algorithm A (Fig. 1) on synthetically generated satisfiable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. For each (d, m) the algorithm solves 100 independently generated instances each by choosing a random satisfying linear form f (with iid N(0, 1) coefficients) and m bags independently, for each bag sampling two points iid from N(0, 1)d with their label proportions given by pos(f(x)). Note that the bags and their constituent points are not correlated with f. For each instance the SDP in (2)-(4) is solved and the best h chosen using 100 independent Gaussians g. With m0 and m1 denoting the number of monochromatic and non-monochromatic bags respectively, the theoretical performance threshold is given by t := (m0/4 + m1/2) (Lemma 2.2). We measure s as the number of bags satisfied by pos(h ) and the average and minimum values of (s/t) over the 100 instances per (d, m), both of which are quite a bit larger than 1, showing that the algorithm performs much better than than the theoretical guarantee on such instances. We also compute r as the number of bags satisfied by the best {pos(h ), pos( h )} out of 100 random linear forms h , and the average value of (r/s). The latter is below 1 in all cases indicating that our algorithm performs better on average than the random linear threshold with the gap being higher when d is smaller. The results are presented in Table 1. |
| Researcher Affiliation | Industry | Rishi Saket Google Research, India rishisaket@google.com |
| Pseudocode | Yes | Figure 1: Algorithm A for LLP-LTF. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | No | We evaluate Algorithm A (Fig. 1) on synthetically generated satisfiable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. The paper does not provide access information (link, citation, or repository) for this synthetic data. |
| Dataset Splits | No | The paper describes experiments on 'synthetically generated satisfiable LLP-LTF instances' and runs '100 independently generated instances' for each parameter set, but it does not specify explicit training, validation, and test splits for a shared dataset. |
| Hardware Specification | No | The paper describes experimental evaluation in Section 2.2 but does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions solving a Semi-Definite Program (SDP) and sampling from Gaussian distributions, but it does not provide specific software dependencies or their version numbers, such as programming languages, libraries, or solvers used for implementation or experiments. |
| Experiment Setup | Yes | We evaluate Algorithm A (Fig. 1) on synthetically generated satisfiable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. For each (d, m) the algorithm solves 100 independently generated instances each by choosing a random satisfying linear form f (with iid N(0, 1) coefficients) and m bags independently, for each bag sampling two points iid from N(0, 1)d with their label proportions given by pos(f(x)). ... For each instance the SDP in (2)-(4) is solved and the best h chosen using 100 independent Gaussians g. |