reproducibilityindex.ai

Learnability of Linear Thresholds from Label Proportions

Authors: Rishi Saket

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	2.2 Experimental Evaluation We evaluate Algorithm A (Fig. 1) on synthetically generated satisﬁable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. For each (d, m) the algorithm solves 100 independently generated instances each by choosing a random satisfying linear form f (with iid N(0, 1) coefﬁcients) and m bags independently, for each bag sampling two points iid from N(0, 1)d with their label proportions given by pos(f(x)). Note that the bags and their constituent points are not correlated with f. For each instance the SDP in (2)-(4) is solved and the best h chosen using 100 independent Gaussians g. With m0 and m1 denoting the number of monochromatic and non-monochromatic bags respectively, the theoretical performance threshold is given by t := (m0/4 + m1/2) (Lemma 2.2). We measure s as the number of bags satisﬁed by pos(h ) and the average and minimum values of (s/t) over the 100 instances per (d, m), both of which are quite a bit larger than 1, showing that the algorithm performs much better than than the theoretical guarantee on such instances. We also compute r as the number of bags satisﬁed by the best {pos(h ), pos( h )} out of 100 random linear forms h , and the average value of (r/s). The latter is below 1 in all cases indicating that our algorithm performs better on average than the random linear threshold with the gap being higher when d is smaller. The results are presented in Table 1.
Researcher Affiliation	Industry	Rishi Saket Google Research, India rishisaket@google.com
Pseudocode	Yes	Figure 1: Algorithm A for LLP-LTF.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for their methodology is publicly available.
Open Datasets	No	We evaluate Algorithm A (Fig. 1) on synthetically generated satisﬁable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. The paper does not provide access information (link, citation, or repository) for this synthetic data.
Dataset Splits	No	The paper describes experiments on 'synthetically generated satisﬁable LLP-LTF instances' and runs '100 independently generated instances' for each parameter set, but it does not specify explicit training, validation, and test splits for a shared dataset.
Hardware Specification	No	The paper describes experimental evaluation in Section 2.2 but does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies	No	The paper mentions solving a Semi-Deﬁnite Program (SDP) and sampling from Gaussian distributions, but it does not provide specific software dependencies or their version numbers, such as programming languages, libraries, or solvers used for implementation or experiments.
Experiment Setup	Yes	We evaluate Algorithm A (Fig. 1) on synthetically generated satisﬁable LLP-LTF instances over dimension d {10, 40, 100} and number of bags m {50, 100, 200}. For each (d, m) the algorithm solves 100 independently generated instances each by choosing a random satisfying linear form f (with iid N(0, 1) coefﬁcients) and m bags independently, for each bag sampling two points iid from N(0, 1)d with their label proportions given by pos(f(x)). ... For each instance the SDP in (2)-(4) is solved and the best h chosen using 100 independent Gaussians g.