Sample Amplification: Increasing Dataset Size even when Learning is Impossible

Authors: Brian Axelrod, Shivam Garg, Vatsal Sharan, Gregory Valiant

ICML 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Figure 2: Performance improvement by amplification; see Example 2 for a description of the setup. Example 2. Given labeled examples $(x_1, y_1), \ldots, (x_n, y_n)$ drawn from a distribution $D$, with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, a natural quantity to estimate is the variance in $y$ not explainable as a linear function of $x$: $\inf_{\theta \in \mathbb{R}^d} E_{(x,y) \sim D}[(\theta^T x - y)^2]$. The standard unbiased estimator for this quantity is the training error of the least-squares linear model, scaled by a factor of $n/(n-d)$. This scaling factor makes the estimate unbiased, although its variance is large when $n$ is not much larger than $d$. Figure 2 shows the expected squared error of this estimator on raw samples, and on $(n, n+2)$ amplified samples, in the case where $x_i \sim N(0, I_d)$ and $y_i = \theta^T x_i + \eta$ for some model with $\|\theta\|_2 = 1$ and independent noise $\eta \sim N(0, 1/4)$; hence the true value of the unexplainable variance is $1/4$.
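The estimator described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction of the setup in Example 2, not code from the paper; the function name `unexplained_variance_estimate` and the chosen $n$, $d$ are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def unexplained_variance_estimate(X, y):
    """Unbiased estimate of the unexplainable variance: the least-squares
    training error, scaled by n / (n - d)."""
    n, d = X.shape
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ theta_hat
    # Training SSE / n, scaled by n / (n - d), simplifies to SSE / (n - d).
    return residuals @ residuals / (n - d)

# Synthetic data matching the toy setup:
# x ~ N(0, I_d), y = theta^T x + eta with eta ~ N(0, 1/4).
n, d = 10_000, 20
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)           # ||theta||_2 = 1
X = rng.standard_normal((n, d))
y = X @ theta + rng.normal(0, 0.5, n)    # noise variance 1/4

est = unexplained_variance_estimate(X, y)
print(est)  # close to the true unexplainable variance, 1/4
```

With $n$ this much larger than $d$ the estimate concentrates tightly around $1/4$; the variance blow-up the paper discusses appears when $n$ is close to $d$.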
Researcher Affiliation Academia 1Stanford University, United States. Correspondence to: Brian Axelrod <baxelrod@stanford.edu>, Shivam Garg <shivamgarg@stanford.edu>, Vatsal Sharan <vsharan@cs.stanford.edu>, Gregory Valiant <valiant@stanford.edu>.
Pseudocode Yes Algorithm 1 Sample Amplification for Gaussian with Unknown Mean and Fixed Covariance
Open Source Code No The paper does not contain any statement or link indicating the public release of source code for the described methodology.
Open Datasets No The paper describes a 'toy example' where data is generated synthetically ('$x_i \sim N(0, I_d)$, and $y_i = \theta^T x_i + \eta$ for some model with $\|\theta\|_2 = 1$ and independent noise $\eta \sim N(0, 1/4)$'). It does not use or provide access information for a pre-existing publicly available dataset.
Dataset Splits No The paper describes a simulated 'toy example' for demonstration purposes and does not specify training, validation, or test dataset splits.
Hardware Specification No The paper describes theoretical results and a simulated 'toy example' but does not specify any hardware details (e.g., GPU/CPU models, memory) used for running experiments or simulations.
Software Dependencies No The paper does not list any specific software dependencies or their version numbers required to reproduce the experimental setup, focusing instead on theoretical contributions and a conceptual example.
Experiment Setup Yes Example 2. Given labeled examples $(x_1, y_1), \ldots, (x_n, y_n)$ drawn from a distribution $D$, with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$...in the case where $x_i \sim N(0, I_d)$, and $y_i = \theta^T x_i + \eta$ for some model with $\|\theta\|_2 = 1$ and independent noise $\eta \sim N(0, 1/4)$...Here, the amplification procedure draws two additional datapoints $x$ from the isotropic Gaussian with mean equal to the empirical mean, and labels them according to the learned least-squares regression model $\hat{\theta}$ with independent noise of variance $5/n$ times the empirical estimate of the unexplained variance.
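The amplification procedure quoted above can be sketched as follows. This is a hedged reconstruction of the described steps, not the authors' code: the function name `amplify`, the choice of seed, and the problem sizes are assumptions, and the noise variance $(5/n) \cdot \hat{\sigma}^2$ is taken directly from the quoted description.

```python
import numpy as np

rng = np.random.default_rng(1)

def amplify(X, y, k=2):
    """Sketch of the described (n, n+k) amplification procedure: draw k
    extra covariates from an isotropic Gaussian centred at the empirical
    mean, and label them with the least-squares model plus noise of
    variance (5/n) times the estimated unexplained variance."""
    n, d = X.shape
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ theta_hat
    sigma2_hat = residuals @ residuals / (n - d)   # unexplained variance
    X_new = rng.standard_normal((k, d)) + X.mean(axis=0)
    y_new = X_new @ theta_hat + rng.normal(0, np.sqrt(5 / n * sigma2_hat), k)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])

# Usage on the toy setup from Example 2 (sizes chosen for illustration):
n, d = 1_000, 100
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)
X = rng.standard_normal((n, d))
y = X @ theta + rng.normal(0, 0.5, n)
X_amp, y_amp = amplify(X, y)
print(X_amp.shape, y_amp.shape)  # (1002, 100) (1002,)
```

The amplified set mixes the n genuine samples with 2 synthetic ones, matching the $(n, n+2)$ amplification evaluated in Figure 2.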