Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

Authors: Eran Malach, Pritish Kamath, Emmanuel Abbe, Nathan Srebro

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical Demonstration in Two-Layer Networks. While for ease of analysis we presented a fairly specific model, with many fixed (non-trainable) weights and only few trainable weights, we expect the same behaviour occurs also in more natural, but harder to analyze, models. To verify this, we trained a two-layer fully-connected ReLU network on the source distribution Dα analyzed above, for n = 128 and k = 7. We observe that indeed when α > 0, and thus a linear predictor has at least some edge, gradient descent training succeeds in learning the sparse parity, while the best predictor in the Tangent Kernel cannot get error much better than 0.5. See Figure 2 for details. (An illustrative sketch of this experiment follows the table.)
Researcher Affiliation | Academia | Hebrew University of Jerusalem, Israel; Toyota Technological Institute at Chicago, USA; EPFL, Switzerland.
Pseudocode | No | The paper describes methods using mathematical formulas and prose but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets | No | The paper describes a synthetic data generation process ('data sampled from DI with n = 128, k = 7') for its experiments but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the use of 'Adam optimizer' but does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup | Yes | Figure 2 states 'trained using Adam optimizer with learning rate of 0.01'. Claim 1 specifies 'accuracy τ ≤ α/2k, step size η = 2k/(αn) and T = 1 step'. Claim 6 specifies 'accuracy τ = 4/3α, step size η = 1 and T = 1 step'. The initialization `θ0 = 0` is also mentioned.
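
The Open Datasets and Research Type rows refer to a synthetic distribution Dα with n = 128 and k = 7 for which no data access is provided. The paper's exact construction of Dα is not reproduced in this report, so the sketch below uses a hypothetical stand-in: a k-sparse parity label together with a single 'hint' coordinate that agrees with the label with probability (1 + α)/2, which gives a linear predictor an edge of roughly α. The function name and the biasing scheme are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def sample_biased_sparse_parity(m, n=128, k=7, alpha=0.1, rng=None):
    """Hypothetical stand-in for D_alpha (not the paper's exact definition):
    the label is the parity of the first k coordinates, and the last
    coordinate is a 'hint' that agrees with the label with probability
    (1 + alpha) / 2, so a linear predictor reading only that coordinate
    has edge alpha over random guessing."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.choice([-1.0, 1.0], size=(m, n))       # uniform +/-1 inputs
    y = np.prod(x[:, :k], axis=1)                  # k-sparse parity label
    agree = rng.random(m) < (1.0 + alpha) / 2.0    # biased agreement mask
    x[:, -1] = np.where(agree, y, -y)              # overwrite the hint bit
    return x.astype(np.float32), y.astype(np.float32)

# Example matching the quoted setting 'n = 128, k = 7':
X, Y = sample_biased_sparse_parity(10_000, n=128, k=7, alpha=0.1)
```

Under this stand-in, a linear predictor on the raw coordinates can only exploit the hint bit, capping its accuracy near (1 + α)/2; this mirrors the role the edge α plays in the quoted claims.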
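Building on the generator above, the following is a minimal sketch of the training run summarized in the Research Type and Experiment Setup rows: a two-layer fully-connected ReLU network trained with Adam at learning rate 0.01, the only optimizer detail quoted from Figure 2. The hidden width, sample counts, batch size, epoch count, loss function, and value of α are assumptions made for illustration; the single-step settings quoted from Claims 1 and 6 apply to the paper's hand-constructed model, not to this network.

```python
import numpy as np
import torch
import torch.nn as nn

# Reuses sample_biased_sparse_parity from the previous sketch (a stand-in
# distribution, not the paper's exact D_alpha). Only n = 128, k = 7, the
# two-layer fully-connected ReLU architecture, and Adam with lr = 0.01 come
# from the rows above; everything else here is an assumption.
n, k, alpha = 128, 7, 0.1
rng = np.random.default_rng(0)
x_tr, y_tr = map(torch.from_numpy, sample_biased_sparse_parity(20_000, n, k, alpha, rng))
x_te, y_te = map(torch.from_numpy, sample_biased_sparse_parity(5_000, n, k, alpha, rng))

model = nn.Sequential(nn.Linear(n, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.01)  # learning rate quoted from Figure 2
loss_fn = nn.SoftMarginLoss()                        # logistic loss on +/-1 labels

for epoch in range(30):
    perm = torch.randperm(len(x_tr))
    for i in range(0, len(x_tr), 256):
        idx = perm[i:i + 256]
        opt.zero_grad()
        loss_fn(model(x_tr[idx]).squeeze(-1), y_tr[idx]).backward()
        opt.step()

with torch.no_grad():
    acc = (model(x_te).squeeze(-1).sign() == y_te).float().mean().item()
print(f"test accuracy of the trained network: {acc:.3f}")
# Figure 2 contrasts such a network with the best predictor in its tangent
# kernel (the model linearized at initialization), which stays near 0.5 error.
```

Whether this stand-in setup reproduces the qualitative gap reported in Figure 2 depends on the assumed hyperparameters and on how closely the stand-in distribution matches the paper's Dα.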