On the Optimal Memorization Power of ReLU Neural Networks
Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that such networks can memorize any N points that satisfy a mild separability assumption using Õ(√N) parameters. Known VC-dimension upper bounds imply that memorizing N samples requires Ω(√N) parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by 1 ≤ L ≤ √N, for memorizing N samples using Õ(N/L) parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters. (See the bit-extraction sketch after this table.) |
| Researcher Affiliation | Academia | Gal Vardi, Gilad Yehudai* & Ohad Shamir, Department of Computer Science, Weizmann Institute of Science, {gal.vardi,gilad.yehudai,ohad.shamir}@weizmann.ac.il |
| Pseudocode | No | The paper describes the construction and components of the neural network mathematically and textually (e.g., "The construction is done in three phases.", "We construct the following neural network F_i:") but does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide links to any code repositories or supplementary materials containing code. |
| Open Datasets | No | The paper refers to abstract "N labeled samples (x1, y1), . . . , (xN, yN)" with certain properties (e.g., "‖xi‖ ≤ r for every i and ‖xi − xj‖ ≥ δ for every i ≠ j"), but it does not mention or use any specific, named datasets (public or otherwise), nor does it provide any links or citations to datasets. (The separability assumption itself is sketched after this table.) |
| Dataset Splits | No | As a theoretical paper, it does not involve empirical experiments with datasets. Therefore, there is no mention of training, validation, or test splits for any dataset. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical properties of neural networks. It does not mention any specific hardware (like GPUs, CPUs, or TPUs) used for computations or experiments. |
| Software Dependencies | No | The paper describes theoretical constructions and proofs. It does not mention any specific software or programming language dependencies with version numbers that would be needed to reproduce any empirical results. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments or their setup. There is no mention of hyperparameters, training configurations, or other experimental settings. |
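The abstract's headline claim couples parameter count with weight precision: memorizing with sub-linearly many parameters is possible only because individual weights carry many bits. Below is a minimal Python sketch of the bit-extraction idea behind such results; it is an illustration under our own assumptions, not the paper's actual three-phase construction, and `pack_labels` / `extract_label` are hypothetical names.

```python
# Hypothetical illustration (not the paper's construction): pack N binary
# labels into a single high-precision weight, then recover label i with
# floor/mod arithmetic. This is how large bit complexity can substitute
# for parameter count.

def pack_labels(labels):
    """Encode bits y_1..y_N into the single real number sum_i y_i * 2^(-i)."""
    return sum(y * 2.0 ** -(i + 1) for i, y in enumerate(labels))

def extract_label(w, i):
    """Recover the (0-indexed) i-th bit from the packed weight w."""
    return int(w * 2 ** (i + 1)) % 2  # truncation equals floor for w >= 0

labels = [1, 0, 1, 1, 0, 0, 1]
w = pack_labels(labels)  # one real number memorizes all seven labels
recovered = [extract_label(w, i) for i in range(len(labels))]
assert recovered == labels
```

With standard 64-bit floats the trick stays exact only for a few dozen labels; the paper instead works with weights whose bit complexity grows with N, which is the "large bit complexity" the abstract identifies as both necessary and sufficient.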
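The "mild separability assumption" quoted in the Open Datasets row says the samples are bounded in norm by r and pairwise δ-separated. A small sketch of such a check follows; the function name and toy data are illustrative placeholders, not artifacts from the paper.

```python
import numpy as np

def is_separable(X, r, delta):
    """Check ||x_i|| <= r for all i and ||x_i - x_j|| >= delta for i != j."""
    X = np.asarray(X, dtype=float)
    if np.any(np.linalg.norm(X, axis=1) > r):
        return False
    # Pairwise distances; only the strict upper triangle is relevant.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return bool(np.all(dists[iu] >= delta))

X = np.array([[0.0, 0.5], [0.6, -0.2], [-0.4, 0.4]])
print(is_separable(X, r=1.0, delta=0.3))  # True for this toy data
```

Any N points passing such a check fall under the paper's Õ(√N)-parameter memorization guarantee.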