On the Optimal Memorization Power of ReLU Neural Networks

Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that such networks can memorize any N points that satisfy a mild separability assumption using Õ(√N) parameters. Known VC-dimension upper bounds imply that memorizing N samples requires Ω(√N) parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by 1 ≤ L ≤ √N, for memorizing N samples using Õ(N/L) parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters. (For contrast, a toy memorization baseline is sketched after this table.)
Researcher Affiliation | Academia | Gal Vardi*, Gilad Yehudai* & Ohad Shamir, Department of Computer Science, Weizmann Institute of Science, {gal.vardi,gilad.yehudai,ohad.shamir}@weizmann.ac.il
Pseudocode | No | The paper describes the construction and components of the neural network mathematically and textually (e.g., "The construction is done in three phases." and "We construct the following neural network Fi:") but does not include any formal pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide links to any code repositories or supplementary materials containing code.
Open Datasets | No | The paper refers only to abstract "N labeled samples (x1, y1), …, (xN, yN)" satisfying certain properties (e.g., "‖xi‖ ≤ r for every i and ‖xi − xj‖ ≥ δ for every i ≠ j"); it does not mention or use any specific, named datasets (public or otherwise), nor does it provide any links or citations to datasets.
Dataset Splits | No | As a theoretical paper, it does not involve empirical experiments with datasets. Therefore, there is no mention of training, validation, or test splits for any dataset.
Hardware Specification | No | The paper is theoretical and focuses on mathematical properties of neural networks. It does not mention any specific hardware (like GPUs, CPUs, or TPUs) used for computations or experiments.
Software Dependencies | No | The paper describes theoretical constructions and proofs. It does not mention any specific software or programming language dependencies with version numbers that would be needed to reproduce any empirical results.
Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments or their setup. There is no mention of hyperparameters, training configurations, or other experimental settings.
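
To make the memorization claim concrete, the sketch below illustrates the classical one-hidden-layer baseline that exactly fits N labeled points with O(N) ReLU units: project the inputs onto a random direction (distinct points give distinct projections almost surely), then realize the piecewise-linear interpolant of the (projection, label) pairs with one ReLU unit per knot. This is NOT the paper's Õ(√N) construction, which relies on bit-extraction techniques and weights with large bit complexity; the function name memorize_relu and all details here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def memorize_relu(X, y, seed=0):
    """Exactly fit N labeled points with a one-hidden-layer ReLU net.

    Classical O(N)-parameter baseline: project inputs to one dimension,
    then build the piecewise-linear interpolant out of ReLU units.
    Assumes distinct inputs, so projections are distinct almost surely.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)            # random 1-D projection direction
    z = X @ w
    order = np.argsort(z)
    z, t = z[order], y[order]
    slopes = np.diff(t) / np.diff(z)      # slope on each interval (z_i, z_{i+1})
    coefs = np.diff(slopes, prepend=0.0)  # slope change at each knot = ReLU weight
    knots, bias = z[:-1], t[0]            # one hidden unit per knot

    def f(X_new):
        z_new = X_new @ w
        # f(z) = t_0 + sum_i coefs[i] * relu(z - knots[i])
        return bias + np.maximum(z_new[:, None] - knots[None, :], 0.0) @ coefs

    return f

# Usage: the network interpolates all N training labels exactly.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)
net = memorize_relu(X, y)
assert np.allclose(net(X), y, atol=1e-6)
```

The paper's contribution is precisely that this naive O(N) parameter count can be driven down to Õ(√N) under the mild separability assumption, by encoding many labels into a few weights of large bit complexity.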