On the Optimal Memorization Power of ReLU Neural Networks
Authors: Gal Vardi, Gilad Yehudai, Ohad Shamir
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that such networks can memorize any N points that satisfy a mild separability assumption using Õ(√N) parameters. Known VC-dimension upper bounds imply that memorizing N samples requires Ω(√N) parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by 1 ≤ L ≤ √N, for memorizing N samples using Õ(N/L) parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters. (See the bit-extraction sketch after this table.) |
| Researcher Affiliation | Academia | Gal Vardi, Gilad Yehudai* & Ohad Shamir, Department of Computer Science, Weizmann Institute of Science, {gal.vardi,gilad.yehudai,ohad.shamir}@weizmann.ac.il |
| Pseudocode | No | The paper describes the construction and components of the neural network mathematically and textually (e.g., "The construction is done in three phases.", "We construct the following neural network F_i:") but does not include any formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide links to any code repositories or supplementary materials containing code. |
| Open Datasets | No | The paper refers to abstract "N labeled samples (x1, y1), . . . , (xN, yN)" with certain properties (e.g., "‖xi‖ ≤ r for every i and ‖xi − xj‖ ≥ δ for every i ≠ j"), but it does not mention or use any specific, named datasets (public or otherwise), nor does it provide any links or citations to datasets. (The separability assumption itself is sketched after this table.) |
| Dataset Splits | No | As a theoretical paper, it does not involve empirical experiments with datasets. Therefore, there is no mention of training, validation, or test splits for any dataset. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical properties of neural networks. It does not mention any specific hardware (like GPUs, CPUs, or TPUs) used for computations or experiments. |
| Software Dependencies | No | The paper describes theoretical constructions and proofs. It does not mention any specific software or programming language dependencies with version numbers that would be needed to reproduce any empirical results. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experiments or their setup. There is no mention of hyperparameters, training configurations, or other experimental settings. |
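The abstract's headline claim couples parameter count with weight precision: memorizing with sub-linearly many parameters is possible only because individual weights carry many bits. Below is a minimal Python sketch of the bit-extraction idea behind such results; it is an illustration under our own assumptions, not the paper's actual three-phase construction, and `pack_labels` / `extract_label` are hypothetical names.

```python
# Hypothetical illustration (not the paper's construction): pack N binary
# labels into a single high-precision weight, then recover label i with
# floor/mod arithmetic. This is how large bit complexity can substitute
# for parameter count.

def pack_labels(labels):
    """Encode bits y_1..y_N into the single real number sum_i y_i * 2^(-i)."""
    return sum(y * 2.0 ** -(i + 1) for i, y in enumerate(labels))

def extract_label(w, i):
    """Recover the (0-indexed) i-th bit from the packed weight w."""
    return int(w * 2 ** (i + 1)) % 2  # truncation equals floor for w >= 0

labels = [1, 0, 1, 1, 0, 0, 1]
w = pack_labels(labels)  # one real number memorizes all seven labels
recovered = [extract_label(w, i) for i in range(len(labels))]
assert recovered == labels
```

With standard 64-bit floats the trick stays exact only for a few dozen labels; the paper instead works with weights whose bit complexity grows with N, which is the "large bit complexity" the abstract identifies as both necessary and sufficient.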
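The "mild separability assumption" quoted in the Open Datasets row says the samples are bounded in norm by r and pairwise δ-separated. A small sketch of such a check follows; the function name and toy data are illustrative placeholders, not artifacts from the paper.

```python
import numpy as np

def is_separable(X, r, delta):
    """Check ||x_i|| <= r for all i and ||x_i - x_j|| >= delta for i != j."""
    X = np.asarray(X, dtype=float)
    if np.any(np.linalg.norm(X, axis=1) > r):
        return False
    # Pairwise distances; only the strict upper triangle is relevant.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return bool(np.all(dists[iu] >= delta))

X = np.array([[0.0, 0.5], [0.6, -0.2], [-0.4, 0.4]])
print(is_separable(X, r=1.0, delta=0.3))  # True for this toy data
```

Any N points passing such a check fall under the paper's Õ(√N)-parameter memorization guarantee.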