Learnability of high-dimensional targets by two-parameter models and gradient flow

Authors: Dmitry Yarotsky

Venue: NeurIPS 2024

Reproducibility Assessment (for each variable: the assessed result, followed by the LLM response)

Research Type: Theoretical
LLM Response: We explore the theoretical possibility of learning d-dimensional targets with W-parameter models by gradient flow (GF) when W < d. Our main result shows that if the targets are described by a particular d-dimensional probability distribution, then there exist models with as few as two parameters that can learn the targets with arbitrarily high success probability. On the other hand, we show that for W < d there is necessarily a large subset of GF-non-learnable targets. In particular, the set of learnable targets is not dense in R^d, and any subset of R^d homeomorphic to the W-dimensional sphere contains non-learnable targets. Finally, we observe that the model in our main theorem on almost guaranteed two-parameter learning is constructed using a hierarchical procedure and as a result is not expressible by a single elementary function. We show that this limitation is essential in the sense that most models written in terms of elementary functions cannot achieve the learnability demonstrated in this theorem. (A toy sketch of the gradient-flow setting described here follows the table.)

Researcher Affiliation: Academia
LLM Response: Dmitry Yarotsky, Skoltech, d.yarotsky@skoltech.ru

Pseudocode: No
LLM Response: The paper describes mathematical constructions and proofs in prose and equations, but it does not contain any clearly labeled pseudocode or algorithm blocks.

Open Source Code: No
LLM Response: The paper is purely theoretical and does not mention releasing any source code for its methodology. The NeurIPS checklist further confirms that the paper does not include experiments, implying no code release.

Open Datasets: No
LLM Response: The paper is theoretical and does not conduct experiments involving specific training datasets. It refers to abstract 'target spaces' and 'probability distributions' (e.g., 'µ be any Borel probability measure on H'), but these are theoretical constructs, not concrete datasets used for empirical evaluation.

Dataset Splits: No
LLM Response: The paper is purely theoretical and does not involve empirical evaluation with dataset splits (training, validation, or testing).

Hardware Specification: No
LLM Response: The paper is purely theoretical and does not describe any empirical experiments; therefore, no hardware specifications are mentioned.

Software Dependencies: No
LLM Response: The paper is purely theoretical and does not describe any empirical experiments; therefore, no software dependencies with version numbers are mentioned.

Experiment Setup: No
LLM Response: The paper is purely theoretical and does not include any empirical experiments. As such, there are no details regarding experimental setup, hyperparameters, or system-level training settings.
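
To make the gradient-flow setting quoted in the abstract concrete, here is a minimal sketch, not the paper's construction: it assumes a squared-error loss L(w) = ||h(w) - f||^2 and a deliberately simple linear model with W = 2 parameters and targets in R^3. The function names gradient_flow and jac_h and the matrix A are all illustrative choices; the sketch merely contrasts a target that the flow reaches with one it cannot, echoing the learnable/non-learnable dichotomy in the abstract.

```python
import numpy as np

def gradient_flow(h, jac_h, f, w0, dt=1e-3, steps=20_000):
    """Forward-Euler discretization of the gradient flow
    dw/dt = -grad L(w), with loss L(w) = ||h(w) - f||^2."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        r = h(w) - f                        # residual in R^d
        w -= dt * 2.0 * jac_h(w).T @ r      # chain rule: grad L = 2 J_h(w)^T r
    return w

# Toy model with W = 2 parameters and targets in d = 3 dimensions.
# A linear h keeps the sketch short; the paper's two-parameter model
# is a hierarchical, non-elementary construction.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                  # d x W matrix
h = lambda w: A @ w
jac_h = lambda w: A                         # Jacobian of h, shape d x W

f_good = A @ np.array([0.5, -0.3])          # target on the model's image
w_star = gradient_flow(h, jac_h, f_good, w0=[0.0, 0.0])
print(np.linalg.norm(h(w_star) - f_good))   # ~0: a GF-learnable target

f_bad = np.array([1.0, 1.0, -1.0])          # target off the 2-dim image
w_star = gradient_flow(h, jac_h, f_bad, w0=[0.0, 0.0])
print(np.linalg.norm(h(w_star) - f_bad))    # stays away from 0: non-learnable
```

The forward-Euler loop is only a discrete stand-in for the continuous flow; the paper's results concern the exact flow and a far more intricate, hierarchically constructed two-parameter model that can learn targets drawn from a d-dimensional distribution with high probability.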