A New Neural Kernel Regime: The Inductive Bias of Multi-Task Learning
Authors: Julia Nakhleh, Joseph Shenouda, Robert Nowak
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper studies the properties of solutions to multi-task shallow ReLU neural network learning problems, wherein the network is trained to fit a dataset with minimal sum of squared weights. Remarkably, the solutions learned for each individual task resemble those obtained by solving a kernel regression problem, revealing a novel connection between neural networks and kernel methods. It is known that single-task neural network learning problems are equivalent to a minimum-norm interpolation problem in a non-Hilbertian Banach space, and that the solutions of such problems are generally non-unique. In contrast, we prove that the solutions to univariate-input, multi-task neural network interpolation problems are almost always unique, and coincide with the solution to a minimum-norm interpolation problem in a Sobolev (Reproducing Kernel) Hilbert Space. We also demonstrate a similar phenomenon in the multivariate-input case; specifically, we show that neural network learning problems with large numbers of tasks are approximately equivalent to an ℓ2 (Hilbert space) minimization problem over a fixed kernel determined by the optimal neurons. (A sketch of this minimum-norm interpolation problem is given after the table.) |
| Researcher Affiliation | Academia | Julia Nakhleh, Department of Computer Science, University of Wisconsin-Madison, Madison, WI, jnakhleh@wisc.edu; Joseph Shenouda, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, jshenouda@wisc.edu; Robert D. Nowak, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, rdnowak@wisc.edu |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It only states that code for reproducing numerical experiments can be found at a given URL, but this is not for the methodology itself. |
| Open Datasets | No | The paper uses synthetically generated data, described within the paper (e.g., "We generated 25 random ReLU neurons with unit norm input weights..."). However, it does not provide concrete access information (link, DOI, formal citation for public availability) for a pre-existing publicly available dataset. It states, "Our experiments use only synthetically-generated data, and the procedure by which the data is generated is described in each case, so there is no real-world data to share." |
| Dataset Splits | No | The paper does not specify dataset splits (e.g., percentages, counts) for training, validation, and testing. It mentions a dataset $\{x_i, y_i\}_{i=1}^{20}$ in the high-dimensional experiments but no explicit splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. It states, "Our experiments are computationally very simple to execute on any computer architecture and do not require GPUs, so we judge that these details are not important to include." |
| Software Dependencies | No | The paper mentions "All of our experiments were carried out in PyTorch" and "For solving (25) we utilized CVXPY (Diamond and Boyd, 2016)", but it does not specify version numbers for either of these software dependencies. |
| Experiment Setup | Yes | All of our experiments were carried out in PyTorch and used the Adam optimizer. We trained the models with mean squared error loss and included the representational cost $\sum_{k=1}^{K} \lVert v_k \rVert_2$ as a regularizer with λ = 1e-5 for the univariate experiments and λ = 1e-3 for the multivariate experiments. All models were trained to convergence. The networks were initialized with 20 neurons for the univariate experiments and 800 neurons for the multivariate experiments. (A minimal PyTorch sketch of this setup is given after the table.) |
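
The abstract summarized under "Research Type" centers on a minimum-norm interpolation problem; the sketch below spells out one plausible form of that problem. The exact parameterization (a width-$K$ shallow ReLU network with input weights $w_k$, biases $b_k$, and per-neuron output weight vectors $v_k \in \mathbb{R}^T$ for $T$ tasks) is an assumption for illustration and is not copied from the paper.

```latex
% Sketch, under assumed notation: a shallow multi-task ReLU network
%   f_\theta(x) = \sum_{k=1}^{K} v_k \, \sigma(w_k^\top x + b_k), \qquad v_k \in \mathbb{R}^T,
% trained to fit the data with minimal sum of squared weights:
\begin{equation*}
  \min_{\theta} \; \frac{1}{2} \sum_{k=1}^{K} \Bigl( \lVert v_k \rVert_2^2 + \lVert w_k \rVert_2^2 \Bigr)
  \quad \text{subject to} \quad f_\theta(x_i) = y_i, \qquad i = 1, \dots, N.
\end{equation*}
% Per the abstract, for univariate inputs the per-task components of the solution are
% almost always unique and coincide with minimum-norm interpolants in a Sobolev
% reproducing kernel Hilbert space.
```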
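
The "Experiment Setup" row can likewise be made concrete with a minimal training-loop sketch. This is not the authors' code: the synthetic data, learning rate, step count, and the exact form of the representational-cost regularizer (taken here as the sum over neurons of $\lVert v_k \rVert_2$) are assumptions used only to illustrate the described recipe (PyTorch, Adam, MSE loss, regularization weight λ = 1e-5, 20 neurons in the univariate case).

```python
# Hedged sketch of the described setup (not the authors' implementation).
import torch
import torch.nn as nn

K, T, d = 20, 2, 1      # neurons, tasks, input dimension (univariate case; all assumed)
lam = 1e-5              # regularization weight reported for the univariate experiments

class ShallowMultiTaskReLU(nn.Module):
    """Width-K shallow ReLU network with a T-dimensional (multi-task) output."""
    def __init__(self, d, K, T):
        super().__init__()
        self.hidden = nn.Linear(d, K)   # input weights and biases
        self.out = nn.Linear(K, T)      # column k of self.out.weight is the output vector v_k

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

# Placeholder synthetic data: n univariate inputs, T target tasks.
n = 20
x = torch.linspace(-1.0, 1.0, n).unsqueeze(1)
y = torch.randn(n, T)

model = ShallowMultiTaskReLU(d, K, T)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()

for step in range(5000):   # the paper trains to convergence; a fixed step count is used here
    opt.zero_grad()
    pred = model(x)
    # Representational cost (assumed form): sum over neurons of the l2 norm of v_k.
    rep_cost = model.out.weight.norm(dim=0).sum()
    loss = mse(pred, y) + lam * rep_cost
    loss.backward()
    opt.step()
```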