Gradient-Based Feature Learning under Structured Data
Authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu
Venue: NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods. (An illustrative sketch of this spiked setup appears after the table.) |
| Researcher Affiliation | Academia | University of Toronto and Vector Institute; New York University and Flatiron Institute; University of Tokyo and RIKEN AIP |
| Pseudocode | Yes | Algorithm 1: Layer-wise training of a two-layer ReLU network with gradient flow (GF). (A hedged sketch of this layer-wise scheme also appears after the table.) |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories. |
| Open Datasets | No | The paper's analysis assumes n i.i.d. samples from a single-index model, but it does not name any public or open dataset, nor does it provide access information (such as a URL or citation) for one. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with datasets, thus it does not mention training, validation, or test splits. |
| Hardware Specification | No | The paper is theoretical and does not report running experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report running experiments, so no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper describes theoretical training procedures for gradient flow and neural networks but does not provide specific hyperparameter values, initialization details, or other concrete experimental setup settings that would be needed to reproduce an actual experiment. |
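
To make the "Research Type" row concrete, below is a minimal NumPy sketch of the spiked single-index setting the abstract describes: x ~ N(0, Σ) with Σ = I_d + κθθᵀ and y = σ(⟨θ, x⟩), comparing plain spherical gradient dynamics against a batch-norm-like Σ-normalization. The link function (ReLU), spike strength, step size, and the exact form of the normalization are illustrative assumptions, not the paper's precise parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, kappa = 64, 8192, 8.0              # dimension, samples, spike strength (assumed)

theta = np.zeros(d)
theta[0] = 1.0                           # target direction, aligned with the spike
Sigma = np.eye(d) + kappa * np.outer(theta, theta)
x = rng.standard_normal((n, d)) @ np.linalg.cholesky(Sigma).T   # x ~ N(0, Sigma)
y = np.maximum(x @ theta, 0.0)           # assumed ReLU link function

def grad(w):
    """Gradient of the empirical squared loss for a single ReLU neuron."""
    pred = np.maximum(x @ w, 0.0)
    return x.T @ ((pred - y) * (x @ w > 0)) / n

def train(normalization):
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(300):
        g = grad(w)
        g -= (g @ w) * w                 # project onto the tangent space of the sphere
        w -= 0.05 * g
        if normalization == "sphere":    # plain spherical dynamics
            w /= np.linalg.norm(w)
        else:                            # batch-norm-like Sigma-norm (assumed form)
            w /= np.sqrt(w @ Sigma @ w)
    return (w @ theta) / np.linalg.norm(w)   # alignment with the target direction

print("spherical alignment       :", train("sphere"))
print("Sigma-normalized alignment:", train("sigma"))
```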
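
Similarly, for the "Pseudocode" row, here is a hedged sketch of the layer-wise scheme named in Algorithm 1: train the first-layer weights with a forward-Euler discretization of gradient flow while the second layer stays frozen, then refit the second layer on the learned features. The network width, ridge penalty, and the Euler step standing in for continuous gradient flow are assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 32, 64, 4096                   # input dim, width, samples (assumed)

x = rng.standard_normal((n, d))
theta = np.zeros(d)
theta[0] = 1.0
y = np.maximum(x @ theta, 0.0)           # assumed single-index target

W = rng.standard_normal((m, d)) / np.sqrt(d)   # first layer
a = rng.choice([-1.0, 1.0], size=m) / m        # frozen second layer

# Stage 1: forward-Euler discretization of gradient flow on W only.
lr, steps = 0.5, 2000
for _ in range(steps):
    h = np.maximum(x @ W.T, 0.0)         # (n, m) hidden activations
    resid = h @ a - y                    # (n,) residuals
    act = (x @ W.T > 0)                  # ReLU activation pattern
    grad_W = (act * resid[:, None] * a).T @ x / n
    W -= lr * grad_W

# Stage 2: refit the second layer on the learned features (ridge regression).
h = np.maximum(x @ W.T, 0.0)
lam = 1e-3                               # assumed ridge penalty
a = np.linalg.solve(h.T @ h / n + lam * np.eye(m), h.T @ y / n)

print("train MSE:", np.mean((h @ a - y) ** 2))
```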