Gradient-Based Feature Learning under Structured Data

Authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods. (An illustrative sketch of this data model follows the table.)
Researcher Affiliation | Academia | (1) University of Toronto and Vector Institute; (2) New York University and Flatiron Institute; (3) University of Tokyo and RIKEN AIP
Pseudocode | Yes | Algorithm 1: Layer-wise training of a two-layer ReLU network with gradient flow (GF). (A hedged sketch of such a procedure follows the table.)
Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories.
Open Datasets | No | The paper analyzes its empirical dynamics over n i.i.d. samples drawn from a single-index model, but it does not name any public or open dataset, nor does it provide access information (such as a URL or citation) for one.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with datasets, thus it does not mention training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not describe running experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe running experiments, so no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper describes theoretical training procedures for gradient flow and neural networks but does not provide specific hyperparameter values, initialization details, or other concrete setup details that would be needed to reproduce an experiment.
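
The Research Type row above quotes the paper's abstract, which studies a single-index target learned from inputs whose covariance has a spike aligned with the target direction, and contrasts spherical gradient dynamics with a weight normalization reminiscent of batch normalization. As a reading aid only, the sketch below instantiates that data model and the two first-layer update rules on a toy problem; the link function, spike magnitude, step size, and loss are assumptions of this report, not values from the paper, and the snippet is not intended to reproduce the paper's separation result.

```python
# Hypothetical sketch of the spiked single-index data model described in the
# abstract; the link function, spike size, and step size are assumptions of
# this report, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000          # ambient dimension and sample size (assumptions)
spike = 10.0              # spike magnitude (assumption)

# Target direction w_star; the input covariance is I + spike * w_star w_star^T,
# i.e. the spike is perfectly aligned with the target direction.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

# Draw x ~ N(0, I + spike * w_star w_star^T) by stretching the w_star component.
z = rng.standard_normal((n, d))
c = np.sqrt(1.0 + spike) - 1.0
x = z + c * (z @ w_star)[:, None] * w_star[None, :]
y = np.maximum(x @ w_star, 0.0)            # example link: ReLU (assumption)

def grad(w):
    """Gradient of 0.5 * mean squared error for a single ReLU neuron."""
    pred = np.maximum(x @ w, 0.0)
    act = (x @ w > 0).astype(float)        # ReLU derivative
    return ((pred - y) * act) @ x / n

eta = 0.1                                  # step size (assumption)

def spherical_step(w):
    # Spherical gradient dynamics: move along the sphere's tangent space,
    # then renormalize to unit Euclidean norm.
    g = grad(w)
    g -= (g @ w) * w
    w = w - eta * g
    return w / np.linalg.norm(w)

def normalized_step(w):
    # Batch-norm-like weight normalization: rescale the neuron so that its
    # pre-activation has unit variance under the empirical input distribution.
    w = w - eta * grad(w)
    return w / (np.std(x @ w) + 1e-12)

w0 = rng.standard_normal(d)
w0 /= np.linalg.norm(w0)
w_sph, w_bn = w0.copy(), w0.copy()
for _ in range(200):
    w_sph = spherical_step(w_sph)
    w_bn = normalized_step(w_bn)

def overlap(w):
    """Alignment |<w, w_star>| / ||w|| with the target direction."""
    return abs(w @ w_star) / np.linalg.norm(w)

print(f"spherical overlap:  {overlap(w_sph):.3f}")
print(f"normalized overlap: {overlap(w_bn):.3f}")
```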
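
The Pseudocode row refers to Algorithm 1, titled "Layer-wise training of a two-layer ReLU network with gradient flow (GF)", but this report does not reproduce the algorithm body. The following is therefore only a hedged sketch of a generic layer-wise procedure consistent with that title: a discretized gradient flow on the first-layer weights with the second layer frozen, followed by a ridge fit of the second layer on the frozen features. Network width, step size, iteration count, and the ridge penalty are assumptions, not values from the paper.

```python
# Hypothetical layer-wise training sketch for a two-layer ReLU network.
# Reconstructed from the algorithm's title only; widths, step sizes, and the
# ridge penalty below are assumptions, not paper values.
import numpy as np

rng = np.random.default_rng(1)

def relu(t):
    return np.maximum(t, 0.0)

def layerwise_train(x, y, width=64, eta=0.05, steps=500, ridge=1e-3):
    n, d = x.shape
    # First-layer weights W (width x d) and frozen random second-layer signs a.
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=width) / width

    # Stage 1: discretized gradient flow on W with the second layer frozen.
    for _ in range(steps):
        h = relu(x @ W.T)                    # (n, width) hidden activations
        resid = h @ a - y                    # (n,) prediction residuals
        act = (x @ W.T > 0).astype(float)    # ReLU derivative
        # Gradient of 0.5 * mean squared error with respect to W.
        grad_W = ((resid[:, None] * act) * a[None, :]).T @ x / n
        W -= eta * grad_W

    # Stage 2: fit the second layer on the frozen features by ridge regression.
    h = relu(x @ W.T)
    a = np.linalg.solve(h.T @ h / n + ridge * np.eye(width), h.T @ y / n)

    def predict(x_new):
        return relu(x_new @ W.T) @ a

    return predict

# Toy usage on synthetic single-index data (ReLU link is an assumption).
d, n = 20, 2000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
x = rng.standard_normal((n, d))
y = relu(x @ w_star)
predict = layerwise_train(x, y)
print("train MSE:", float(np.mean((predict(x) - y) ** 2)))
```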