Gradient-Based Feature Learning under Structured Data

Authors: Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods. (An illustrative sketch of this data model follows the table.)
Researcher Affiliation | Academia | (1) University of Toronto and Vector Institute; (2) New York University and Flatiron Institute; (3) University of Tokyo and RIKEN AIP
Pseudocode | Yes | Algorithm 1: Layer-wise training of a two-layer ReLU network with gradient flow (GF). (A hedged sketch of such a procedure follows the table.)
Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories.
Open Datasets | No | The paper analyzes its empirical dynamics over n i.i.d. samples drawn from a single-index model, but it does not name any public or open dataset, nor does it provide access information (such as a URL or citation) for one.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments with datasets, thus it does not mention training, validation, or test splits.
Hardware Specification | No | The paper is theoretical and does not describe running experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe running experiments, so no software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper describes theoretical training procedures for gradient flow and neural networks but does not provide specific hyperparameter values, initialization details, or other concrete setup details that would be needed to reproduce an experiment.
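
The Research Type row above quotes the paper's abstract, which studies a single-index target learned from inputs whose covariance has a spike aligned with the target direction, and contrasts spherical gradient dynamics with a weight normalization reminiscent of batch normalization. As a reading aid only, the sketch below instantiates that data model and the two first-layer update rules on a toy problem; the link function, spike magnitude, step size, and loss are assumptions of this report, not values from the paper, and the snippet is not intended to reproduce the paper's separation result.

```python
# Hypothetical sketch of the spiked single-index data model described in the
# abstract; the link function, spike size, and step size are assumptions of
# this report, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000          # ambient dimension and sample size (assumptions)
spike = 10.0              # spike magnitude (assumption)

# Target direction w_star; the input covariance is I + spike * w_star w_star^T,
# i.e. the spike is perfectly aligned with the target direction.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

# Draw x ~ N(0, I + spike * w_star w_star^T) by stretching the w_star component.
z = rng.standard_normal((n, d))
c = np.sqrt(1.0 + spike) - 1.0
x = z + c * (z @ w_star)[:, None] * w_star[None, :]
y = np.maximum(x @ w_star, 0.0)            # example link: ReLU (assumption)

def grad(w):
    """Gradient of 0.5 * mean squared error for a single ReLU neuron."""
    pred = np.maximum(x @ w, 0.0)
    act = (x @ w > 0).astype(float)        # ReLU derivative
    return ((pred - y) * act) @ x / n

eta = 0.1                                  # step size (assumption)

def spherical_step(w):
    # Spherical gradient dynamics: move along the sphere's tangent space,
    # then renormalize to unit Euclidean norm.
    g = grad(w)
    g -= (g @ w) * w
    w = w - eta * g
    return w / np.linalg.norm(w)

def normalized_step(w):
    # Batch-norm-like weight normalization: rescale the neuron so that its
    # pre-activation has unit variance under the empirical input distribution.
    w = w - eta * grad(w)
    return w / (np.std(x @ w) + 1e-12)

w0 = rng.standard_normal(d)
w0 /= np.linalg.norm(w0)
w_sph, w_bn = w0.copy(), w0.copy()
for _ in range(200):
    w_sph = spherical_step(w_sph)
    w_bn = normalized_step(w_bn)

def overlap(w):
    """Alignment |<w, w_star>| / ||w|| with the target direction."""
    return abs(w @ w_star) / np.linalg.norm(w)

print(f"spherical overlap:  {overlap(w_sph):.3f}")
print(f"normalized overlap: {overlap(w_bn):.3f}")
```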
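
The Pseudocode row refers to Algorithm 1, titled "Layer-wise training of a two-layer ReLU network with gradient flow (GF)", but this report does not reproduce the algorithm body. The following is therefore only a hedged sketch of a generic layer-wise procedure consistent with that title: a discretized gradient flow on the first-layer weights with the second layer frozen, followed by a ridge fit of the second layer on the frozen features. Network width, step size, iteration count, and the ridge penalty are assumptions, not values from the paper.

```python
# Hypothetical layer-wise training sketch for a two-layer ReLU network.
# Reconstructed from the algorithm's title only; widths, step sizes, and the
# ridge penalty below are assumptions, not paper values.
import numpy as np

rng = np.random.default_rng(1)

def relu(t):
    return np.maximum(t, 0.0)

def layerwise_train(x, y, width=64, eta=0.05, steps=500, ridge=1e-3):
    n, d = x.shape
    # First-layer weights W (width x d) and frozen random second-layer signs a.
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=width) / width

    # Stage 1: discretized gradient flow on W with the second layer frozen.
    for _ in range(steps):
        h = relu(x @ W.T)                    # (n, width) hidden activations
        resid = h @ a - y                    # (n,) prediction residuals
        act = (x @ W.T > 0).astype(float)    # ReLU derivative
        # Gradient of 0.5 * mean squared error with respect to W.
        grad_W = ((resid[:, None] * act) * a[None, :]).T @ x / n
        W -= eta * grad_W

    # Stage 2: fit the second layer on the frozen features by ridge regression.
    h = relu(x @ W.T)
    a = np.linalg.solve(h.T @ h / n + ridge * np.eye(width), h.T @ y / n)

    def predict(x_new):
        return relu(x_new @ W.T) @ a

    return predict

# Toy usage on synthetic single-index data (ReLU link is an assumption).
d, n = 20, 2000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
x = rng.standard_normal((n, d))
y = relu(x @ w_star)
predict = layerwise_train(x, y)
print("train MSE:", float(np.mean((predict(x) - y) ** 2)))
```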