Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

Authors: Nikita Tsoy, Nikola Konstantinov

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow. Specifically, we prove that in the early training phases, network features cluster around a few directions that do not depend on the size of the hidden layer. Furthermore, for datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages. These results indicate that features learned in the middle stages of training may be more useful for OOD transfer. We support this hypothesis with experiments on image data." (two-layer setup sketched below)
Researcher Affiliation | Academia | INSAIT, Sofia University, Bulgaria.
Pseudocode | No | The paper contains mathematical proofs and descriptions of dynamics but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Replication files are available at https://github.com/nikita-tsoy98/simplicity-bias-beyond-linear-replication
Open Datasets | Yes | "We test this hypothesis on the MNIST-CIFAR10 domino dataset proposed by Shah et al. (2020)." (domino construction sketched below)
Dataset Splits | Yes | "We further devoted 25% train and test data for validation, giving us four datasets: train-train, train-validation, test-train, and test-validation." (split sketched below)
Hardware Specification | No | The paper mentions training a ResNet-18 model, but it does not specify any particular hardware components, such as CPU models, GPU models (e.g., NVIDIA A100), or memory capacity.
Software Dependencies | No | The paper mentions using PyTorch and the Transformers library for the learning-rate scheduler, but it does not specify version numbers for these software components, which are necessary for reproducibility.
Experiment Setup | Yes | batch size 128, lr 0.125, momentum 0.9, nesterov True, weight decay 0.0005, share of warm-up steps 12.5% (optimizer and scheduler sketched below)
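
The abstract quoted in the Research Type row describes two-layer networks with small initial weights trained by gradient flow on data with an XOR-like pattern. The sketch below is not the authors' code; it is a minimal illustration of that setting using full-batch gradient descent as a discretization of gradient flow, with synthetic XOR-like data. The width, initialization scale, step size, and logistic loss are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): two-layer ReLU network with small
# initialization, trained by full-batch gradient descent on XOR-like data.
import torch

torch.manual_seed(0)

# XOR-like data in 2D: the label is the product of the signs of the coordinates.
n = 512
X = torch.randn(n, 2)
y = torch.sign(X[:, 0] * X[:, 1])

width, init_scale, lr, steps = 256, 1e-3, 0.5, 2000  # assumed values

# Two-layer network f(x) = a^T relu(W x), initialized with small weights.
W = (init_scale * torch.randn(width, 2)).requires_grad_()
a = (init_scale * torch.randn(width)).requires_grad_()

for _ in range(steps):
    out = torch.relu(X @ W.T) @ a
    loss = torch.nn.functional.soft_margin_loss(out, y)  # logistic loss
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        a -= lr * a.grad
        W.grad.zero_()
        a.grad.zero_()

# Inspect hidden-feature directions (rows of W, normalized); early in training
# these tend to cluster around a few directions, illustrating simplicity bias.
directions = torch.nn.functional.normalize(W.detach(), dim=1)
print(directions[:5])
```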
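
The Open Datasets row refers to the MNIST-CIFAR10 domino dataset of Shah et al. (2020). The sketch below only shows how such a domino image can be assembled with torchvision, assuming a padded MNIST digit stacked vertically on top of a CIFAR-10 image; the exact class pairing and preprocessing used in the paper may differ.

```python
# Sketch of a "domino" image in the spirit of Shah et al. (2020): an MNIST
# digit on top of a CIFAR-10 image, so each sample carries both a simple
# (digit) and a complex (object) feature.
import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
cifar = datasets.CIFAR10("data", train=True, download=True,
                         transform=transforms.ToTensor())

def make_domino(mnist_img, cifar_img):
    # MNIST: 1x28x28 -> pad to 1x32x32 -> repeat to 3 channels.
    digit = torch.nn.functional.pad(mnist_img, (2, 2, 2, 2)).repeat(3, 1, 1)
    # Stack vertically: the result is a 3x64x32 "domino" image.
    return torch.cat([digit, cifar_img], dim=1)

x_m, _ = mnist[0]
x_c, _ = cifar[0]
domino = make_domino(x_m, x_c)
print(domino.shape)  # torch.Size([3, 64, 32])
```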
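
The Dataset Splits row reports that 25% of both the original train and test data is held out for validation, yielding four subsets. Below is a sketch of that split using torch.utils.data.random_split; the random seed and the placeholder MNIST datasets are assumptions (the paper uses the domino data instead).

```python
# Sketch of the 75/25 split described above: a quarter of each original split
# is held out for validation, giving train-train, train-validation,
# test-train, and test-validation subsets.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

def split_75_25(dataset, seed=0):
    # Split fractions follow the paper; the seed is an assumption.
    n_val = len(dataset) // 4
    return random_split(dataset, [len(dataset) - n_val, n_val],
                        generator=torch.Generator().manual_seed(seed))

# Placeholder datasets; the paper uses the MNIST-CIFAR10 domino data.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())

train_train, train_val = split_75_25(train_set)  # train-train, train-validation
test_train, test_val = split_75_25(test_set)     # test-train, test-validation
```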
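
The Experiment Setup row lists the SGD hyperparameters, and the Software Dependencies row mentions a ResNet-18 model and a learning-rate scheduler from the Transformers library with a 12.5% warm-up share. The sketch below wires these pieces together; the choice of get_linear_schedule_with_warmup (the paper does not name the exact schedule), the total number of steps, and the number of output classes are assumptions.

```python
# Sketch of the reported training configuration: SGD with the listed
# hyperparameters and a warm-up schedule covering 12.5% of the steps.
import torch
from torchvision.models import resnet18
from transformers import get_linear_schedule_with_warmup

model = resnet18(num_classes=2)  # number of classes is an assumption

# Hyperparameters as reported in the Experiment Setup row.
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.125, momentum=0.9,
                            nesterov=True, weight_decay=0.0005)

total_steps = 10_000                     # assumed; in practice epochs * len(train_loader)
warmup_steps = int(0.125 * total_steps)  # 12.5% warm-up share, as reported
scheduler = get_linear_schedule_with_warmup(optimizer, warmup_steps, total_steps)

# Training-loop skeleton (batch size 128 as reported):
# for epoch in range(epochs):
#     for x, y in train_loader:  # DataLoader(..., batch_size=128, shuffle=True)
#         loss = torch.nn.functional.cross_entropy(model(x), y)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
#         scheduler.step()
```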