Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data
Authors: Nikita Tsoy, Nikola Konstantinov
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow. Specifically, we prove that in the early training phases, network features cluster around a few directions that do not depend on the size of the hidden layer. Furthermore, for datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages. These results indicate that features learned in the middle stages of training may be more useful for OOD transfer. We support this hypothesis with experiments on image data. (A minimal sketch of this two-layer setup appears after the table.) |
| Researcher Affiliation | Academia | INSAIT, Sofia University, Bulgaria. |
| Pseudocode | No | The paper contains mathematical proofs and descriptions of dynamics but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Replication files are available at https://github.com/nikita-tsoy98/simplicity-bias-beyond-linear-replication |
| Open Datasets | Yes | We test this hypothesis on the MNIST-CIFAR10 domino dataset proposed by Shah et al. (2020). (A construction sketch for a domino example appears after the table.) |
| Dataset Splits | Yes | We further devoted 25% of the train and test data to validation, giving us four datasets: train-train, train-validation, test-train, and test-validation. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper mentions training a ResNet-18 model, but it does not specify any particular hardware components, such as CPU models, GPU models (e.g., NVIDIA A100), or memory capacity. |
| Software Dependencies | No | The paper mentions using "PyTorch" and the "Transformers library" for the learning-rate scheduler, but it does not specify version numbers for these software components, which are needed for reproducibility. |
| Experiment Setup | Yes | Batch size: 128; learning rate: 0.125; momentum: 0.9; Nesterov: True; weight decay: 0.0005; share of warm-up steps: 12.5%. (A configuration sketch appears after the table.) |
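The following is a minimal, illustrative sketch of the setting described under Research Type, not the authors' code: a two-layer ReLU network with small random initialization, trained by full-batch gradient descent (a discretization of gradient flow) on XOR-like data. The width, initialization scale, learning rate, and step count are all assumed values chosen for illustration.

```python
import torch

torch.manual_seed(0)

# XOR-like data in 2D: the label is the product of the signs of the coordinates
X = torch.randn(512, 2)
y = torch.sign(X[:, 0] * X[:, 1])

# Small initialization, as in the paper's setting; the scale is an assumed value
width, init_scale, lr = 1024, 1e-3, 0.1
W = (init_scale * torch.randn(width, 2)).requires_grad_()  # input-layer weights
a = (init_scale * torch.randn(width)).requires_grad_()     # output-layer weights

for step in range(2000):
    out = torch.relu(X @ W.T) @ a
    # logistic loss on +/-1 labels
    loss = torch.nn.functional.soft_margin_loss(out, y)
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        a -= lr * a.grad
    W.grad.zero_()
    a.grad.zero_()

# Under simplicity bias, the neurons' input-weight directions should
# concentrate around a few directions that do not depend on the width.
dirs = torch.nn.functional.normalize(W.detach(), dim=1)
print(dirs[:10])
```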
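For the Open Datasets row, here is a hedged sketch of how an MNIST-CIFAR10 domino example can be assembled in the spirit of Shah et al. (2020): an MNIST digit is stacked on top of a CIFAR-10 image so that a simple feature and a complex feature co-occur. Resizing MNIST to 32x32, the channel replication, and the lack of any class pairing are simplifications; the original pairing scheme is not reproduced here.

```python
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
mnist = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
cifar = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)

def make_domino(mnist_img: torch.Tensor, cifar_img: torch.Tensor) -> torch.Tensor:
    """Stack an MNIST digit (1x28x28) on top of a CIFAR-10 image (3x32x32)."""
    # Replicate the grayscale channel and resize to CIFAR-10's 32x32 resolution
    resized = torch.nn.functional.interpolate(
        mnist_img.expand(3, -1, -1).unsqueeze(0), size=(32, 32)
    ).squeeze(0)
    return torch.cat([resized, cifar_img], dim=1)  # vertical stack: 3x64x32

m_img, m_label = mnist[0]
c_img, c_label = cifar[0]
print(make_domino(m_img, c_img).shape)  # torch.Size([3, 64, 32])
```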
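For Dataset Splits, a sketch of the 75/25 carve-out the quoted passage describes: both the train set and the test set give up 25% of their examples for validation, producing the four datasets named in the quote. The placeholder tensors and the fixed seed are assumptions made so the example runs on its own.

```python
import torch
from torch.utils.data import TensorDataset, random_split

def split_off_validation(dataset, val_share=0.25, seed=0):
    """Split a dataset into a (1 - val_share) part and a val_share part."""
    n_val = int(len(dataset) * val_share)
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [len(dataset) - n_val, n_val], generator=generator)

# Placeholder datasets standing in for the domino train and test sets
train_set = TensorDataset(torch.randn(1000, 3, 64, 32), torch.randint(0, 10, (1000,)))
test_set = TensorDataset(torch.randn(200, 3, 64, 32), torch.randint(0, 10, (200,)))

train_train, train_validation = split_off_validation(train_set)
test_train, test_validation = split_off_validation(test_set)
print(len(train_train), len(train_validation), len(test_train), len(test_validation))
```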
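Finally, a sketch of the reported Experiment Setup in PyTorch, using the Transformers scheduler the Software Dependencies row mentions. The model, the total step count, and the choice of a linear warm-up schedule are assumptions; only the hyperparameter values come from the paper.

```python
import torch
from torchvision.models import resnet18
from transformers import get_linear_schedule_with_warmup

model = resnet18(num_classes=10)  # placeholder; the paper trains a ResNet-18

# Hyperparameters as reported in the Experiment Setup row
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.125,
    momentum=0.9,
    nesterov=True,
    weight_decay=0.0005,
)

total_steps = 10_000  # assumed; the paper's total step count is not quoted here
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.125 * total_steps),  # 12.5% share of warm-up steps
    num_training_steps=total_steps,
)

# Batch size 128 would be applied on the data loader, e.g.
# torch.utils.data.DataLoader(train_train, batch_size=128, shuffle=True)
```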