Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Low-Rank Simplicity Bias in Deep Networks

Authors: Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we make a series of empirical observations that investigate and extend the hypothesis that deeper networks are inductively biased to find solutions with lower effective rank embeddings. We show empirically that our claim holds true on finite width linear and non-linear models on practical learning paradigms and show that on natural data, these are often the solutions that generalize well.
Researcher Affiliation | Collaboration | Minyoung Huh (EMAIL, MIT CSAIL); Hossein Mobahi (EMAIL, Google Research); Richard Zhang (EMAIL, Adobe Research); Brian Cheung (EMAIL, MIT CSAIL & BCS); Pulkit Agrawal (EMAIL, MIT CSAIL); Phillip Isola (EMAIL, MIT CSAIL)
Pseudocode | No | The paper describes methods and transformations, particularly in Appendix C, 'Expanding a non-linear network', but does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | "The training details for ImageNet can be found in https://github.com/pytorch/examples/blob/master/imagenet." This link refers to third-party standard examples, not the authors' own code for the methodology described in this paper; no explicit statement of a code release by the authors is found.
Open Datasets | Yes | We leverage our observations to demonstrate "linear over-parameterization by depth" can be used to achieve better generalization performance on CIFAR (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015) without increasing modeling capacity. ... The kernel is constructed from the MNIST dataset.
Dataset Splits | Yes | We scale up our experiments to ImageNet, a large-scale dataset consisting of 1.3 million images with 1000 classes, and show that our findings hold in practical settings. For these experiments, we use standardized architectures: AlexNet (Krizhevsky et al., 2012), which consists of 8 layers, and ResNet10 / ResNet18 (He et al., 2016), which consist of 10 and 18 layers, respectively. ... For all experiments rank(W) = {1, 4, 16, 32, 64}, we use a total of 128 training samples.
Hardware Specification | Yes | All models for image classification are trained using PyTorch (Paszke et al., 2019) with RTX 2080Ti GPUs.
Software Dependencies | Yes | All models for image classification are trained using PyTorch (Paszke et al., 2019) with RTX 2080Ti GPUs.
Experiment Setup | Yes | We train the model using SGD with a momentum of 0.9, and we do not use weight decay. ... For each model we trained using the learning rates [1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001] ... All models are trained for 24000 epochs ... For all models, we step the learning rate by a factor of 10 at epoch 18000. ... For SGD, we used a mini-batch size of 32. ... For data augmentation, we apply a random horizontal flip and random-resized crop.
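The learning-rate schedule quoted in the Experiment Setup row can be sketched as a small helper. This is a minimal sketch of one plausible reading of that schedule (a single drop by a factor of 10 at epoch 18000 over a 24000-epoch run); the function and constant names are my own, not from the paper.

```python
def lr_at_epoch(base_lr: float, epoch: int,
                step_epoch: int = 18000, factor: float = 0.1) -> float:
    """Step schedule matching the quoted setup (assumed interpretation):
    hold base_lr constant, then multiply by `factor` from `step_epoch` on."""
    return base_lr * factor if epoch >= step_epoch else base_lr


# The paper sweeps these base learning rates per model (quoted above).
LR_SWEEP = [1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001]
```

For example, with base_lr=0.1 the rate is 0.1 for epochs 0 through 17999 and 0.01 for the remaining epochs up to 24000.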