An Unconstrained Layer-Peeled Perspective on Neural Collapse

Authors: Wenlong Ji, Yiping Lu, Yiliang Zhang, Zhun Deng, Weijie J. Su

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used." (Abstract) "To evaluate our theory, we trained the ResNet18 (He et al., 2016) on both MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) datasets without weight decay, and tracked how the last-layer features and classifiers converge to neural collapse solutions. The results are plotted in Figure 1." (Section 4, Empirical Results)
Researcher Affiliation | Academia | Wenlong Ji (Peking University), Yiping Lu (Stanford University), Yiliang Zhang (University of Pennsylvania), Zhun Deng (Harvard University), Weijie J. Su (University of Pennsylvania)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating open-source code availability.
Open Datasets | Yes | "trained the ResNet18 (He et al., 2016) on both MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) datasets"
Dataset Splits | No | The paper uses well-known datasets (MNIST, CIFAR-10) that come with standard splits, but it does not explicitly state the train/validation/test split percentages or the splitting methodology used in its experiments.
Hardware Specification | No | "All experiments were run in Python (version 3.6.9) on Google Colab." (Appendix D) Google Colab is named, but no specific hardware (e.g., GPU or TPU model, memory) is reported.
Software Dependencies | No | "All experiments were run in Python (version 3.6.9) on Google Colab." (Appendix D) This specifies only the Python version, not other crucial libraries or frameworks (e.g., PyTorch, TensorFlow) or their versions.
Experiment Setup | Yes | "In the real data experiments, we trained the VGG-13 (Simonyan & Zisserman, 2014) and ResNet18 (He et al., 2016) on MNIST (LeCun et al., 1998), KMNIST (Clanuwat et al., 2018), Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009) datasets without weight decay, and with a learning rate of 0.01, momentum of 0.3, and batch size of 128."
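
The quoted setup (SGD without weight decay, learning rate 0.01, momentum 0.3, batch size 128) can be mirrored in a short PyTorch sketch. This is only an illustrative reconstruction of the reported hyperparameters, not the authors' code: the epoch count, the data transforms, and the use of torchvision's stock ResNet18 (rather than a CIFAR-adapted variant) are assumptions.

# Minimal sketch of the reported setup (assumed PyTorch; not the authors' code).
# Hyperparameters from the quote: lr=0.01, momentum=0.3, batch size 128, no weight decay.
# Epoch count, transforms, and the stock ResNet18 architecture are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

device = "cuda" if torch.cuda.is_available() else "cpu"

# CIFAR-10 with a plain tensor transform (normalization choice is an assumption).
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Stock torchvision ResNet18 with a 10-class head; the paper may use a CIFAR-adapted variant.
model = models.resnet18(num_classes=10).to(device)

criterion = nn.CrossEntropyLoss()
# No weight decay, matching the "without weight decay" condition in the quoted setup.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.3, weight_decay=0.0)

for epoch in range(10):  # epoch count is an assumption
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")

The sketch trains with plain SGD and no explicit regularization, which is the condition under which the paper tracks how the last-layer features and classifiers approach neural collapse; any collapse measurements themselves would have to be added on top of this loop.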