An Unconstrained Layer-Peeled Perspective on Neural Collapse
Authors: Wenlong Ji, Yiping Lu, Yiliang Zhang, Zhun Deng, Weijie J. Su
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Empirically, we show that our results also hold during the training of neural networks in real-world tasks when explicit regularization or weight decay is not used." (Abstract) "To evaluate our theory, we trained the ResNet18 (He et al., 2016) on both MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) datasets without weight decay, and tracked how the last-layer features and classifiers converge to neural collapse solutions. The results are plotted in Figure 1." (Section 4, Empirical Results) A hedged sketch of one such collapse diagnostic follows the table. |
| Researcher Affiliation | Academia | Wenlong Ji (Peking University); Yiping Lu (Stanford University); Yiliang Zhang (University of Pennsylvania); Zhun Deng (Harvard University); Weijie J. Su (University of Pennsylvania) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link for open-source code availability. |
| Open Datasets | Yes | "trained the ResNet18 (He et al., 2016) on both MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009) datasets" |
| Dataset Splits | No | The paper uses well-known datasets (MNIST, CIFAR-10) that have standard splits, but it does not explicitly state the train/validation/test split percentages or splitting methodology used in its experiments. |
| Hardware Specification | No | "All experiments were run in Python (version 3.6.9) on Google Colab." (Appendix D) This names only the platform; no specific GPU, CPU, or memory hardware is identified. |
| Software Dependencies | No | "All experiments were run in Python (version 3.6.9) on Google Colab." (Appendix D) This specifies only the Python version, not the deep learning framework or other key libraries (e.g., PyTorch, TensorFlow) and their versions. |
| Experiment Setup | Yes | "In the real data experiments, we trained the VGG-13 (Simonyan & Zisserman, 2014) and ResNet18 (He et al., 2016) on MNIST (LeCun et al., 1998), KMNIST (Clanuwat et al., 2018), Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky et al., 2009) datasets without weight decay, and with a learning rate of 0.01, momentum of 0.3, and batch size of 128." A hedged sketch of this configuration appears below the table. |
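
For concreteness, here is a minimal sketch of the training configuration quoted in the Experiment Setup row (SGD with learning rate 0.01, momentum 0.3, batch size 128, and no weight decay). The choice of PyTorch/torchvision and the epoch count are assumptions: the paper reports only Python 3.6.9 on Google Colab and does not name a framework.

```python
# Minimal sketch, assuming PyTorch/torchvision (the paper does not name its
# framework): ResNet18 on CIFAR-10 with the hyperparameters quoted in the
# Experiment Setup row. weight_decay=0.0 mirrors "without weight decay".
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True)

# Stock torchvision ResNet18; the paper's variant may use a CIFAR-adapted stem.
model = torchvision.models.resnet18(num_classes=10).to(device)

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01, momentum=0.3, weight_decay=0.0)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):  # epoch count is illustrative, not from the paper
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```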
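
The Research Type row notes that the authors tracked how the last-layer features and classifiers converge to neural collapse solutions. The sketch below computes two standard diagnostics over last-layer features: within-class variability relative to between-class spread (which collapses toward zero), and the pairwise cosines of centered class means (which approach -1/(K-1) for a simplex equiangular tight frame). These are common neural collapse metrics, not necessarily the paper's exact ones.

```python
# Hedged sketch of two standard neural collapse diagnostics; the paper's
# exact metrics may differ.
import torch

def collapse_diagnostics(features: torch.Tensor,
                         labels: torch.Tensor,
                         num_classes: int):
    """features: (N, d) last-layer activations; labels: (N,) class ids,
    with every class represented at least once."""
    global_mean = features.mean(dim=0)
    class_means = torch.stack(
        [features[labels == k].mean(dim=0) for k in range(num_classes)])
    centered = class_means - global_mean

    # NC1-style ratio: within-class variability over between-class spread;
    # it shrinks toward zero as features collapse to their class means.
    within = sum(
        ((features[labels == k] - class_means[k]) ** 2).sum()
        for k in range(num_classes)) / features.shape[0]
    between = (centered ** 2).sum(dim=1).mean()

    # NC2-style check: pairwise cosines of centered class means approach
    # -1/(K-1) when the means form a simplex equiangular tight frame.
    normed = centered / centered.norm(dim=1, keepdim=True)
    cosines = normed @ normed.T
    off_diag = cosines[~torch.eye(num_classes, dtype=torch.bool)]
    return (within / between).item(), off_diag.mean().item()
```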