Are All Losses Created Equal: A Neural Collapse Perspective

Authors: Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments further show that NC features obtained from all relevant losses (i.e., CE, LS, FL, MSE) lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence. ... We also provide an experimental verification of this claim through experiments in Section 4.1.
Researcher Affiliation | Collaboration | Jinxin Zhou (Ohio State University, zhou.3820@osu.edu); Chong You (Google Research, cyou@google.com); Xiao Li (University of Michigan, xlxiao@umich.edu); Kangning Liu (New York University, kl3141@nyu.edu); Sheng Liu (New York University, shengliu@nyu.edu); Qing Qu (University of Michigan, qingqu@umich.edu); Zhihui Zhu (Ohio State University, zhu.3440@osu.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/jinxinzhou/nc_loss.
Open Datasets | Yes | We train a WideResNet50 network [45] on CIFAR10 and CIFAR100 datasets [46] and a WideResNet18 network on mini-ImageNet [47] with various widths and numbers of iterations for image classification using these four different losses. (A minimal sketch of the four losses appears after the table.)
Dataset Splits | Yes | The test accuracy is reported based on the model with the best accuracy on the validation set, where we organize the validation set by holding out 10 percent of the data from the training set. (A sketch of this 90/10 hold-out split appears after the table.)
Hardware Specification | No | The provided text does not explicitly describe the specific hardware used to run the experiments (e.g., GPU models, CPU types, or cloud instance details). Appendix A is referenced for this information but not provided in the text.
Software Dependencies | No | The provided text does not specify software dependencies with version numbers.
Experiment Setup | Yes | For optimization, we use SGD with momentum 0.9 and an initial learning rate 0.1 decayed by a factor of 0.1 at 3/7 of the total number of iterations. Following [28], the norm of the gradient is clipped at 2, which can improve performance for all losses. For CIFAR10 and mini-ImageNet, the weight decay is set to 5e-4 for all configurations with all losses. For CIFAR100, the weight decay is fine-tuned to achieve the best accuracy for every configuration and loss. (A sketch of this optimization setup appears after the table.)
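
The four losses referenced throughout the table (CE, LS, FL, MSE) could be instantiated roughly as follows. This is a minimal PyTorch sketch, not the authors' implementation from the nc_loss repository; the smoothing factor, the focal-loss exponent `gamma`, and the choice of one-hot MSE targets are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def make_losses(num_classes, smoothing=0.1, gamma=2.0):
    """Return the four classification losses compared in the paper.
    The hyperparameters (smoothing, gamma) are illustrative assumptions."""

    def ce(logits, targets):
        # Standard cross-entropy (CE).
        return F.cross_entropy(logits, targets)

    def ls(logits, targets):
        # Cross-entropy with label smoothing (LS).
        return F.cross_entropy(logits, targets, label_smoothing=smoothing)

    def fl(logits, targets):
        # Focal loss (FL): down-weights already well-classified examples.
        log_p = F.log_softmax(logits, dim=1)
        log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return (-((1 - pt) ** gamma) * log_pt).mean()

    def mse(logits, targets):
        # Mean-squared error against one-hot targets (MSE).
        one_hot = F.one_hot(targets, num_classes).float()
        return F.mse_loss(logits, one_hot)

    return {"CE": ce, "LS": ls, "FL": fl, "MSE": mse}
```

Per the quoted claim in the Research Type row, the paper reports that all four losses yield largely identical test performance once the network is sufficiently large and trained to convergence.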
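The 90/10 hold-out described in the Dataset Splits row corresponds to a standard random split of the training set; the use of torchvision's CIFAR10 loader, `torch.utils.data.random_split`, and the fixed seed below are assumptions made for illustration.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hold out 10 percent of the CIFAR-10 training set as a validation set.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
n_val = len(full_train) // 10            # 10 percent held out for validation
n_train = len(full_train) - n_val
train_set, val_set = random_split(
    full_train, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))  # assumed seed, for reproducibility
```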
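The settings quoted in the Experiment Setup row (SGD with momentum 0.9, initial learning rate 0.1 decayed by a factor of 0.1 at 3/7 of the total iterations, gradient-norm clipping at 2, weight decay 5e-4) map roughly onto the PyTorch loop below. The stand-in model, synthetic data, per-iteration scheduler stepping, and total iteration count are assumptions; the paper trains WideResNet models for far more iterations.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Minimal stand-ins so the sketch runs; the real experiments use WideResNet
# models on CIFAR / mini-ImageNet (these stand-ins are illustrative only).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))),
    batch_size=64, shuffle=True)
criterion = nn.CrossEntropyLoss()

total_iters = 700                        # assumed; far fewer than in the paper
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# Decay the learning rate by a factor of 0.1 at 3/7 of the total iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(3 * total_iters / 7)], gamma=0.1)

step = 0
while step < total_iters:
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        # Clip the gradient norm at 2, which the paper reports helps all losses.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
        optimizer.step()
        scheduler.step()
        step += 1
        if step >= total_iters:
            break
```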