Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Are All Losses Created Equal: A Neural Collapse Perspective

Authors: Jinxin Zhou, Chong You, Xiao Li, Kangning Liu, Sheng Liu, Qing Qu, Zhihui Zhu

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments further show that NC features obtained from all relevant losses (i.e., CE, LS, FL, MSE) lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence. ... We also provide an experimental verification of this claim through experiments in Section 4.1.
Researcher Affiliation | Collaboration | Jinxin Zhou (Ohio State University), Chong You (Google Research), Xiao Li (University of Michigan), Kangning Liu (New York University), Sheng Liu (New York University), Qing Qu (University of Michigan), Zhihui Zhu (Ohio State University)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is available at https://github.com/jinxinzhou/nc_loss.
Open Datasets | Yes | We train a WideResNet50 network [45] on the CIFAR10 and CIFAR100 datasets [46] and a WideResNet18 network on miniImageNet [47] with various widths and numbers of iterations for image classification using these four different losses.
Dataset Splits | Yes | The test accuracy is reported based on the model with the best accuracy on the validation set, where we organize the validation set by holding out 10 percent of the data from the training set.
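The 90/10 hold-out described in the quote above can be sketched in plain Python (the seed, shuffling utility, and function name here are illustrative assumptions, not the authors' code):

```python
import random

def train_val_split(n_samples, val_fraction=0.1, seed=0):
    """Hold out a fraction of the training indices as a validation set.

    Generic sketch of the 90/10 split described in the paper; the
    authors' actual splitting code and seed are not specified.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # deterministic shuffle
    n_val = int(n_samples * val_fraction)     # size of the hold-out set
    return indices[n_val:], indices[:n_val]   # (train indices, val indices)

# Example with the CIFAR10 training-set size (50,000 images)
train_idx, val_idx = train_val_split(50_000)
```

The returned index lists would then be passed to dataset subset/sampler utilities in whatever training framework is used.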
Hardware Specification | No | The provided text does not explicitly describe the specific hardware used to run the experiments (e.g., GPU models, CPU types, or cloud instance details). Appendix A is referenced for this information but not provided in the text.
Software Dependencies | No | The provided text does not specify software dependencies with version numbers.
Experiment Setup | Yes | For optimization, we use SGD with momentum 0.9 and an initial learning rate of 0.1, decayed by a factor of 0.1 at 3/7 of the total number of iterations. Following [28], the norm of the gradient is clipped at 2, which improves performance for all losses. For CIFAR10 and miniImageNet, the weight decay is set to 5e-4 for all configurations and losses. For CIFAR100, the weight decay is tuned to achieve the best accuracy for every configuration and loss.
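A minimal sketch of the two schedule details quoted above, in pure Python (function names and the illustrative step counts are assumptions, not the authors' implementation): the learning rate starts at 0.1 and is multiplied by 0.1 once 3/7 of the total iterations have elapsed, and gradient vectors are rescaled so their L2 norm does not exceed 2.

```python
def lr_at(step, total_steps, base_lr=0.1, decay=0.1, milestone_frac=3/7):
    """Step learning-rate schedule: decay once at milestone_frac of training."""
    return base_lr * decay if step >= milestone_frac * total_steps else base_lr

def clip_grad_norm(grads, max_norm=2.0):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grads) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads
```

In a PyTorch setup this would correspond to `torch.optim.SGD` with `momentum=0.9`, a `MultiStepLR` scheduler, and `torch.nn.utils.clip_grad_norm_`; the sketch above just makes the arithmetic explicit.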