Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cross Entropy versus Label Smoothing: A Neural Collapse Perspective
Authors: Li Guo, George Andriopoulos, Zifan Zhao, Zixuan Dong, Shuyang Ling, Keith W. Ross
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our study begins with a comprehensive empirical comparison between cross-entropy loss with one-hot labels (hereafter referred to as cross-entropy loss for simplicity) and label smoothing throughout the training process. Specifically, we carefully study how the last layer features and linear classifiers evolve during training. Our findings are as follows: 1. Compared with cross-entropy loss, models trained with label smoothing exhibit accelerated convergence in terms of training error and neural collapse metrics. Furthermore, they converge to a more pronounced level of NC1 and NC2. ... Experiment Setup. We conducted experiments on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100, STL-10 (Coates et al., 2011), and Tiny Image Net (Deng et al., 2009). |
| Researcher Affiliation | Academia | Li Guo EMAIL New York University Shanghai George Andriopoulos EMAIL New York University Abu Dhabi Zifan Zhao EMAIL New York University Shanghai Zixuan Dong EMAIL New York University Shanghai Shuyang Ling EMAIL New York University Shanghai Keith Ross EMAIL New York University Abu Dhabi |
| Pseudocode | No | The paper describes mathematical derivations and experimental procedures but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | Experiment Setup. We conducted experiments on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100, STL-10 (Coates et al., 2011), and Tiny Image Net (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions using CIFAR-10, CIFAR-100, STL-10, and Tiny Image Net datasets and a training period of 800 epochs for some and 300 for others, but it does not explicitly provide details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | Yes | All experiments were conducted on a single RTX 3090 GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions using stochastic gradient descent (SGD) but does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | To comprehensively analyze model behavior during TPT, we extend the training period to 800 epochs for CIFAR-10, CIFAR-100, and STL-10, and 300 epochs for Tiny Image Net. For all datasets, we use a batch size of 128 and train with stochastic gradient descent (SGD) with a momentum of 0.9. The learning rate is initialized at 0.05 and follows a multi-step decay, decreasing by a factor of 0.1 at epochs 100 and 200 for Tiny Image Net and at epochs 150 and 350 for the other datasets. We use a default weight decay of 5×10⁻⁴, except for the experiments in Section 3.3, where a weight decay value of 1×10⁻⁴ is used. |
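The multi-step learning-rate decay quoted above can be sketched as a small helper. This is a minimal illustration of the schedule described in the paper, not code from the paper itself; the function name `lr_at_epoch` and its defaults are our own. The defaults match the CIFAR-10/CIFAR-100/STL-10 setting (base rate 0.05, decay by 0.1 at epochs 150 and 350); Tiny Image Net would use `milestones=(100, 200)`.

```python
def lr_at_epoch(epoch, base_lr=0.05, milestones=(150, 350), gamma=0.1):
    """Learning rate under a multi-step decay schedule.

    The rate starts at base_lr and is multiplied by gamma once
    for every milestone epoch that has been reached.
    """
    decays = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** decays


# CIFAR-style schedule from the quoted setup:
print(lr_at_epoch(0))    # 0.05 before any milestone
print(lr_at_epoch(200))  # 0.005 after the first decay at epoch 150
print(lr_at_epoch(400))  # 0.0005 after the second decay at epoch 350
```

In PyTorch this schedule corresponds to `torch.optim.lr_scheduler.MultiStepLR` with the same `milestones` and `gamma`, attached to the SGD optimizer (momentum 0.9, weight decay 5×10⁻⁴) described in the setup.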