Cut your Losses with Squentropy

Authors: Like Hui, Mikhail Belkin, Stephen Wright

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross entropy and rescaled square losses in terms of the classification accuracy. We also demonstrate that it provides significantly better model calibration than either of these alternative losses and, furthermore, has less variance with respect to the random initialization.
Researcher Affiliation | Academia | 1Computer Science and Engineering, University of California, San Diego; 2Halıcıoğlu Data Science Institute, University of California, San Diego; 3Wisconsin Institute for Discovery, UW-Madison. Correspondence to: Like Hui <lhui@ucsd.edu>.
Pseudocode | No | The paper does not include any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present any structured code-like blocks for its methods.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the squentropy loss function, nor a link to a code repository.
Open Datasets | Yes | NLP datasets include MRPC, SST-2, QNLI, QQP, text8, enwik8, text5, and text20. Speech datasets include TIMIT, WSJ, and Librispeech. Vision datasets include MNIST, CIFAR-10, STL-10 (Coates et al., 2011), CIFAR-100, SVHN (Netzer et al., 2011), and ImageNet.
Dataset Splits | Yes | CIFAR-100 (Krizhevsky et al., 2009) consists of 50,000 32×32-pixel training images and 10,000 32×32-pixel test images in 100 different classes. It is a balanced dataset with 600 images of each class.
Hardware Specification | No | The paper mentions general hardware types in the Acknowledgements: 'We thank Nvidia for the donation of GPUs and Google for providing access to the cloud TPUs. This work uses CPU/GPU nodes (allocated with TG-CIS220009) provided by San Diego Supercomputer Center...'. However, it does not specify exact models (e.g., NVIDIA A100, specific TPU versions) or detailed configurations needed for reproduction.
Software Dependencies | No | The paper lists various neural network architectures (e.g., fine-tuned BERT, ResNet-18, Transformer-XL) and mentions that they follow the hyperparameter settings of a previous work. However, it does not provide a list of specific software libraries or frameworks with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.x', 'scikit-learn 0.24') that would be necessary for reproducibility.
Experiment Setup | Yes | We follow the hyperparameter settings given in Appendix B of (Hui & Belkin, 2020) for the cross-entropy loss and the square loss (other than for SVHN, STL-10, and CIFAR-100), and use the algorithmic parameter settings of the cross entropy for squentropy in most cases. The exceptions are SVHN and STL-10, where squentropy and the square loss use a smaller learning rate (0.1 for cross entropy vs. 0.02 for squentropy and square loss). More details on the hyperparameter settings for SVHN, STL-10, and CIFAR-100 are in Appendix B. Table 4 (Hyper-parameters for CIFAR-100, SVHN, and STL-10): Wide-ResNet on CIFAR-100: lr=0.1, layers=28, widen-factor=20, batch size 128.
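Although the paper releases no code, the loss it describes is simple enough to sketch from the abstract: standard cross-entropy on the logits plus the average squared logit over the c - 1 incorrect classes. The single-sample formulation and function name below are illustrative assumptions, not the authors' implementation.

```python
import math

def squentropy(logits, label):
    """Sketch of the squentropy loss for one sample (assumed form):
    cross-entropy of the softmax at the true class, plus the mean
    squared value of the logits at the incorrect classes.

    logits: list of c raw scores; label: integer index of the true class.
    """
    c = len(logits)
    # numerically stable log-sum-exp for the cross-entropy term
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    ce = lse - logits[label]
    # squared logits averaged over the c - 1 non-true classes
    sq = sum(z * z for i, z in enumerate(logits) if i != label) / (c - 1)
    return ce + sq
```

With zero non-true logits the square term vanishes and the loss reduces to plain cross-entropy, e.g. `squentropy([0.0, 0.0], 0)` equals log 2; the extra term pushes incorrect-class logits toward zero, which is consistent with the calibration benefit the row above reports.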