Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Rethinking Classifier Re-Training in Long-Tailed Recognition: Label Over-Smooth Can Balance
Authors: Siyu Sun, Han Lu, Jiangtong Li, Yichen Xie, Tianjiao Li, Xiaokang Yang, Liqing Zhang, Junchi Yan
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist2018. ... We conduct experiments on long-tailed image classification benchmarks: CIFAR100-LT, ImageNet-LT and iNaturalist2018. |
| Researcher Affiliation | Collaboration | 1Department of CSE & MoE Key Lab of AI, Shanghai Jiao Tong University 2Bilibili Inc 3Tongji University 4UC Berkeley |
| Pseudocode | No | The paper describes methods textually and mathematically, but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Thinklab-SJTU/LOS |
| Open Datasets | Yes | We conduct experiments on long-tailed image classification benchmarks: CIFAR100-LT (Krizhevsky et al., 2009), ImageNet-LT (Liu et al., 2019) and iNaturalist2018 (Van Horn et al., 2018). |
| Dataset Splits | Yes | 1) CIFAR100-LT: The original balanced CIFAR100 (Krizhevsky et al., 2009) consists of 50,000 training images and 10,000 test images of size 32×32 with 100 classes. ... We evaluate the models on corresponding balanced test dataset and report top-1 accuracy. In line with the protocol in literature (Liu et al., 2019), we give accuracy on three different splits of classes with varying numbers of training data: many (over 100 images), medium (20–100 images), and few (less than 20 images). |
| Hardware Specification | Yes | All experiments are implemented using PyTorch and run on GeForce RTX 3090 (24GB) and A100-PCIE (40GB). |
| Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework but does not specify its version number or any other software dependencies with versions. |
| Experiment Setup | Yes | For each model, we use SGD optimizer with momentum 0.9 and cosine learning rate scheduler... On CIFAR100-LT, we train each model for 200 epochs, with batch size 64 and initial learning rate 0.01. On ImageNet-LT / iNaturalist2018, we train for 200 epochs, with batch size as 128 / 256 and initial learning rate 0.03 / 0.1. ... For smooth factors, we use 0.98 in CIFAR100-LT and 0.99 in ImageNet-LT and iNaturalist2018. The classifier finetune epoch is 20 with adequate learning ratio and weight. |
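The quoted protocol above can be sketched in plain Python. This is an illustrative sketch only: the cosine-annealing formula is the standard one implied by "cosine learning rate scheduler", and the many/medium/few thresholds follow the Dataset Splits excerpt; function names are our own, and the authors' released code at https://github.com/Thinklab-SJTU/LOS is the authoritative source.

```python
import math

# Hyperparameters reported for CIFAR100-LT (assumed sketch, not the authors' code).
EPOCHS = 200
INIT_LR = 0.01

def cosine_lr(epoch: int, total_epochs: int = EPOCHS, lr0: float = INIT_LR) -> float:
    """Standard cosine-annealed learning rate, decaying from lr0 to 0."""
    return 0.5 * lr0 * (1 + math.cos(math.pi * epoch / total_epochs))

def split_group(n_train: int) -> str:
    """Assign a class to the many/medium/few split by its training-image count,
    using the thresholds quoted from Liu et al. (2019)."""
    if n_train > 100:
        return "many"
    if n_train >= 20:
        return "medium"
    return "few"

# The schedule starts at the initial rate and decays smoothly to zero.
print(cosine_lr(0))       # 0.01
print(split_group(150))   # many
print(split_group(50))    # medium
print(split_group(10))    # few
```

Evaluation then reports top-1 accuracy separately for the three groups on the balanced test set.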