Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Long-tailed Recognition with Model Rebalancing

Authors: Jiaan Luo, Feng Hong, Qiang Hu, Xiaofeng Cao, Feng Liu, Jiangchao Yao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on diverse long-tailed benchmarks, spanning multi-class and multi-label tasks, demonstrate that MORE significantly improves generalization, particularly for tail classes, and effectively complements existing imbalance mitigation methods. These results highlight MORE's potential as a robust plug-and-play module in long-tailed settings. The code is available here.
Researcher Affiliation	Academia	(1) Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; (2) School of Computer Science and Technology, Tongji University; (3) School of Computing and Information Systems, The University of Melbourne; (4) Shanghai Artificial Intelligence Laboratory
Pseudocode	Yes	The pseudo-code of our training process is shown in Appendix A.
Open Source Code	No	The code of this paper will be released after anonymized review.
Open Datasets	Yes	For single-label recognition, we adopt CIFAR-100-LT [Krizhevsky et al., 2009] and Places-LT [Liu et al., 2019]. For multi-label recognition, we conduct experiments on four diverse datasets: MIML [Zhou and Zhang, 2006], Pascal-VOC [Everingham et al., 2010], NUS-WIDE-SCENE [Chua et al., 2009], and MS-COCO [Lin et al., 2014].
Dataset Splits	Yes	We follow standard protocols in long-tailed classification by treating all classes equally during testing and reporting results across three splits: Many, Medium, and Few, based on the number of training samples per class.
Hardware Specification	Yes	Experiments based on CIFAR-100-LT and MIML are carried out on NVIDIA GeForce RTX 3090 GPUs, while other experiments are carried out on NVIDIA A100 GPUs.
Software Dependencies	Yes	Our code is implemented with PyTorch 1.12.1.
Experiment Setup	Yes	We train each model with batch size of 64 (for Pascal-VOC) / 128 (for ImageNet-LT) / 256 (for CIFAR-100-LT, MIML and NUS-WIDE-SCENE) / 512 (for Places-LT) / 1024 (for MS-COCO), SGD optimizer with momentum of 0.9, weight decay of 0.0002. For multi-label tasks, the initial learning rate is set to 3e-4, with cosine learning-rate scheduling along training. For tasks based on CLIP model, we use CLIP's Transformer-based pretrained text encoder to extract label features. During training, only vision encoder is fine-tuned, using a pre-trained ResNet34 model. Other settings are aligned with those of non-CLIP-based models.
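
The Experiment Setup row can be read as a small configuration plus a cosine learning-rate schedule. The following is a minimal sketch, not the authors' code: the batch sizes, momentum, weight decay, and initial learning rate are copied from the row above, while the names `BATCH_SIZE`, `cosine_lr`, `total_steps`, and `min_lr`, the step-based granularity, and the decay floor of 0 are assumptions made for illustration (the source does not state the training length or a minimum learning rate).

```python
import math

# Values quoted in the Experiment Setup row; the dict layout itself is hypothetical.
BATCH_SIZE = {
    "Pascal-VOC": 64,
    "ImageNet-LT": 128,
    "CIFAR-100-LT": 256,
    "MIML": 256,
    "NUS-WIDE-SCENE": 256,
    "Places-LT": 512,
    "MS-COCO": 1024,
}
MOMENTUM = 0.9                # SGD momentum
WEIGHT_DECAY = 2e-4           # weight decay of 0.0002
BASE_LR_MULTI_LABEL = 3e-4    # initial learning rate for multi-label tasks


def cosine_lr(step, total_steps, base_lr=BASE_LR_MULTI_LABEL, min_lr=0.0):
    """Cosine-annealed learning rate at `step` out of `total_steps`.

    `total_steps` and `min_lr` are assumptions: the source gives neither
    the schedule length nor a floor for the learning rate.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate starts at 3e-4, passes through half that value at the midpoint, and reaches the floor at the final step. In PyTorch the same curve is typically obtained by pairing `torch.optim.SGD` with `torch.optim.lr_scheduler.CosineAnnealingLR`.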