Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Unlocker: Disentangle the Deadlock of Learning between Label-noisy and Long-tailed Data

Authors: Shu Chen, HongJun Xu, Ruichi Zhang, Mengke Li, Yonggang Zhang, Yang Lu, Bo Han, Yiu-ming Cheung, Hanzi Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of our method in alleviating model bias and handling long-tailed noisy label data.
Researcher Affiliation	Academia	(1) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; (2) College of Computer Science and Software Engineering, Shenzhen University; (3) Hong Kong University of Science and Technology; (4) Hong Kong Baptist University. EMAIL, EMAIL, EMAIL
Pseudocode	Yes	A.1 Detailed Training Process. The following is the process of our proposed method Unlocker, which leverages a bilevel optimization framework. Within each epoch, the inner optimization employs NLL methods with LA to train the model, while the outer loop optimizes the learnable parameter τ to dynamically scale the strength of LA. This process adaptively tunes τ to integrate NLL methods and LA, iteratively disentangling the NLL-LTL deadlock and enhancing model robustness against long-tailed noisy label data. Algorithm 1: Detailed training process of Unlocker
Open Source Code	Yes	Code is available at https://github.com/ChenShu248/Unlocker.
Open Datasets	Yes	Datasets. We conduct simulated experiments on CIFAR-10/100 [26], including three cases: consistent, relieve, and aggravate, which we summarize in Figure 1a. CIFAR-10/100 contains 50,000 training images and 10,000 test images of size 32 × 32 pixels, where CIFAR-10 contains 10 classes and CIFAR-100 contains 100 classes. ... We also evaluate the performance of our method on real-world datasets, including Red Mini-ImageNet [27], Clothing1M [28] and WebVision-50 [29].
Dataset Splits	Yes	CIFAR-10/100 contains 50,000 training images and 10,000 test images of size 32 × 32 pixels, where CIFAR-10 contains 10 classes and CIFAR-100 contains 100 classes. ... Clothing1M contains 1 million training images obtained from online shopping websites, with 50k, 14k, and 10k images split into clean labels for training, validation, and testing across 14 classes.
Hardware Specification	Yes	All experiments are executed on a GeForce RTX 3090 GPU using the PyTorch 1.8.0 framework to maintain hardware consistency.
Software Dependencies	Yes	All experiments are executed on a GeForce RTX 3090 GPU using the PyTorch 1.8.0 framework to maintain hardware consistency.
Experiment Setup	Yes	Implementation Details. To ensure a fair comparison with existing methods, we keep the training configurations consistent with the baseline NLL methods. Specifically, we employ the 18-layer ResNet as the backbone architecture. The mini-batch size is fixed at 256. All models are optimized using SGD with a momentum of 0.9. A random seed of 123 is used across all experiments to ensure reproducibility. For NLL methods combined with LA in baselines, the τ is set to 1.0. For our proposed method Unlocker, the learnable parameter τ is initialized to 1.0 and optimized using SGD with a momentum of 0.9. The initial learning rate for τ is set to 0.1, which is adjusted to 0.01 at the 150th epoch. The β in EMA to update adjustments is set to 0.9.
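
The hyperparameters quoted in the Experiment Setup row can be collected into a small, dependency-free sketch. This is not the authors' code: the function names are hypothetical, the exact epoch boundary of the learning-rate drop and the EMA form are assumptions, and the additive `logits + τ · log(prior)` form is the standard logit adjustment (LA) formulation, which Unlocker is assumed here to follow.

```python
import math
import random

# Values quoted in the Experiment Setup row.
SEED = 123         # random seed used across all experiments
BATCH_SIZE = 256   # mini-batch size
MOMENTUM = 0.9     # SGD momentum (model weights and tau)
TAU_INIT = 1.0     # initial value of the learnable tau
EMA_BETA = 0.9     # beta of the EMA that updates the adjustments


def tau_learning_rate(epoch, initial_lr=0.1, dropped_lr=0.01, drop_epoch=150):
    """Step schedule for tau: 0.1 at first, 0.01 from drop_epoch onward.

    Assumption: the drop takes effect at epoch index 150 itself.
    """
    return initial_lr if epoch < drop_epoch else dropped_lr


def ema_update(previous, current, beta=EMA_BETA):
    """Assumed EMA form: beta * previous + (1 - beta) * current, per class."""
    return [beta * p + (1.0 - beta) * c for p, c in zip(previous, current)]


def logit_adjust(logits, class_priors, tau=TAU_INIT):
    """Standard additive logit adjustment: logits + tau * log(prior)."""
    return [z + tau * math.log(p) for z, p in zip(logits, class_priors)]


# Fix the seed as the setup describes (stdlib RNG only in this sketch).
random.seed(SEED)
```

With `tau = 0` the adjustment vanishes and the raw logits are returned, which is why scaling τ controls the strength of LA; the outer loop described in the Pseudocode row would update τ with the learning rate given by `tau_learning_rate`.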