Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Decoupled Entropy Minimization

Authors: Jing Ma, Hanlin Li, Xiang Xiang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across various tasks, including semi-supervised and unsupervised learning, domain adaptation, and reinforcement learning, demonstrate AdaDEM's superior performance. Additional evaluations on noisy/imbalanced benchmarks and dynamic/non-stationary environments further validate the effectiveness of AdaDEM. ... 4 Experiments
Researcher Affiliation	Academia	Jing Ma (2), Hanlin Li (2), Xiang Xiang (1,2). (1) School of Computer Science and Tech, Huazhong University of Science and Tech, China; (2) School of AI and Automation, Huazhong University of Science and Technology, China. Equal contribution, co-first author; Correspondence to EMAIL
Pseudocode	Yes	A.1 Pseudo Code. Algorithm 1: Pseudo code of DEM and AdaDEM in the PyTorch-like style
Open Source Code	Yes	Source code is available at https://github.com/HAIV-Lab/DEM
Open Datasets	Yes	For TTA, we adopt ImageNet-C [40] containing 15 types of image corruptions, as well as ImageNet [38] and its variants: -A [41], -V2 [42], -R [43], and -S [44] that represent natural distribution shifts. For SSL, we consider CIFAR-10, CIFAR-100 [45], STL-10 [46], EuroSat [47], TissueMNIST [48], and Semi-Aves [49]. Regarding synthetic-to-real UDA, we mainly experiment with the semantic segmentation task, using GTA5 dataset [50] as the source domain and Cityscapes dataset [51] as the target domain. We employ Minigrid [52], a series of discrete environments, to conduct RL experiments.
Dataset Splits	Yes	We set different numbers of available labeled samples, i.e., N_l per class, for the training of DNNs. Specifically, we set N_l = 4/25/400 for CIFAR-10, N_l = 2/4/25 for CIFAR-100, N_l = 4/10 for STL-10, N_l = 2/4 for EuroSat, N_l = 10/50 for TissueMNIST, and N_l = 15-53 for Semi-Aves. ... ImageNet-C contains 15 different versions of corruptions applied to 50,000 images from the validation set of ImageNet-1K [38].
Hardware Specification	Yes	We primarily run experiments on one NVIDIA GeForce RTX 4090 GPU with 24 GB of memory.
Software Dependencies	No	In PyTorch, the "detach()" method can be used to obtain pi. ... We employ the RL Baselines3 Zoo, a training framework for Stable Baselines3 RL agents, to run our experiments and strictly follow the configurations of RL algorithms implemented therein. ... Through TPE [39], a fast hyperparameter search algorithm integrated in NNI, we can search for the optimal hyperparameter configuration (τ, α).
Experiment Setup	Yes	DEM* employs a fast TPE algorithm [39] to search for the best hyperparameters (τ, α). In Proposition A.1, we theoretically prove that valid values of τ in DEM satisfy 0 < τ ≤ 2/α where α > 0. ... we construct a search space where τ and α range from 0.0 to 2.0 (including 0.0 and 2.0), sampled at intervals of 0.1. ... Setting different learning rates for the optimizer leads to different learning efficiencies of the model on unlabeled samples.
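
The Experiment Setup row describes a grid over (τ, α) and a validity condition from Proposition A.1. The sketch below is only an illustration of how those two statements relate: it enumerates the 21 x 21 grid and keeps the pairs with α > 0 and 0 < τ ≤ 2/α. The function names `full_grid` and `valid_grid` are hypothetical, and whether the authors filter the grid this way before running TPE is an assumption; the quoted text does not say.

```python
from itertools import product

# Grid points 0.0, 0.1, ..., 2.0 expressed as integer tenths (0..20).
# Working in tenths keeps the boundary test exact (no float rounding).
TENTHS = range(0, 21)


def full_grid():
    """All (tau, alpha) pairs in the search space quoted in the table."""
    return [(i / 10, j / 10) for i, j in product(TENTHS, TENTHS)]


def valid_grid():
    """Pairs with alpha > 0 and 0 < tau <= 2 / alpha.

    For alpha > 0 the bound tau <= 2 / alpha is equivalent to
    tau * alpha <= 2, i.e. i * j <= 200 when both are in tenths.
    Applying this filter is an assumption made for illustration.
    """
    return [
        (i / 10, j / 10)
        for i, j in product(TENTHS, TENTHS)
        if i > 0 and j > 0 and i * j <= 200
    ]


if __name__ == "__main__":
    print(len(full_grid()), "grid points in total")
    print(len(valid_grid()), "grid points satisfy the bound")
```

Under these assumptions, points with τ = 0 or α = 0 are dropped even though the quoted search space includes 0.0, since the stated bound requires both to be strictly positive.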