Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

DAA: Amplifying Unknown Discrepancy for Test-Time Discovery

Authors: Tianle Liu, Fan Lyu, Chenggong Ni, Zhang, Fuyuan Hu, Liang Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that our method maintains high adaptability and stability, and significantly improves novel class discovery performance. Our code is available at https://github.com/LeTianL-TT/DAA-for-TTD.
Researcher Affiliation	Academia	1 School of Electronics and Information Engineering, Suzhou University of Science and Technology; 2 Suzhou Key Laboratory of Embodied Intelligent Agents for Cooperative Perception and Advanced Control; 3 New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 4 Jiangsu Industrial Intelligent and Low-carbon Technology Engineering Center; 5 Suzhou Key Laboratory of Intelligent and Low-carbon Technology Application
Pseudocode	Yes	Algorithm 1 DAA Training and Test-Time Discovery with STMR
Open Source Code	Yes	Our code is available at https://github.com/LeTianL-TT/DAA-for-TTD.
Open Datasets	Yes	We conduct our experiments based on three benchmark datasets, namely CIFAR100 (C100) [14], Caltech-UCSD Birds-200-2011 (CUB) [35] and Tiny ImageNet [15].
Dataset Splits	Yes	All these datasets are split into known and unknown classes (7:3). The model is trained on the known training set, and tested on the mixture of known and unknown test sets. We follow three transformed datasets used for discovery in TTD work: CIFAR100D, CUB-200D, and Tiny-ImageNetD. The dataset partitioning follows the scheme outlined in Table 1.
Hardware Specification	No	The paper does not explicitly specify the hardware used (e.g., GPU model, CPU type, RAM) for its experiments. While the NeurIPS checklist indicates that sufficient information was provided, these details are absent from the main text and appendix of the paper.
Software Dependencies	No	In our implementation, we build our method on the prompt-based method L2P [38], which employs a ViT-B/16 backbone [13] following the pretraining procedure of NCD and GMP work. We employed the contrastive loss of the GCD literature when we fine-tune the pretrained model on the known classes, using SGD optimizer and cosine decay learning rate scheduler with an initial learning rate of 0.1 and minimum learning rate of 0.0001, and weight decay of 0.00005. All input images are resized to 224 × 224 and augmented to match the pretrained backbone settings. No specific software versions for libraries (e.g., PyTorch, TensorFlow) are mentioned.
Experiment Setup	Yes	In our implementation, we build our method on the prompt-based method L2P [38], which employs a ViT-B/16 backbone [13] following the pretraining procedure of NCD and GMP work. We employed the contrastive loss of the GCD literature when we fine-tune the pretrained model on the known classes, using SGD optimizer and cosine decay learning rate scheduler with an initial learning rate of 0.1 and minimum learning rate of 0.0001, and weight decay of 0.00005. All input images are resized to 224 × 224 and augmented to match the pretrained backbone settings.
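
The Experiment Setup row quotes a cosine decay learning rate schedule with an initial rate of 0.1 and a minimum of 0.0001. As a rough illustration only, the sketch below evaluates the standard cosine annealing formula with those two values. It is not taken from the authors' repository: the function name `cosine_decay_lr` and the `total_steps` argument are hypothetical, and the paper excerpt does not state the number of epochs or steps, whether warmup is used, or the exact scheduler implementation.

```python
import math

# Values quoted in the Experiment Setup row of the table.
INITIAL_LR = 0.1
MIN_LR = 0.0001
WEIGHT_DECAY = 0.00005  # listed for reference; the schedule itself does not use it


def cosine_decay_lr(step, total_steps, initial_lr=INITIAL_LR, min_lr=MIN_LR):
    """Standard cosine annealing from initial_lr down to min_lr.

    total_steps is an assumption: the quoted text does not give the
    training length, so the caller has to supply it.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few points of a hypothetical 100-step run.
    for s in (0, 25, 50, 75, 100):
        print(s, cosine_decay_lr(s, total_steps=100))
```

The schedule starts at the initial rate, passes through the midpoint of the two rates halfway through training, and ends at the minimum rate. Anyone reproducing the paper should check the released code for the actual scheduler and training length.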