Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Supervised Knowledge May Hurt Novel Class Discovery Performance

Authors: Ziyun Li, Jona Otholt, Ben Dai, Di Hu, Christoph Meinel, Haojin Yang

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show the validity of the proposed metric, we build up a large-scale benchmark with various degrees of semantic similarities between labeled/unlabeled datasets on ImageNet by leveraging its hierarchical class structure. The results based on the proposed benchmark show that the proposed transfer flow is in line with the hierarchical class structure; and that NCD performance is consistent with the semantic similarities (measured by the proposed metric). Next, by using the proposed transfer flow, we conduct various empirical experiments with different levels of semantic similarity, yielding that supervised knowledge may hurt NCD performance.
Researcher Affiliation | Academia | Ziyun Li (Hasso Plattner Institute, University of Potsdam); Jona Otholt (Hasso Plattner Institute, University of Potsdam); Ben Dai (Department of Statistics, Chinese University of Hong Kong); Di Hu (Gaoling School of Artificial Intelligence, Renmin University of China); Christoph Meinel (Hasso Plattner Institute, University of Potsdam); Haojin Yang (Hasso Plattner Institute, University of Potsdam)
Pseudocode | No | The paper provides mathematical definitions and equations (e.g., for Transfer Flow and Pseudo Transfer Flow) but does not include any distinct pseudocode or algorithm blocks to describe a procedure.
Open Source Code | Yes | Code is released at https://github.com/J-L-O/SK-Hurt-NCD
Open Datasets | Yes | Specifically, the new benchmark is constructed based on a large-scale dataset, ImageNet (Deng et al., 2009), by leveraging its hierarchical semantic information. ... Additionally, we also provide four data settings on CIFAR100, two high-similarity settings and two low-similarity settings, by leveraging the hierarchical class structure of CIFAR100 similarly. ... our proposed benchmark is based on the ENTITY-30 task (Santurkar et al., 2020), which contains 240 ImageNet classes in total
Dataset Splits | Yes | Specifically, our proposed benchmark is based on the ENTITY-30 task (Santurkar et al., 2020), which contains 240 ImageNet classes in total, with 30 superclasses and 8 subclasses for each superclass. ... As a consequence, we define three labeled sets L1, L1.5, L2 and two unlabeled sets U1, U2. The sets L1 and U1 are selected from the first 15 superclasses, where 6 subclasses of each superclass are assigned to L1, and the other 2 are assigned to U1. ... Additionally, we also provide four data settings on CIFAR100, two high-similarity settings and two low-similarity settings, by leveraging the hierarchical class structure of CIFAR100 similarly. Each case has 40 labeled classes and 10 unlabeled classes.
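The L1/U1 split construction quoted above can be sketched in a few lines of plain Python. This is a hypothetical illustration only; the function name and the list-of-pairs input format are assumptions, not the authors' released code.

```python
# Sketch of the L1/U1 assignment: the first 15 of the 30 superclasses each
# contribute 6 subclasses to the labeled set L1 and the other 2 to the
# unlabeled set U1 (90 labeled and 30 unlabeled classes in total).
def make_l1_u1_splits(superclasses):
    """superclasses: ordered list of (superclass, [8 subclass names]) pairs."""
    labeled, unlabeled = [], []
    for _, subs in superclasses[:15]:   # first 15 superclasses
        labeled.extend(subs[:6])        # 6 subclasses per superclass -> L1
        unlabeled.extend(subs[6:])      # remaining 2 subclasses -> U1
    return labeled, unlabeled
```

The remaining sets (L1.5, L2, U2) would be drawn analogously from the other superclasses, depending on the desired semantic similarity between labeled and unlabeled data.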
Hardware Specification | Yes | All experiments are conducted using PyTorch and run on NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or any other key software libraries or frameworks used in the experiments.
Experiment Setup | Yes | For the first step, batch size is set to 512 for both datasets. We use an SGD optimizer with momentum 0.9, and weight decay 1e-4. The learning rate is governed by a cosine annealing learning rate schedule with a base learning rate of 0.1, a linear warmup of 10 epochs, and a minimum learning rate of 0.001. We pretrain the backbone for 200/100 epochs for CIFAR-100/ImageNet. ... The pretraining is done using the small batch size configuration of the method, which uses a batch size of 256 and a queue size of 3840. The training is run for 800 epochs, with the queue being enabled at 60 epochs for our ImageNet-based benchmark and 100 epochs for CIFAR100. ... In the second step of UNO, we train the methods for 500 epochs on CIFAR100 and 100 epochs for each setting on our benchmark.
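The learning-rate schedule described in the setup above (base LR 0.1, 10-epoch linear warmup, cosine annealing to a minimum of 0.001) can be sketched as a small per-epoch function. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; in particular, the exact shape of the warmup ramp is an assumption.

```python
import math

def lr_at_epoch(epoch, total_epochs=200, base_lr=0.1, min_lr=0.001, warmup=10):
    """Learning rate for a given epoch: linear warmup, then cosine annealing."""
    if epoch < warmup:
        # Linear ramp from base_lr/warmup up to base_lr (assumed ramp shape).
        return base_lr * (epoch + 1) / warmup
    # Cosine decay from base_lr at the end of warmup down to min_lr.
    t = (epoch - warmup) / max(1, total_epochs - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

With the defaults above (the 200-epoch CIFAR-100 pretraining case), the rate ramps to 0.1 by epoch 10 and decays smoothly toward 0.001 by the final epoch.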