Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

SEAL: Semantic-Aware Hierarchical Learning for Generalized Category Discovery

Authors: Zhenqi He, Yuanpei Liu, Kai Han

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | SEAL consistently achieves state-of-the-art performance on fine-grained benchmarks, including the SSB benchmark, Oxford-Pet, and the Herbarium19 dataset, and further demonstrates generalization on coarse-grained datasets. ... Through extensive experimentation on public GCD benchmarks, SEAL consistently demonstrates its effectiveness and achieves superior performance, especially on fine-grained datasets.
Researcher Affiliation | Academia | Zhenqi He*, Yuanpei Liu*, Kai Han. Visual AI Lab, The University of Hong Kong. EMAIL EMAIL
Pseudocode | Yes | Algorithm 1: Dynamic Update of Mh
Open Source Code | No | We will release the codes and guidelines for reproducing the results after acceptance.
Open Datasets | Yes | We conduct a comprehensive evaluation of our method across a variety of benchmarks. The main paper reports results on the Semantic Shift Benchmark (SSB) [58], which covers fine-grained datasets - CUB [60], Stanford Cars [34], and FGVC-Aircraft [42] - plus Oxford-Pet [46] and the more challenging Herbarium19 [55]. ... For all datasets, we follow the class split protocol of [57].
Dataset Splits | Yes | For all datasets, we follow the class split protocol of [57], where a subset of classes is selected as the known ("Old") label set Y_l. From these known classes, 50% of the samples are used to construct the labelled set D_l, and the remaining images with instances from novel classes form the unlabelled set D_u. ... For CIFAR-100, 80% of the classes are designated as Old classes, while the remaining 20% as New classes. ... the model's hyperparameters are chosen based on its performance on a hold-out validation set, formed by the original test splits of labelled classes in each dataset.
Hardware Specification | Yes | All experiments are performed on a single NVIDIA L40S GPU with 24GB of memory.
Software Dependencies | No | All experiments utilize the PyTorch framework on a workstation with Nvidia L40s GPUs.
Experiment Setup | Yes | The model is trained for 200 epochs using a batch size of 128 and a cosine learning rate schedule, starting from an initial learning rate of 10^-1 and decaying to 10^-4. ... We perform hyperparameter tuning using a held-out validation split from the labelled data. Specifically, we tune the consistency temperature τ_c and the soft negative controller λ_s based on their performance on the Stanford Cars [34] dataset. ... optimal performance achieved when τ_c = 0.75 and λ_s = 1.0.
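
The learning rate schedule quoted under Experiment Setup can be sketched as follows. This is a minimal illustration and not the authors' code, which had not been released when this summary was produced. It assumes the standard cosine annealing formula, one update per epoch, and no warm-up; the quoted text specifies none of these. The function name `cosine_lr` and its parameters are hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=200, lr_init=1e-1, lr_final=1e-4):
    """Learning rate at a 0-indexed epoch under standard cosine annealing.

    Assumption: the rate falls from lr_init at epoch 0 to lr_final at
    epoch total_epochs, following half a cosine period.
    """
    progress = epoch / total_epochs  # fraction of training completed, in [0, 1]
    cosine_factor = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return lr_final + (lr_init - lr_final) * cosine_factor


# Print the rate at a few checkpoints across the 200-epoch run.
for epoch in (0, 50, 100, 150, 200):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 10^-1, passes the midpoint between the two endpoints at epoch 100, and reaches 10^-4 at epoch 200. A schedule that steps per iteration or includes warm-up would give different intermediate values.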