Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
SEAL: Semantic-Aware Hierarchical Learning for Generalized Category Discovery
Authors: Zhenqi He, Yuanpei Liu, Kai Han
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | SEAL consistently achieves state-of-the-art performance on fine-grained benchmarks, including the SSB benchmark, Oxford-Pet, and the Herbarium19 dataset, and further demonstrates generalization on coarse-grained datasets. ... Through extensive experimentation on public GCD benchmarks, SEAL consistently demonstrates its effectiveness and achieves superior performance, especially on fine-grained datasets. |
| Researcher Affiliation | Academia | Zhenqi He*, Yuanpei Liu*, Kai Han; Visual AI Lab, The University of Hong Kong |
| Pseudocode | Yes | Algorithm 1: Dynamic Update of Mh |
| Open Source Code | No | We will release the codes and guidelines for reproducing the results after acceptance. |
| Open Datasets | Yes | We conduct a comprehensive evaluation of our method across a variety of benchmarks. The main paper reports results on the Semantic Shift Benchmark (SSB) [58], which covers fine-grained datasets (CUB [60], Stanford Cars [34], and FGVC-Aircraft [42]), plus Oxford-Pet [46] and the more challenging Herbarium19 [55]. ... For all datasets, we follow the class split protocol of [57] |
| Dataset Splits | Yes | For all datasets, we follow the class split protocol of [57], where a subset of classes is selected as the known ("Old") label set $Y_l$. From these known classes, 50% of the samples are used to construct the labelled set $D_l$, and the remaining images with instances from novel classes form the unlabelled set $D_u$. ... For CIFAR-100, 80% of the classes are designated as Old classes, while the remaining 20% as New classes. ... the model's hyperparameters are chosen based on its performance on a hold-out validation set, formed by the original test splits of labelled classes in each dataset. |
| Hardware Specification | Yes | All experiments are performed on a single NVIDIA L40S GPU with 24GB of memory. |
| Software Dependencies | No | All experiments utilize the PyTorch framework on a workstation with Nvidia L40s GPUs. |
| Experiment Setup | Yes | The model is trained for 200 epochs using a batch size of 128 and a cosine learning rate schedule, starting from an initial learning rate of $10^{-1}$ and decaying to $10^{-4}$. ... We perform hyperparameter tuning using a held-out validation split from the labelled data. Specifically, we tune the consistency temperature $\tau_c$ and the soft negative controller $\lambda_s$ based on their performance on the Stanford Cars [34] dataset. ... optimal performance achieved when $\tau_c = 0.75$ and $\lambda_s = 1.0$. |
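
The Dataset Splits row describes a split in which some classes are treated as known, half of their samples become the labelled set, and everything else forms the unlabelled set. The following is a minimal sketch of that idea, not the authors' code: the function name `gcd_split`, its parameters, and the choice to sample 50% per class (rather than over the pooled known-class samples) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def gcd_split(labels, old_classes, labelled_fraction=0.5, seed=0):
    """Hypothetical helper: return (labelled_idx, unlabelled_idx).

    labels            -- sequence of class ids, one per sample
    old_classes       -- iterable of class ids treated as known ("Old")
    labelled_fraction -- share of each Old class that goes into the labelled set
    """
    rng = random.Random(seed)
    old_classes = set(old_classes)

    # Group sample indices by class id.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    labelled, unlabelled = [], []
    for y, idxs in by_class.items():
        if y in old_classes:
            # Assumption: sample the labelled half independently per class.
            rng.shuffle(idxs)
            k = int(len(idxs) * labelled_fraction)
            labelled.extend(idxs[:k])
            unlabelled.extend(idxs[k:])  # rest of the Old-class samples
        else:
            unlabelled.extend(idxs)  # all New-class samples are unlabelled
    return sorted(labelled), sorted(unlabelled)


# Toy data: 10 classes with 20 samples each, 80% of classes marked Old
# (the proportion quoted for CIFAR-100 above).
toy_labels = [c for c in range(10) for _ in range(20)]
labelled_idx, unlabelled_idx = gcd_split(toy_labels, old_classes=range(8))
```

In this toy setup the unlabelled set mixes Old and New classes, which is what makes the discovery task non-trivial: the model must both recognise known categories and cluster unseen ones.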
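
The Experiment Setup row quotes a cosine learning rate schedule over 200 epochs, from $10^{-1}$ down to $10^{-4}$. As a rough illustration, the standard cosine annealing formula with those endpoints can be written as below. The function name `cosine_lr` is hypothetical, and updating once per epoch (rather than once per step) is an assumption, since the quoted text does not say which is used.

```python
import math


def cosine_lr(epoch, total_epochs=200, lr_max=1e-1, lr_min=1e-4):
    """Cosine-annealed learning rate for a 0-indexed epoch (illustrative sketch)."""
    progress = epoch / total_epochs  # 0.0 at the start, 1.0 at the end
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, the midpoint, and the end of training.
schedule = [cosine_lr(e) for e in (0, 100, 200)]
```

This is the same closed form used by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` when configured with `T_max=200` and `eta_min=1e-4`; whether the paper uses that scheduler is not stated in the quoted text.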