Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Novel Class Discovery for Long-tailed Recognition
Authors: Chuyu Zhang, Ruijie Xu, Xuming He
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on CIFAR100, ImageNet100, Herbarium19 and large-scale iNaturalist18 datasets, and the results demonstrate the superiority of our method. Our code is available at https://github.com/kleinzcy/NCDLR. |
| Researcher Affiliation | Academia | Chuyu Zhang EMAIL, ShanghaiTech University, Shanghai, China; Lingang Laboratory, Shanghai, China. Ruijie Xu EMAIL, ShanghaiTech University, Shanghai, China. Xuming He EMAIL, ShanghaiTech University, Shanghai, China; Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai, China. |
| Pseudocode | Yes | Algorithm 1: Sinkhorn-Knopp Based Pseudo Labeling Algorithm. Algorithm 2: Adaptive Self-labeling Algorithm. Algorithm 3: The algorithm of estimating the number of novel categories. |
| Open Source Code | Yes | Our code is available at https://github.com/kleinzcy/NCDLR. |
| Open Datasets | Yes | We conduct extensive experiments on two constructed long-tailed datasets, CIFAR100 and ImageNet100, as well as two challenging natural long-tailed datasets, Herbarium19 and iNaturalist18. ... CIFAR100 (Krizhevsky et al., 2009) and ImageNet100 (Deng et al., 2009), and two real-world medium/large-scale long-tailed image classification datasets, Herbarium19 (Tan et al., 2019) and iNaturalist18 (Van Horn et al., 2018). |
| Dataset Splits | Yes | For each dataset, we randomly divide all classes into 50% known classes and 50% novel classes. For testing, we report the NCD performance on the official validation sets of each dataset, except for CIFAR100, where we use its official test set. The details of our datasets are shown in Tab. 1. [Tab. 1 fragment, test-set sizes: 10k, 10k, 5.0k, 5.0k, 2.8k, 3.0k, 6.0k] |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU models, or memory) were provided in the paper. The paper mentions using a ViT-B-16 backbone network but not the hardware it ran on. |
| Software Dependencies | No | The paper mentions using 'AdamW with momentum as the optimizer' and the 'Sinkhorn-Knopp algorithm' but does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We train 50 epochs on CIFAR100 and ImageNet100, 70 epochs on Herbarium and iNaturalist18. We use AdamW with momentum as the optimizer with linear warm-up and cosine annealing (lrbase = 1e-3, lrmin = 1e-4, and weight decay 5e-4). We set α = 1, and select γ = 500 by validation. For all the experiments, we set the batch size to 128 and the iteration step L to 10. For the Sinkhorn-Knopp algorithm, we adopt all the hyperparameters from (Caron et al., 2020), e.g. niter = 3 and ϵ = 0.05. |
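The Sinkhorn-Knopp pseudo-labeling referenced in the Pseudocode and Experiment Setup rows can be illustrated with a minimal NumPy sketch. This follows the standard SwAV-style formulation from Caron et al. (2020) with the quoted hyperparameters (niter = 3, ϵ = 0.05); the function name and the uniform-marginal assumption are ours for illustration, not taken from the paper's Algorithm 1:

```python
import numpy as np

def sinkhorn_knopp(logits, n_iter=3, eps=0.05):
    """Soft pseudo-labels via Sinkhorn-Knopp: alternately normalize rows
    (samples) and columns (classes) of exp(logits / eps) so labels are
    balanced across classes. n_iter and eps match the quoted settings."""
    Q = np.exp(logits / eps)               # (batch, classes) similarity kernel
    Q /= Q.sum()                           # normalize to a joint distribution
    B, K = Q.shape
    for _ in range(n_iter):
        Q /= Q.sum(axis=0, keepdims=True)  # each class column sums to 1/K
        Q /= K
        Q /= Q.sum(axis=1, keepdims=True)  # each sample row sums to 1/B
        Q /= B
    return Q * B                           # rows sum to 1: per-sample labels
```

Each returned row is a probability vector over classes, while the column marginals are pushed toward uniformity, which is what prevents the degenerate solution of assigning every sample to one cluster.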
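The optimizer schedule quoted above (linear warm-up followed by cosine annealing from lrbase = 1e-3 down to lrmin = 1e-4) can be sketched as follows. The warm-up length is not quoted in the paper, so `warmup_epochs=5` here is a placeholder assumption:

```python
import math

def lr_at_epoch(epoch, total_epochs=50, warmup_epochs=5,
                lr_base=1e-3, lr_min=1e-4):
    """Linear warm-up to lr_base, then cosine annealing down to lr_min.
    warmup_epochs is an illustrative choice; the paper does not quote it."""
    if epoch < warmup_epochs:
        # Linear ramp from lr_base/warmup_epochs up to lr_base.
        return lr_base * (epoch + 1) / warmup_epochs
    # Cosine decay over the remaining epochs.
    t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_base - lr_min) * (1 + math.cos(math.pi * t))
```

In a training loop this would be called once per epoch to set the optimizer's learning rate; equivalent behavior is available in PyTorch via a warm-up scheduler chained with `CosineAnnealingLR`.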