Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56,800 conference papers," under review, 2026).
Cross-Domain Empirical Risk Minimization for Unbiased Long-Tailed Classification
Authors: Beier Zhu, Yulei Niu, Xian-Sheng Hua, Hanwang Zhang
Pages: 3589-3597
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several long-tail classification benchmark datasets: CIFAR100-LT (Cui et al. 2019), Places365-LT (Liu et al. 2019), ImageNet-LT (Liu et al. 2019), and iNaturalist 2018 (Van Horn et al. 2018). Experimental results show that xERM outperforms previous state-of-the-art methods on both long-tailed and balanced test sets, which demonstrates that the performance gain does not come from catering to the tail. Further qualitative studies show that xERM helps learn better feature representations. |
| Researcher Affiliation | Collaboration | Beier Zhu¹, Yulei Niu¹*, Xian-Sheng Hua², Hanwang Zhang¹ (¹Nanyang Technological University; ²DAMO Academy, Alibaba Group) |
| Pseudocode | Yes | Figure 2: The Algorithm: xERM |
| Open Source Code | No | The paper states that "The implementations of the two models are open" but provides neither a link to nor an explicit statement about releasing the code for xERM itself. |
| Open Datasets | Yes | We conducted experiments on four long-tailed classification datasets: CIFAR100-LT (Cui et al. 2019), Places365-LT (Liu et al. 2019), ImageNet-LT (Liu et al. 2019), and iNaturalist 2018 (Van Horn et al. 2018). |
| Dataset Splits | Yes | We divided the test sets of CIFAR100-LT-IB-100 and ImageNet-LT into three subsets according to the number of samples in each class: many-shot (categories with >100 images), medium-shot (categories with 20-100 images), and few-shot (categories with <20 images). ... We established the long-tailed test splits by downsampling the original well-balanced test set with various imbalance ratios, following the same procedure used to construct the training set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set the scaling parameter γ in Eq. (9) to 2 for CIFAR100-LT, 5 for Places365-LT and 1.5 for other datasets. ... All networks were trained for 200 epochs on CIFAR100-LT, 30 epochs on Places365-LT, and 90 epochs on ImageNet-LT and iNaturalist 2018. |
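The Dataset Splits row describes two procedures:

- Grouping classes into many-, medium- and few-shot buckets.
- Building long-tailed test sets by downsampling a balanced one.

The Python sketch below illustrates both, under a few assumptions:

- The shot thresholds are applied to per-class training-set counts. This is the usual ImageNet-LT convention; the quoted excerpt does not say which counts are used.
- Downsampling follows the exponential class-size profile of Cui et al. 2019, n_i = n_max · (1/ρ)^(i/(C−1)), where ρ is the imbalance ratio and C is the number of classes.
- Classes are subsampled at random.

All function names are hypothetical and do not come from the xERM code.

```python
import random
from collections import Counter, defaultdict


def shot_group(train_count: int) -> str:
    """Map a class's image count to a shot group.

    Thresholds from the report: >100 many, 20-100 medium, <20 few.
    Applying them to training-set counts is an assumption.
    """
    if train_count > 100:
        return "many"
    if train_count >= 20:
        return "medium"
    return "few"


def group_classes(train_labels):
    """Return {class_id: shot group} computed from a list of training labels."""
    return {c: shot_group(n) for c, n in Counter(train_labels).items()}


def group_accuracy(preds, targets, groups):
    """Top-1 accuracy per shot group, given predicted and true labels."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, t in zip(preds, targets):
        g = groups[t]
        total[g] += 1
        correct[g] += int(p == t)
    return {g: correct[g] / total[g] for g in total}


def long_tailed_counts(n_max, num_classes, imbalance_ratio):
    """Assumed exponential profile (after Cui et al. 2019):
    class i keeps n_max * (1 / ratio) ** (i / (C - 1)) samples."""
    return [
        int(n_max * (1.0 / imbalance_ratio) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]


def downsample(labels, counts, seed=0):
    """Randomly keep counts[c] indices of each class c; return sorted indices."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    keep = []
    for c, idxs in by_class.items():
        keep.extend(rng.sample(idxs, min(counts[c], len(idxs))))
    return sorted(keep)


# Example: a long-tailed test split from a balanced 100-class test set
# (100 images per class) with imbalance ratio 100, one of several possible ratios.
balanced_labels = [c for c in range(100) for _ in range(100)]
test_counts = long_tailed_counts(100, 100, 100)
lt_test_indices = downsample(balanced_labels, test_counts)
```

With these settings, the head class keeps all 100 test images and the rarest class keeps a single image. A fixed `seed` makes the subsample repeatable, which matters when comparing methods on the same long-tailed split.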
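The Experiment Setup row reports only two hyperparameters per dataset: the scaling parameter γ and the number of training epochs. When attempting a reproduction, one way to keep these explicit is a small per-dataset config table, sketched below in Python.

The values are transcribed from the quoted row. The phrase "other datasets" for γ = 1.5 is read here as ImageNet-LT and iNaturalist 2018. The names `TrainConfig`, `XERM_SETTINGS`, and `get_config` are hypothetical. Fields such as optimizer, learning rate, and batch size are deliberately left out because the excerpt does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    gamma: float  # scaling parameter gamma from the paper's Eq. (9)
    epochs: int   # total number of training epochs


# Values transcribed from the report's Experiment Setup row.
XERM_SETTINGS = {
    "CIFAR100-LT": TrainConfig(gamma=2.0, epochs=200),
    "Places365-LT": TrainConfig(gamma=5.0, epochs=30),
    "ImageNet-LT": TrainConfig(gamma=1.5, epochs=90),
    "iNaturalist2018": TrainConfig(gamma=1.5, epochs=90),
}


def get_config(dataset: str) -> TrainConfig:
    """Look up the reported settings; fail loudly on unknown dataset names."""
    try:
        return XERM_SETTINGS[dataset]
    except KeyError:
        known = ", ".join(sorted(XERM_SETTINGS))
        raise ValueError(f"no reported settings for {dataset!r}; known: {known}") from None
```

Raising on an unknown name, rather than falling back to a default, avoids silently reusing another dataset's γ.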