Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Only When It Matters: Cost-Aware Long-Tailed Classification

Authors: Yu-Cheng He, Yao-Xiang Ding, Han-Jia Ye, Zhi-Hua Zhou

AAAI 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct theoretical analysis to show that under the assumption that the feature-space distance and the misclassification cost are correlated, the identification of high-cost tail instances can be realized by building region partitions with a low variance of risk within each region. The resulting AugARP approach could significantly outperform baseline approaches on both benchmark datasets and real-world product sales datasets. We verify the effectiveness of AugARP under benchmark datasets, showing its effectiveness over existing cost-agnostic baselines. We further verify its potential usefulness in real-world applications on the Amazon Products Sales 2023 dataset.
Researcher Affiliation	Academia	(1) National Key Laboratory for Novel Software Technology, Nanjing University, China; (2) State Key Laboratory of CAD & CG, Zhejiang University, China; (3) School of Artificial Intelligence, Nanjing University, China. EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 AugARP; Algorithm 2 AUGMENT
Open Source Code	Yes	The code is released on https://www.lamda.nju.edu.cn/code/AugARP.ashx
Open Datasets	Yes	Four datasets are used in our experiments: CIFAR10, CIFAR100, iNaturalist, and 2023 Amazon Sales Dataset [1]. We introduce briefly each dataset and experiment setting in their respective subsections [2]. [1] https://www.kaggle.com/datasets/lokeshparab/amazonproducts-dataset
Dataset Splits	No	The paper specifies training and test set sizes (e.g., "50000 images in the training set, and 10000 images in the test set" for CIFAR10/100) but does not explicitly state a validation dataset split or how it's used.
Hardware Specification	No	The paper mentions using Resnet-32 and Resnet-50 models but does not specify any hardware details like GPU models, CPU types, or memory.
Software Dependencies	No	The paper does not provide specific version numbers for ancillary software components, such as libraries or frameworks.
Experiment Setup	Yes	For each algorithm, we train 200 epochs and repeat 3 times to report the average weighted accuracy. The imbalance ratio is set as 100 for both the long-tail setting and the step setting. On CIFAR100, 10 head classes and 40 tail classes are set as important classes with misclassification costs of 100 while others are with misclassification costs of 1.
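
The Experiment Setup entry above mentions an imbalance ratio of 100, a long-tail and a step setting, and a cost-weighted accuracy. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation. The exponential decay profile, the half-and-half step split, and the exact weighting formula are assumptions based on common long-tailed benchmark conventions; the function names are hypothetical and none of this is taken from the paper or its released code.

```python
def long_tail_counts(n_max, num_classes, imbalance_ratio):
    """Per-class sample counts that decay exponentially from n_max
    down to n_max / imbalance_ratio (assumed long-tail profile)."""
    return [
        round(n_max * (1.0 / imbalance_ratio) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]


def step_counts(n_max, num_classes, imbalance_ratio):
    """First half of the classes keeps n_max samples, the second half
    keeps n_max / imbalance_ratio (assumed step profile)."""
    half = num_classes // 2
    n_min = n_max // imbalance_ratio
    return [n_max] * half + [n_min] * (num_classes - half)


def cost_weighted_accuracy(y_true, y_pred, class_costs):
    """Accuracy in which each instance is weighted by the
    misclassification cost of its true class (assumed definition)."""
    total = sum(class_costs[t] for t in y_true)
    correct = sum(class_costs[t] for t, p in zip(y_true, y_pred) if t == p)
    return correct / total


# Toy usage: 10 classes, 5000 samples in the largest class, ratio 100.
lt = long_tail_counts(5000, 10, 100)
st = step_counts(5000, 10, 100)

# Hypothetical two-class example: class 1 is "important" (cost 100).
costs = {0: 1, 1: 100}
acc = cost_weighted_accuracy([0, 0, 1], [0, 1, 1], costs)
```

Under this assumed definition, an error on a cost-100 class lowers the score 100 times as much as an error on a cost-1 class, which is why a cost-agnostic baseline can score well on plain accuracy and poorly on the weighted one.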