Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Classification with Rejection Based on Cost-sensitive Classification
Authors: Nontawat Charoenphakdee, Zhenghang Cui, Yivan Zhang, Masashi Sugiyama
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the usefulness of our proposed approach in clean-labeled, noisy-labeled, and positive-unlabeled classification. In this section, we provide experimental results of classification with rejection. |
| Researcher Affiliation | Academia | Nontawat Charoenphakdee (1, 2), Zhenghang Cui (1, 2), Yivan Zhang (1, 2), Masashi Sugiyama (2, 1). (1) The University of Tokyo, Tokyo, Japan; (2) RIKEN AIP, Tokyo, Japan. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology or a direct link to a code repository. |
| Open Datasets | Yes | Datasets and models: For binary classification, we used the subjective-versus-objective classification (Subj), which is a text dataset (Pang & Lee, 2004). Moreover, we used Phishing and Spambase, which are tabular datasets, and Twonorm, which is a synthetic dataset drawn from different multivariate Gaussian distributions (Lichman et al., 2013). We also used the Gisette dataset, which is the problem of separating the highly confusible digits 4 and 9 with noisy features (Guyon et al., 2005). ... We also used the image datasets, which are MNIST (LeCun, 1998), Kuzushiji-MNIST (KMNIST) (Clanuwat et al., 2018), and Fashion-MNIST (Xiao et al., 2017). |
| Dataset Splits | No | The paper mentions using 'additional training data' for hyperparameter tuning for some methods, but does not provide specific percentages, sample counts, or detailed splitting methodology for a validation set. |
| Hardware Specification | Yes | We would like to thank ... the Supercomputing Division, Information Technology Center, The University of Tokyo, for providing us the Reedbush supercomputer system to conduct the experiments. |
| Software Dependencies | No | The paper states 'The implementation was done using PyTorch (Paszke et al., 2019).' but does not provide a specific version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | The varying rejection costs ranged from {0.1, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40} for all settings. Both rejection threshold of ANGLE and the temperature parameter for SCE are chosen from the following candidate set of twenty numbers spaced evenly in a log scale from 0 to 1 (inclusively) and nine integers from 2 to 10. |
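
The hyperparameter grid quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code. The quoted text does not say where the log-spaced range starts, and 0 cannot lie on a log scale. The sketch therefore assumes the twenty values end at 1 and start at a small positive value. The helper `candidate_set` and its `low_exponent` default of -3 are hypothetical and do not come from the paper.

```python
import numpy as np

# Rejection costs exactly as quoted in the Experiment Setup row.
rejection_costs = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]


def candidate_set(low_exponent: float = -3.0) -> np.ndarray:
    """Hypothetical helper that builds the 29-value candidate set.

    ASSUMPTION: the paper only says "twenty numbers spaced evenly in a
    log scale from 0 to 1 (inclusively)". Here that is read as twenty
    log-spaced values whose upper endpoint is 1. The lower endpoint
    10**low_exponent is a placeholder chosen for illustration.
    """
    # Twenty values from 10**low_exponent up to 10**0 = 1, inclusive.
    log_spaced = np.logspace(low_exponent, 0.0, num=20)
    # Nine integers from 2 to 10, inclusive.
    integers = np.arange(2, 11)
    return np.concatenate([log_spaced, integers])


candidates = candidate_set()
```

Under this reading the grid has 29 candidates per rejection cost. Reproducing the paper's results would require confirming the actual lower endpoint with the authors, since the summary above does not state it.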