Well-Classified Examples Are Underestimated in Classification with Deep Neural Networks

Authors: Guangxiang Zhao, Wenkai Yang, Xuancheng Ren, Lei Li, Yunfang Wu, Xu Sun

AAAI 2022, pp. 9180-9189

Reproducibility

Research Type: Experimental
LLM Response: We empirically support this claim by directly verifying the theoretical results, or by showing significant performance improvements with our counterexample, on diverse tasks including image classification, graph classification, and machine translation. This section analyzes the practical effect of encouraging the learning of well-classified examples by applying the counterexample to various classification tasks and settings.
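
The "counterexample" is the paper's additive-bonus loss for well-classified examples. Below is a minimal PyTorch sketch, assuming the bonus takes the form log(1 - p) added on top of standard cross entropy; consult the official repository for the exact formulation.

```python
import torch
import torch.nn.functional as F

def encouraging_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross entropy plus a bonus that keeps rewarding confident correct
    predictions. The log(1 - p) bonus is an assumed form, not necessarily
    the paper's exact loss."""
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp().gather(-1, target.unsqueeze(-1)).squeeze(-1)  # prob. of the true class
    ce = F.nll_loss(log_p, target, reduction="none")              # -log p
    bonus = torch.log1p(-p.clamp(max=1.0 - 1e-6))                 # log(1 - p), clamped for stability
    return (ce + bonus).mean()
```

Under this form, the gradient magnitude with respect to p grows rather than vanishes as p approaches 1, so well-classified examples keep contributing to learning.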

Researcher Affiliation: Academia
LLM Response: (1) Institute for Artificial Intelligence, Peking University; (2) MOE Key Laboratory of Computational Linguistics, School of Computer Science, Peking University; (3) Center for Data Science, Peking University; (4) Beijing Academy of Artificial Intelligence.

Pseudocode: No
LLM Response: The paper does not contain structured pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: Code is available at https://github.com/lancopku/well-classified-examples-are-underestimated.

Open Datasets: Yes
LLM Response: Image classification: We adopt four datasets, MNIST, CIFAR-10, CIFAR-100, and ImageNet (Russakovsky et al. 2015); descriptions of the datasets are in the Appendix. For training, we borrow code from repositories with well-reproduced accuracy and keep all their default settings unchanged. Specifically, we train the CNN model from Liu et al. (2016) on MNIST, train ResNet-50 (He et al. 2016) and EfficientNet-B0 (Tan and Le 2019) on CIFAR-10 and CIFAR-100 using the code by Narumiruna, train ResNet-50 on ImageNet with the example code from PyTorch, and train EfficientNet-B0 on ImageNet using the code from timm (Wightman 2019). We choose ResNet-50 and EfficientNet-B0 because they are canonical and SoTA parameter-efficient models, respectively. For evaluation, we report the best top-1 accuracy on the test set, following common practice. Graph classification: Typical applications of graph classification are binary classification of the functionality of graph-structured biological data. We run experiments on PROTEINS (1,113 graphs of protein structures) (Dobson and Doig 2003; Borgwardt et al. 2005) and NCI1 (4,110 graphs of chemical compounds) (Wale, Watson, and Karypis 2008).
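
To illustrate the "default settings from external repositories" approach, here is a minimal, self-contained sketch of a CIFAR-10 / ResNet-50 training setup; the transforms, optimizer, and values shown are common defaults assumed for illustration, not the authors' exact configuration.

```python
import torch
import torchvision
from torchvision import transforms

# Standard CIFAR-10 normalization statistics; illustrative, not the paper's code.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# ResNet-50 with a 10-class head; optimizer settings are assumed defaults.
model = torchvision.models.resnet50(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```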

Dataset Splits: Yes
LLM Response: In particular, on the graph datasets each run uses 50 different train, valid, and test splits of the data (proportions 0.8, 0.1, and 0.1, respectively), since a recent study indicates that different dataset splits largely affect test performance (Shchur et al. 2019). For the other tasks, we use the official data splits.
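
A minimal sketch of producing such repeated random splits (the 0.8/0.1/0.1 proportions and the 50 runs come from the paper; the seeding scheme is an assumption):

```python
import numpy as np

def random_splits(n: int, n_runs: int = 50, seed: int = 0):
    """Yield (train, valid, test) index arrays in 0.8/0.1/0.1 proportions."""
    rng = np.random.default_rng(seed)
    n_train, n_valid = int(0.8 * n), int(0.1 * n)
    for _ in range(n_runs):
        perm = rng.permutation(n)
        yield (perm[:n_train],
               perm[n_train:n_train + n_valid],
               perm[n_train + n_valid:])

# Example: 50 splits over the 1,113 PROTEINS graphs.
splits = list(random_splits(1113))
```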

Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other detailed machine specifications) used for running its experiments.

Software Dependencies: No
LLM Response: The paper mentions using PyTorch and fairseq, but it does not provide version numbers for these or any other software dependencies, making exact reproduction difficult without guessing versions.
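
Since no versions are pinned, a first step in any reproduction is to record the environment actually used; a minimal sketch:

```python
# Log the versions of the key libraries in the current environment;
# fairseq may not be installed, so it is probed defensively.
import torch

print("torch:", torch.__version__)
try:
    import fairseq
    print("fairseq:", fairseq.__version__)
except ImportError:
    print("fairseq: not installed")
```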

Experiment Setup: No
LLM Response: The paper states that it adopted default settings from external repositories and tuned some hyperparameters, but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific system-level training settings in the main text, deferring such details to the Appendix and the external code.
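
For reference, these are the kinds of settings a reproduction would need to recover from the Appendix or the external repositories; every value below is a hypothetical placeholder, not taken from the paper.

```python
# Hypothetical placeholders only: none of these values appear in the
# paper's main text; the real settings live in its Appendix and the
# external training code it borrows.
config = {
    "learning_rate": 0.1,    # placeholder
    "batch_size": 128,       # placeholder
    "epochs": 100,           # placeholder
    "optimizer": "SGD",      # placeholder
    "weight_decay": 5e-4,    # placeholder
}
```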