Well-Classified Examples Are Underestimated in Classification with Deep Neural Networks

Authors: Guangxiang Zhao, Wenkai Yang, Xuancheng Ren, Lei Li, Yunfang Wu, Xu Sun

AAAI 2022, pp. 9180-9189

Reproducibility

Research Type: Experimental
LLM Response: We empirically support this claim by directly verifying the theoretical results, or by showing significant performance improvements with our counterexample, on diverse tasks including image classification, graph classification, and machine translation. This section analyzes the practical effect of encouraging the learning of well-classified examples by applying the counterexample to various classification tasks and settings.
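
The "counterexample" is the paper's additive-bonus loss for well-classified examples. Below is a minimal PyTorch sketch, assuming the bonus takes the form log(1 - p) added on top of standard cross entropy; consult the official repository for the exact formulation.

```python
import torch
import torch.nn.functional as F

def encouraging_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross entropy plus a bonus that keeps rewarding confident correct
    predictions. The log(1 - p) bonus is an assumed form, not necessarily
    the paper's exact loss."""
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp().gather(-1, target.unsqueeze(-1)).squeeze(-1)  # prob. of the true class
    ce = F.nll_loss(log_p, target, reduction="none")              # -log p
    bonus = torch.log1p(-p.clamp(max=1.0 - 1e-6))                 # log(1 - p), clamped for stability
    return (ce + bonus).mean()
```

Under this form, the gradient magnitude with respect to p grows rather than vanishes as p approaches 1, so well-classified examples keep contributing to learning.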

Researcher Affiliation: Academia
LLM Response: (1) Institute for Artificial Intelligence, Peking University; (2) MOE Key Laboratory of Computational Linguistics, School of Computer Science, Peking University; (3) Center for Data Science, Peking University; (4) Beijing Academy of Artificial Intelligence.

Pseudocode: No
LLM Response: The paper does not contain structured pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: Code is available at https://github.com/lancopku/well-classified-examples-are-underestimated.

Open Datasets: Yes
LLM Response: Image classification: We adopt four datasets, MNIST, CIFAR-10, CIFAR-100, and ImageNet (Russakovsky et al. 2015); descriptions of the datasets are in the Appendix. For training, we borrow code from repositories with well-reproduced accuracy and keep all their default settings unchanged. Specifically, we train the CNN model from Liu et al. (2016) on MNIST, train ResNet-50 (He et al. 2016) and EfficientNet-B0 (Tan and Le 2019) on CIFAR-10 and CIFAR-100 using the code by Narumiruna, train ResNet-50 on ImageNet with the example code from PyTorch, and train EfficientNet-B0 on ImageNet using the code from timm (Wightman 2019). We choose ResNet-50 and EfficientNet-B0 because they are canonical and SoTA parameter-efficient models, respectively. For evaluation, we report the best top-1 accuracy on the test set, following common practice. Graph classification: Typical applications of graph classification are binary classification of the functionality of graph-structured biological data. We run experiments on PROTEINS (1,113 graphs of protein structures) (Dobson and Doig 2003; Borgwardt et al. 2005) and NCI1 (4,110 graphs of chemical compounds) (Wale, Watson, and Karypis 2008).
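
To illustrate the "default settings from external repositories" approach, here is a minimal, self-contained sketch of a CIFAR-10 / ResNet-50 training setup; the transforms, optimizer, and values shown are common defaults assumed for illustration, not the authors' exact configuration.

```python
import torch
import torchvision
from torchvision import transforms

# Standard CIFAR-10 normalization statistics; illustrative, not the paper's code.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# ResNet-50 with a 10-class head; optimizer settings are assumed defaults.
model = torchvision.models.resnet50(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
```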

Dataset Splits: Yes
LLM Response: In particular, on the graph datasets each run uses 50 different train, valid, and test splits of the data (proportions 0.8, 0.1, and 0.1, respectively), since a recent study indicates that different dataset splits largely affect test performance (Shchur et al. 2019). For the other tasks, we use the official data splits.
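
A minimal sketch of producing such repeated random splits (the 0.8/0.1/0.1 proportions and the 50 runs come from the paper; the seeding scheme is an assumption):

```python
import numpy as np

def random_splits(n: int, n_runs: int = 50, seed: int = 0):
    """Yield (train, valid, test) index arrays in 0.8/0.1/0.1 proportions."""
    rng = np.random.default_rng(seed)
    n_train, n_valid = int(0.8 * n), int(0.1 * n)
    for _ in range(n_runs):
        perm = rng.permutation(n)
        yield (perm[:n_train],
               perm[n_train:n_train + n_valid],
               perm[n_train + n_valid:])

# Example: 50 splits over the 1,113 PROTEINS graphs.
splits = list(random_splits(1113))
```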

Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, memory amounts, or other detailed machine specifications) used for running its experiments.

Software Dependencies: No
LLM Response: The paper mentions using PyTorch and fairseq, but it does not provide version numbers for these or any other software dependencies, making exact reproduction difficult without guessing versions.
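
Since no versions are pinned, a first step in any reproduction is to record the environment actually used; a minimal sketch:

```python
# Log the versions of the key libraries in the current environment;
# fairseq may not be installed, so it is probed defensively.
import torch

print("torch:", torch.__version__)
try:
    import fairseq
    print("fairseq:", fairseq.__version__)
except ImportError:
    print("fairseq: not installed")
```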

Experiment Setup: No
LLM Response: The paper states that it adopted default settings from external repositories and tuned some hyperparameters, but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or specific system-level training settings in the main text, deferring such details to the Appendix and the external code.
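
For reference, these are the kinds of settings a reproduction would need to recover from the Appendix or the external repositories; every value below is a hypothetical placeholder, not taken from the paper.

```python
# Hypothetical placeholders only: none of these values appear in the
# paper's main text; the real settings live in its Appendix and the
# external training code it borrows.
config = {
    "learning_rate": 0.1,    # placeholder
    "batch_size": 128,       # placeholder
    "epochs": 100,           # placeholder
    "optimizer": "SGD",      # placeholder
    "weight_decay": 5e-4,    # placeholder
}
```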