Top-Ambiguity Samples Matter: Understanding Why Deep Ensemble Works in Selective Classification

Authors: Qiang Ding, Yixuan Cao, Ping Luo

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The assumptions and the theoretical results are supported by systematic experiments on both computer vision and natural language processing tasks.
Researcher Affiliation | Academia | (1) Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; (2) University of Chinese Academy of Sciences, Beijing 100049, China; (3) Peng Cheng Laboratory, Shenzhen 518066, China
Pseudocode | Yes | Algorithm 1: A Lower Bound of Maximum φ0.
Open Source Code | No | The paper refers to the "official open-sourced implementation of SAT" for specific components (the backbone model and data preprocessing), but there is no statement or link indicating that the authors have released their own source code for the method presented in this paper.
Open Datasets | Yes | Following [9], we used CIFAR-10, CIFAR-100, and SVHN for image classification tasks. Following [10], we used MRPC, MNLI, and QNLI for text classification tasks.
Dataset Splits | Yes | The sizes of the training set, development set, and test set of each dataset used in the experiments are shown in Table 1.
Hardware Specification | No | The paper discusses models and training procedures but does not specify any hardware components (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper states that "Pretrained BERT-base is provided by the Huggingface Transformer Library [41]" and describes its training procedures, but it does not provide specific version numbers for software dependencies such as the Huggingface library, PyTorch, or TensorFlow.
Experiment Setup | Yes | The model is optimized using SGD with an initial learning rate of 0.1 (the learning rate decays by half every 25 epochs), a momentum of 0.9, weight decay of 0.0005, a batch size of 128, and a total of 300 training epochs.
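For concreteness, the image-classification training recipe quoted in the "Experiment Setup" row can be written out as a minimal PyTorch sketch. This is not the authors' code (none is released); the model and data below are placeholders, and only the optimizer and schedule settings (SGD, initial learning rate 0.1 halved every 25 epochs, momentum 0.9, weight decay 0.0005, batch size 128, 300 epochs) come from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data standing in for the paper's image backbone and CIFAR/SVHN loaders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
data = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
train_loader = DataLoader(data, batch_size=128, shuffle=True)  # batch size 128 as reported

# SGD with lr=0.1, momentum 0.9, weight decay 0.0005; learning rate halved every 25 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)

for epoch in range(300):  # total of 300 training epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```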
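Similarly, the "Software Dependencies" row notes that the text-classification experiments rely on a pretrained BERT-base from the Huggingface Transformers library without pinning a version. A minimal sketch of how such a checkpoint is typically loaded follows; the checkpoint name, task head, and example input are illustrative assumptions, not details taken from the paper.

```python
# Illustrative only: load a pretrained BERT-base with the Huggingface Transformers library.
# The exact checkpoint and library version used in the paper are not reported.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint name
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # two labels for a binary task such as MRPC or QNLI
)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2)
```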