Dual Active Learning for Both Model and Data Selection
Authors: Ying-Peng Tang, Sheng-Jun Huang
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on 12 OpenML datasets. The results demonstrate that the proposed method can effectively learn a superior model with fewer labeled examples. |
| Researcher Affiliation | Academia | College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; Collaborative Innovation Center of Novel Software Technology and Industrialization; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China. {tangyp, huangsj}@nuaa.edu.cn |
| Pseudocode | Yes | Algorithm 1 The DUAL Algorithm |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository. |
| Open Datasets | Yes | We conduct our experiments on 12 OpenML [Vanschoren et al., 2013] multi-class classification datasets. |
| Dataset Splits | Yes | Ltr, Lval ← split(L): split the labeled data into a training set Ltr and a validation set Lval. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions using 'auto-sklearn' and 'scikit-learn' but does not specify their version numbers or other software dependencies with specific versions. |
| Experiment Setup | Yes | For each case, we randomly sample 40% of the data as the test set and 5% as the initially labeled set, and the rest is treated as the unlabeled pool. The data split is repeated 10 times with different random seeds. For the initial candidate algorithms in the CASH module, we employ 12 commonly used models: Adaboost, Random Forest, Libsvm SVC, SGD, Extra Trees, Decision Tree, K Nearest Neighbors, Passive Aggressive, Gradient Boosting, LDA, QDA, Multi Layer Perceptron. We set the truncated threshold as τ = m, where m is the number of validation examples. For the trade-off parameter β, we set it empirically as β = 1/g. |
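
The split protocol quoted in the last row is straightforward to reproduce. Below is a minimal sketch, assuming scikit-learn's `load_digits` as a stand-in for the paper's OpenML tasks and an 80/20 division of the labeled set into Ltr and Lval (that fraction is an assumption, not stated in the rows above); all function and variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

def split_once(X, y, seed, test_frac=0.40, init_frac=0.05, val_frac=0.20):
    """One random split: 40% test, 5% initially labeled, rest unlabeled pool.

    The labeled part is further divided into a training set Ltr and a
    validation set Lval (the 0.20 validation fraction is an assumption).
    """
    idx = np.arange(len(y))
    # Hold out 40% of the data as the test set (stratified by class).
    rest, test = train_test_split(
        idx, test_size=test_frac, random_state=seed, stratify=y)
    # Take 5% of the full dataset as the initially labeled set;
    # the remaining indices form the unlabeled pool.
    rng = np.random.default_rng(seed)
    rest = rng.permutation(rest)
    n_init = int(round(init_frac * len(y)))
    labeled, pool = rest[:n_init], rest[n_init:]
    # Split the labeled data into training set Ltr and validation set Lval.
    l_tr, l_val = train_test_split(labeled, test_size=val_frac, random_state=seed)
    return {"Ltr": l_tr, "Lval": l_val, "pool": pool, "test": test}

# Stand-in dataset; the paper draws its 12 multi-class tasks from OpenML.
X, y = load_digits(return_X_y=True)
# The split is repeated 10 times with different random seeds.
splits = [split_once(X, y, seed) for seed in range(10)]
print(len(splits[0]["Ltr"]), len(splits[0]["Lval"]),
      len(splits[0]["pool"]), len(splits[0]["test"]))
```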
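
For the 12 candidate algorithms listed in the setup row, the sketch below is a much-simplified stand-in for the model-selection step: each candidate is fit with scikit-learn defaults on Ltr and scored on Lval. The paper's CASH module (run via auto-sklearn) also searches hyperparameters, which this sketch omits; it reuses `X`, `y`, and `splits` from the previous sketch.

```python
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier)
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier, PassiveAggressiveClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# The 12 candidate models named in the setup row, with default hyperparameters
# (the paper tunes them through CASH; defaults are an illustrative shortcut).
CANDIDATES = {
    "Adaboost": AdaBoostClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Libsvm SVC": SVC(),
    "SGD": SGDClassifier(),
    "Extra Trees": ExtraTreesClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Passive Aggressive": PassiveAggressiveClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "Multi Layer Perceptron": MLPClassifier(max_iter=500),
}

def select_model(X, y, split):
    """Fit every candidate on Ltr and return the one with best Lval accuracy."""
    scores = {}
    for name, model in CANDIDATES.items():
        model.fit(X[split["Ltr"]], y[split["Ltr"]])
        scores[name] = accuracy_score(y[split["Lval"]],
                                      model.predict(X[split["Lval"]]))
    best = max(scores, key=scores.get)
    return best, scores[best]

# Example: pick the best candidate for the first of the 10 splits.
print(select_model(X, y, splits[0]))
```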