reproducibilityindex.ai

AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection

Authors: Xu Chen, Brett Wujek
Pages: 3537-3544

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms.
Researcher Affiliation	Industry	Xu Chen¹, Brett Wujek¹. ¹SAS Inc. steven.xu.chen@gmail.com, Brett.Wujek@sas.com
Pseudocode	Yes	1: procedure AUTOMATED DISTRIBUTED ACTIVE LEARNING ALGORITHM(observations x, initial label matrix Y) 2: while Ω − L > 0 do (Ω is the budget for the total number of labeled data; L is the number of data already labeled) 3: Distribute the unlabeled data randomly and replicate the labeled data in different worker nodes. Solve (8) using the hybrid search strategy of GA with GSS: 4: Evaluate initial parent points P asynchronously in parallel. Populate reference cache-tree R with unique points from P. Associate each point p ∈ P with an initial step Δ_p. 5: while |R| ≤ n_b do (n_b is the evaluation budget) 6: Select Λ ⊂ P for local search based on the optimization problem formulated in (8). 7: for p ∈ P, search ζ_p = ζ_p ∪ {p + Δ_p} ∪ {p − Δ_p}; 8: if min_{χ ∈ ζ_p} J(F, χ) < J(F, p) − Δ_p², then set p = χ (pattern search success) 9: else Δ_p = Δ_p/2 (pattern search failure) 10: end while 11: Conduct K-means clustering on F. Given F and h, solve (9) to determine Ŷ with the top h selections. 12: Add the selected samples x_j1, ..., x_jh and estimated labels ŷ_j1, ..., ŷ_jh to the labeled dataset and output the optimal selections, including the label probability distribution matrix F, hyperparameter set χ, updated label matrix Ŷ, and batch size h. 13: end while 14: end procedure
Open Source Code	No	The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets	Yes	Datasets: We evaluate the classification performance of the proposed method over five benchmark datasets taken from mldata.org (http://mldata.org/repository/tags/data) including banana, breast cancer, diabetes, image and thyroid. AutoDAL is also applied to a real-world ECG heartbeat categorization dataset from Kaggle for classification (https://www.kaggle.com/shayanfazeli/heartbeat).
Dataset Splits	Yes	For supervised methods, we performed a leave-one-dataset-out validation.
Hardware Specification	No	The distributed computing environment is comprised of 139 machines where each machine is running with 32 threads.
Software Dependencies	No	The paper mentions software tools like Auto-WEKA and Auto-sklearn but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	The hyperparameters for the automated algorithm to estimate are initially provided as ranges where k = [1, 100], σ = [0.001, 10000], μ = [0, 10], and the batch size in active learning h = [1, 30]. The step sizes for searching on k, σ, μ, h are set to be 2, 5, 0.1, 2 respectively.
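
The search ranges and step sizes in the Experiment Setup row, together with the poll step in lines 7 to 9 of the pseudocode, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `BOUNDS`, `STEPS`, `clip`, and `poll` are hypothetical, the objective passed in is a stand-in for J(F, χ), and the sketch makes three simplifying assumptions: it polls one coordinate at a time, accepts any strict decrease rather than a sufficient decrease, and does not round the integer-valued hyperparameters k and h.

```python
from typing import Callable, Dict, Tuple

Point = Dict[str, float]

# Ranges and initial step sizes as reported in the Experiment Setup row.
BOUNDS = {"k": (1, 100), "sigma": (0.001, 10000), "mu": (0, 10), "h": (1, 30)}
STEPS = {"k": 2, "sigma": 5, "mu": 0.1, "h": 2}


def clip(name: str, value: float) -> float:
    """Keep a hyperparameter inside its reported range."""
    lo, hi = BOUNDS[name]
    return min(max(value, lo), hi)


def poll(
    point: Point,
    steps: Dict[str, float],
    objective: Callable[[Point], float],
) -> Tuple[Point, Dict[str, float]]:
    """One pattern-search poll around `point`.

    Tries point +/- step along each coordinate. On success (some trial has a
    strictly lower objective) it moves to the best trial and keeps the steps.
    On failure it stays put and halves every step.
    """
    base_val = objective(point)
    best, best_val = point, base_val
    for name, step in steps.items():
        for sign in (1, -1):
            trial = dict(point)
            trial[name] = clip(name, point[name] + sign * step)
            val = objective(trial)
            if val < best_val:
                best, best_val = trial, val
    if best_val < base_val:
        return best, dict(steps)  # pattern search success
    return point, {n: s / 2 for n, s in steps.items()}  # pattern search failure


if __name__ == "__main__":
    # Hypothetical toy objective, used only to exercise the poll step.
    toy = lambda p: (p["k"] - 11) ** 2
    start = {"k": 5, "sigma": 1.0, "mu": 0.5, "h": 10}
    print(poll(start, STEPS, toy))
```

In a full search this poll would be repeated until the evaluation budget is exhausted, with the objective evaluations cached so that repeated points are not re-evaluated; both are omitted here for brevity.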