AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection

Authors: Xu Chen, Brett Wujek
Pages: 3537-3544

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed AutoDAL algorithm is applied to multiple benchmark datasets and to a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance than several state-of-the-art AutoML approaches and active learning algorithms.
Researcher Affiliation | Industry | Xu Chen and Brett Wujek, SAS Inc. (steven.xu.chen@gmail.com, Brett.Wujek@sas.com)
Pseudocode | Yes |
1: procedure AUTOMATED DISTRIBUTED ACTIVE LEARNING ALGORITHM(observations x, initial label matrix Y)
2:   while Ω − L > 0 do (Ω is the budget for the total number of labeled samples; L is the number of samples already labeled)
3:     Distribute the unlabeled data randomly and replicate the labeled data on the worker nodes. Solve (8) using the hybrid search strategy of a genetic algorithm (GA) with generating set search (GSS):
4:       Evaluate the initial parent points P asynchronously in parallel. Populate the reference cache tree R with the unique points from P. Associate each point p ∈ P with a step Δp initialized to Δ.
5:       while |R| ≤ nb do (nb is the evaluation budget)
6:         Select Λ ⊆ P for local search based on the optimization problem formulated in (8).
7:         for p ∈ P, search ζp = ζp ∪ {p + Δp} ∪ {p − Δp};
8:           if min_{χ ∈ ζp} J(F, χ) < J(F, p) − Δp², then set p = χ (pattern search success)
9:           else Δp = Δp / 2 (pattern search failure)
10:      end while
11:    Conduct K-means clustering on F*. Given F* and h*, solve (9) to determine Ŷ with the top h* selections.
12:    Add the selected samples x_{j1}, …, x_{jh} and estimated labels ŷ_{j1}, …, ŷ_{jh} to the labeled dataset, and output the optimal selections: the label probability distribution matrix F*, hyperparameter set χ*, updated label matrix Ŷ, and batch size h*.
13:  end while
14: end procedure
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology.
Open Datasets | Yes | Datasets: We evaluate the classification performance of the proposed method on five benchmark datasets taken from mldata.org (http://mldata.org/repository/tags/data): banana, breast cancer, diabetes, image, and thyroid. AutoDAL is also applied to a real-world ECG heartbeat categorization dataset from Kaggle (https://www.kaggle.com/shayanfazeli/heartbeat) for classification.
Dataset Splits | Yes | For supervised methods, we performed leave-one-dataset-out validation.
Hardware Specification | No | The distributed computing environment consists of 139 machines, each running 32 threads.
Software Dependencies | No | The paper mentions software tools such as Auto-WEKA and Auto-sklearn but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | The hyperparameters the automated algorithm estimates are initially given as ranges: k ∈ [1, 100], σ ∈ [0.001, 10000], μ ∈ [0, 10], and the active learning batch size h ∈ [1, 30]. The step sizes for searching k, σ, μ, and h are set to 2, 5, 0.1, and 2, respectively.
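Steps 5 to 9 of the algorithm in the Pseudocode row describe a pattern-search loop with a sufficient-decrease test. The Python below is a minimal, single-process sketch of that loop under several assumptions. The trial set is taken to be the compass set p ± Δ·s_i·e_i along each coordinate, whereas the pseudocode writes only p ± Δp. `toy_objective` is a stand-in for J(F, χ). The genetic-algorithm phase, the cache tree R, and asynchronous parallel evaluation are all omitted. The names `compass_search`, `scales`, and `eval_budget` are hypothetical and do not come from the paper.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def compass_search(
    objective: Callable[[Point], float],
    start: Sequence[float],
    scales: Sequence[float],
    eval_budget: int,
    delta: float = 1.0,
) -> Tuple[Point, float]:
    """Sketch of a pattern-search loop with a sufficient-decrease test.

    Trial set: {p + delta*s_i*e_i} U {p - delta*s_i*e_i} for each coordinate i
    (assumed compass directions). A move is accepted only if it improves f(p)
    by more than delta**2; otherwise delta is halved.
    """
    p: Point = tuple(float(x) for x in start)
    f_p = objective(p)
    evals = 1
    while evals < eval_budget:  # may overshoot the budget by one batch of trials
        trials = []
        for i, s in enumerate(scales):
            for sign in (1.0, -1.0):
                q = list(p)
                q[i] += sign * delta * s
                trials.append(tuple(q))
        values = [objective(q) for q in trials]
        evals += len(trials)
        best = min(range(len(trials)), key=values.__getitem__)
        if values[best] < f_p - delta ** 2:
            p, f_p = trials[best], values[best]  # pattern search success
        else:
            delta /= 2.0  # pattern search failure
    return p, f_p


# Hypothetical stand-in for J(F, chi): a quadratic with its minimum at (3, -1).
def toy_objective(x: Point) -> float:
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


best_point, best_value = compass_search(toy_objective, [0.0, 0.0], [1.0, 1.0], eval_budget=200)
print(best_point, best_value)
```

A move is accepted only when the gain exceeds Δ², so small, noisy improvements do not count as success at a large step. Each failure halves Δ, and this is what lets the search settle onto a local minimum as the evaluation budget is spent.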
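The Dataset Splits row names leave-one-dataset-out validation but gives no further detail. Read literally, each fold holds out one entire dataset and keeps the rest. The row does not say what the remaining datasets are used for, such as fitting or meta-learning. The sketch below generates those folds only. It assumes the units being held out are the five benchmark datasets listed in the Open Datasets row, and that their names are unique. The function name `leave_one_dataset_out` is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_dataset_out(names: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield one (remaining datasets, held-out dataset) fold per dataset."""
    for held_out in names:
        rest = [n for n in names if n != held_out]
        yield rest, held_out


# The five mldata.org benchmark datasets named in the Open Datasets row.
BENCHMARKS = ["banana", "breast_cancer", "diabetes", "image", "thyroid"]

for rest, held_out in leave_one_dataset_out(BENCHMARKS):
    print(f"held out: {held_out:<14} remaining: {', '.join(rest)}")
```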
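Step 3 of the algorithm in the Pseudocode row distributes the unlabeled data randomly across worker nodes and replicates the labeled data on every node. The sketch below shows that partitioning in memory only. It assumes 139 workers, to match the Hardware Specification row, and splits a shuffled index list round-robin to keep shard sizes balanced. Both choices are assumptions, and the code does not model the actual distributed environment. The name `distribute` is hypothetical.

```python
import random
from typing import Dict, List, Sequence


def distribute(
    unlabeled_ids: Sequence[int],
    labeled_ids: Sequence[int],
    n_workers: int,
    seed: int = 0,
) -> List[Dict[str, List[int]]]:
    """Randomly shard unlabeled samples; replicate labeled samples on every worker."""
    rng = random.Random(seed)
    shuffled = list(unlabeled_ids)
    rng.shuffle(shuffled)
    return [
        {
            # Round-robin slice of the shuffled ids: each id lands on exactly one worker.
            "unlabeled": shuffled[w::n_workers],
            # Full copy of the labeled set on each worker.
            "labeled": list(labeled_ids),
        }
        for w in range(n_workers)
    ]


shards = distribute(range(1000), range(1000, 1010), n_workers=139)
sizes = [len(s["unlabeled"]) for s in shards]
print(min(sizes), max(sizes))  # 7 8
```

Each worker can then score its own unlabeled shard against the full labeled set. Because every unlabeled sample belongs to exactly one shard, no candidate is scored twice.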
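The Experiment Setup row gives a range and a search step for each of k, σ, μ, and h. One way to encode them is as a bounded search space whose moves add or subtract the step and clip the result to the range. This is a sketch with several assumptions: out-of-range moves are clipped to the bounds, and k and h are treated as integers. The row does not say whether σ was searched on a log scale, so the step is applied additively, as stated. `Param`, `SEARCH_SPACE`, and `neighbors` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Param:
    """Range [low, high] and additive search step for one hyperparameter."""

    low: float
    high: float
    step: float
    integer: bool = False  # assumption: k and h take integer values

    def clip(self, value: float) -> float:
        # Assumption: out-of-range moves are clamped to the nearest bound.
        v = min(max(value, self.low), self.high)
        return float(round(v)) if self.integer else v


# Ranges and step sizes as listed in the Experiment Setup row.
SEARCH_SPACE: Dict[str, Param] = {
    "k": Param(1, 100, 2, integer=True),
    "sigma": Param(0.001, 10000, 5),
    "mu": Param(0, 10, 0.1),
    "h": Param(1, 30, 2, integer=True),
}


def neighbors(point: Dict[str, float], name: str) -> List[Dict[str, float]]:
    """Return the +step and -step moves of one hyperparameter, clipped to its range."""
    p = SEARCH_SPACE[name]
    moves = []
    for sign in (1, -1):
        q = dict(point)
        q[name] = p.clip(point[name] + sign * p.step)
        moves.append(q)
    return moves


start = {"k": 99, "sigma": 1.0, "mu": 0.05, "h": 1}
print([m["k"] for m in neighbors(start, "k")])  # [100.0, 97.0]
print([m["h"] for m in neighbors(start, "h")])  # [3.0, 1.0]
```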