AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection
Authors: Xu Chen, Brett Wujek
Pages: 3537-3544
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms. |
| Researcher Affiliation | Industry | Xu Chen and Brett Wujek, both at SAS Inc (steven.xu.chen@gmail.com, Brett.Wujek@sas.com) |
| Pseudocode | Yes | 1: procedure AUTOMATED DISTRIBUTED ACTIVE LEARNING ALGORITHM(observations $x$, initial label matrix $Y$) 2: while $\Omega - L > 0$ do ($\Omega$ is the budget for the total number of labeled data; $L$ is the number of data already labeled) 3: Distribute the unlabeled data randomly and replicate the labeled data across the worker nodes. Solve (8) using the hybrid search strategy of GA with GSS: 4: Evaluate the initial parent points $P$ asynchronously in parallel. Populate the reference cache-tree $R$ with unique points from $P$. Associate each point $p \in P$ with a step $\Delta_p$ initialized to $\Delta$. 5: while $\lvert R \rvert \le n_b$, where $n_b$ is the evaluation budget, do 6: Select $\Lambda \subseteq P$ for local search based on the optimization problem formulated in (8). 7: for $p \in P$, search $\zeta_p = \zeta_p \cup \{p + \Delta_p\} \cup \{p - \Delta_p\}$; 8: if $\min_{\chi \in \zeta_p} J(F, \chi) < J(F, p) - \Delta_p^2$, then set $p = \arg\min_{\chi \in \zeta_p} J(F, \chi)$ (pattern search success) 9: else set $\Delta_p = \Delta_p / 2$ (pattern search failure) 10: end while 11: Conduct K-means clustering on $F^*$. Given $F^*$ and $h^*$, solve (9) to determine $\hat{Y}$ with the top $h^*$ selections. 12: Add the selected samples $x_{j_1}, \ldots, x_{j_h}$ and estimated labels $\hat{y}_{j_1}, \ldots, \hat{y}_{j_h}$ to the labeled dataset, and output the optimal selections, including the label probability distribution matrix $F^*$, hyperparameter set $\chi^*$, updated label matrix $\hat{Y}^*$ and batch size $h^*$. 13: end while 14: end procedure |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | Datasets: We evaluate the classification performance of the proposed method on five benchmark datasets taken from mldata.org (http://mldata.org/repository/tags/data): banana, breast cancer, diabetes, image and thyroid. AutoDAL is also applied to a real-world ECG heartbeat categorization dataset from Kaggle (https://www.kaggle.com/shayanfazeli/heartbeat) for classification. |
| Dataset Splits | Yes | For supervised methods, we performed a leave-one-dataset-out validation. |
| Hardware Specification | No | The distributed computing environment is comprised of 139 machines where each machine is running with 32 threads. |
| Software Dependencies | No | The paper mentions software tools like Auto-WEKA and Auto-sklearn but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The hyperparameters for the automated algorithm to estimate are initially provided as ranges: k ∈ [1, 100], σ ∈ [0.001, 10000], μ ∈ [0, 10], and the active learning batch size h ∈ [1, 30]. The step sizes for searching on k, σ, μ and h are set to 2, 5, 0.1 and 2, respectively. |
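Lines 4 to 10 of the pseudocode describe a generating-set (pattern) search step with a sufficient-decrease test and step halving. The following is a minimal sequential sketch of that step over the hyperparameter box from the Experiment Setup row; it is an illustration, not the authors' implementation. Assumptions: the genetic-algorithm component and the asynchronous parallel evaluation are omitted; a single multiplier `delta` (standing in for the step Δ_p and starting at 1) scales the per-hyperparameter step sizes 2, 5, 0.1 and 2; candidates are clipped to the box; integrality of k and h is not enforced; and `toy_J` is a hypothetical stand-in for J(F, χ). The names `pattern_search`, `clip` and `toy_J` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, ...]

# Hyperparameter box and base step sizes from the Experiment Setup row.
NAMES = ["k", "sigma", "mu", "h"]
LOWER = [1.0, 0.001, 0.0, 1.0]
UPPER = [100.0, 10000.0, 10.0, 30.0]
BASE_STEP = [2.0, 5.0, 0.1, 2.0]


def clip(x: List[float]) -> Point:
    """Project a candidate point back into the hyperparameter box."""
    return tuple(min(max(v, lo), hi) for v, lo, hi in zip(x, LOWER, UPPER))


def pattern_search(J: Callable[[Point], float], start: List[float],
                   n_b: int = 400, delta_min: float = 1e-4) -> Tuple[Point, float, int]:
    """Sequential GSS-style search. Returns (best point, its value, #unique evaluations)."""
    p = clip(start)
    cache: Dict[Point, float] = {p: J(p)}  # plays the role of the cache-tree R
    delta = 1.0  # hypothetical common multiplier standing in for Delta_p
    while len(cache) < n_b and delta > delta_min:
        # Poll set zeta_p: p + delta*step_i and p - delta*step_i along each axis i.
        zeta: List[Point] = []
        for i, step in enumerate(BASE_STEP):
            for sign in (1.0, -1.0):
                q = list(p)
                q[i] += sign * delta * step
                zeta.append(clip(q))
        for q in zeta:
            if q not in cache:  # evaluate only points not seen before
                cache[q] = J(q)
        chi = min(zeta, key=lambda q: cache[q])
        if cache[chi] < cache[p] - delta ** 2:  # sufficient decrease: success
            p = chi
        else:  # failure: halve the step
            delta /= 2.0
    return p, cache[p], len(cache)


def toy_J(x: Point) -> float:
    """Hypothetical smooth stand-in for J(F, chi); the minimizer is arbitrary."""
    k, sigma, mu, h = x
    return ((k - 20) / 10) ** 2 + ((sigma - 50) / 50) ** 2 + (mu - 1) ** 2 + ((h - 10) / 5) ** 2


if __name__ == "__main__":
    best, value, evals = pattern_search(toy_J, [50.0, 100.0, 5.0, 15.0])
    print(dict(zip(NAMES, best)), value, evals)
```

The test `cache[chi] < cache[p] - delta ** 2` mirrors line 8. Because the threshold shrinks quadratically while the poll radius shrinks only linearly, small but genuine improvements are eventually accepted after a few halvings, and the cache keeps repeated poll points from being re-evaluated.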
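The Dataset Splits row states only that a leave-one-dataset-out protocol was used for the supervised methods. As a rough illustration, the sketch below enumerates such folds over the five mldata.org benchmark datasets, assuming each fold trains on four datasets and evaluates on the held-out one; how the paper actually uses or aggregates these folds is not stated. The function name `leave_one_dataset_out` is hypothetical.

```python
from typing import Iterator, List, Tuple

# The five mldata.org benchmark datasets named in the Open Datasets row.
DATASETS = ["banana", "breast cancer", "diabetes", "image", "thyroid"]


def leave_one_dataset_out(names: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training datasets, held-out dataset), one fold per dataset."""
    for i, held_out in enumerate(names):
        yield names[:i] + names[i + 1:], held_out


if __name__ == "__main__":
    for train, held_out in leave_one_dataset_out(DATASETS):
        print(f"train on {train}, evaluate on {held_out!r}")
```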
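The outer loop of the pseudocode (lines 2, 12 and 13) spends a labeling budget Ω in batches of h samples, attaching estimated labels ŷ to each batch. The sketch below shows only that budget bookkeeping, under loudly stated assumptions: the label-probability matrix `F` is held fixed across rounds (in the algorithm it is re-estimated each round by solving (8)); the K-means step of line 11 is omitted; the selection score is a placeholder (highest maximum class probability), not the paper's Eq. (9); and each estimated label is the row-wise argmax of `F`. The function name `budgeted_rounds` is hypothetical.

```python
from typing import Dict, List, Sequence


def budgeted_rounds(F: List[Sequence[float]], labeled: Dict[int, int],
                    omega: int, h: int) -> List[List[int]]:
    """Add batches of at most h samples until omega samples are labeled.

    F[j] is row j of a label-probability matrix, held fixed here (an assumption;
    the algorithm re-estimates it each round). `labeled` maps sample index to
    label and is updated in place. Returns the indices added in each round.
    """
    rounds: List[List[int]] = []
    while omega - len(labeled) > 0:  # line 2: while Omega - L > 0
        unlabeled = [j for j in range(len(F)) if j not in labeled]
        if not unlabeled:
            break
        batch = min(h, omega - len(labeled))  # never overshoot the budget
        # Placeholder score (most confident rows first); NOT the paper's Eq. (9).
        chosen = sorted(unlabeled, key=lambda j: max(F[j]), reverse=True)[:batch]
        for j in chosen:
            row = F[j]
            labeled[j] = max(range(len(row)), key=lambda c: row[c])  # y_hat = argmax
        rounds.append(chosen)
    return rounds


if __name__ == "__main__":
    F = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45], [0.2, 0.8], [0.5, 0.5]]
    labeled = {0: 0}
    print(budgeted_rounds(F, labeled, omega=4, h=2), labeled)
```

With the small `F` in the `__main__` block, one sample is already labeled and the budget of 4 is reached in two rounds: samples 3 and 1 first, then sample 2, since the final batch is capped at Ω − L = 1. Swapping the placeholder score for a different criterion changes only the `sorted(...)` line.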