Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection
Authors: Xu Chen, Brett Wujek (pp. 3537-3544)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms. |
| Researcher Affiliation | Industry | Xu Chen¹, Brett Wujek¹ (¹SAS Inc; email addresses redacted) |
| Pseudocode | Yes | 1: procedure AUTOMATED DISTRIBUTED ACTIVE LEARNING ALGORITHM(Observations x, initial label matrix Y) 2: while Ω − L > 0 do (Ω is the budget for the total number of labeled data; L is the number of data already labeled) 3: Distribute the unlabeled data randomly and replicate the labeled data in different worker nodes. Solve (8) using the hybrid search strategy of GA with GSS: 4: Evaluate initial parent points P asynchronously in parallel. Populate reference cache-tree, R, with unique points from P. Associate each point p ∈ P with step Δp initialized to Δ. 5: while \|R\| ≤ nb do (nb is the evaluation budget) 6: Select Λ ⊆ P for local search based on the optimization problem formulated in (8). 7: for p ∈ P, search ζp = ζp ∪ {p + Δp} ∪ {p − Δp}; 8: if min_{χ ∈ ζp} J(F, χ) < J(F, p) − Δp², then set p = χ (pattern search success) 9: else Δp = Δp/2 (pattern search failure) 10: end while 11: Conduct K-means clustering on F*. Given F* and h*, solve (9) to determine Ŷ with the top h* selections. 12: Add the selected samples x_j1, ..., x_jh* and estimated labels ŷ_j1, ..., ŷ_jh* to the labeled dataset, and output the optimal selections, including the label probability distribution matrix F*, hyperparameter set χ*, updated label matrix Ŷ, and batch size h*. 13: end while 14: end procedure |
| Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the described methodology. |
| Open Datasets | Yes | Datasets: We evaluate the classification performance of the proposed method over five benchmark datasets taken from mldata.org (http://mldata.org/repository/tags/data), including banana, breast cancer, diabetes, image and thyroid. AutoDAL is also applied to a real-world ECG heartbeat categorization dataset from Kaggle for classification (https://www.kaggle.com/shayanfazeli/heartbeat). |
| Dataset Splits | Yes | For supervised methods, we performed a leave-one-dataset-out validation. |
| Hardware Specification | No | The distributed computing environment is comprised of 139 machines where each machine is running with 32 threads. |
| Software Dependencies | No | The paper mentions software tools like Auto-WEKA and Auto-sklearn but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | The hyperparameters for the automated algorithm to estimate are initially provided as ranges where k = [1, 100], σ = [0.001, 10000], μ = [0, 10], and the batch size in active learning h = [1, 30]. The step sizes for searching on k, σ, μ, h are set to be 2, 5, 0.1, 2 respectively. |
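The pattern-search part of the pseudocode (steps 4-10), combined with the search ranges and step sizes from the Experiment Setup row, can be sketched in Python. This is a minimal single-point sketch under stated assumptions, not the authors' implementation. The GA population, the asynchronous parallel evaluation and the selection of Λ are omitted. The objective `J` is a user-supplied stand-in for J(F, χ). A scalar multiplier `delta` on the reported step sizes plays the role of Δp, and trial points are clipped to the reported ranges. Treating k and h as integers is an assumption, and the names `pattern_search`, `project` and `forcing`, along with the default budget `n_b`, are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Ranges and base step sizes as reported in the Experiment Setup row.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "k": (1, 100), "sigma": (0.001, 10000), "mu": (0, 10), "h": (1, 30),
}
STEPS: Dict[str, float] = {"k": 2, "sigma": 5, "mu": 0.1, "h": 2}
INTEGER = {"k", "h"}  # assumption: k and h are integer-valued

Point = Dict[str, float]


def project(name: str, value: float) -> float:
    """Clip one coordinate to its range; round integer hyperparameters."""
    lo, hi = BOUNDS[name]
    value = min(max(value, lo), hi)
    return float(round(value)) if name in INTEGER else value


def pattern_search(
    J: Callable[[Point], float],
    start: Point,
    n_b: int = 500,         # evaluation budget (hypothetical default)
    forcing: float = 1.0,   # accept only if J drops by more than forcing * delta**2
    min_delta: float = 1e-4,
) -> Tuple[Point, float]:
    """Poll p +/- delta * step_i along each axis; move on success, halve delta on failure."""
    cache: Dict[Tuple[float, ...], float] = {}  # simple stand-in for the cache-tree R

    def evaluate(q: Point) -> float:
        key = tuple(q[name] for name in BOUNDS)
        if key not in cache:  # each unique point is evaluated once
            cache[key] = J(q)
        return cache[key]

    p = {name: project(name, start[name]) for name in BOUNDS}
    best = evaluate(p)
    delta = 1.0  # relative step multiplier, analogous to Δp
    while len(cache) < n_b and delta > min_delta:
        trials = []
        for name in BOUNDS:
            for sign in (1.0, -1.0):
                q = dict(p)
                q[name] = project(name, p[name] + sign * delta * STEPS[name])
                trials.append(q)
        values = [evaluate(q) for q in trials]
        i = min(range(len(trials)), key=values.__getitem__)
        if values[i] < best - forcing * delta ** 2:
            p, best = trials[i], values[i]  # pattern search success
        else:
            delta /= 2.0                    # pattern search failure
    return p, best
```

For example, with a toy objective such as `(k - 40)**2 + (mu - 3)**2 + (h - 10)**2`, the search walks k and h to their minimizers in steps of 2. It then refines mu with progressively smaller steps until the budget runs out. The budget is checked once per poll, so the final poll may exceed `n_b` by up to eight evaluations.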