Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical Guarantees
Authors: Guang-Yuan Hao, Hengguan Huang, Haotian Wang, Jie Gao, Hao Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that our approach significantly outperforms the state-of-the-art AL methods on both synthetic and real-world multi-domain datasets. |
| Researcher Affiliation | Collaboration | 1Hong Kong University of Science and Technology 2Rutgers University 3National University of Singapore 4Mohamed bin Zayed University of Artificial Intelligence 5JD Logistics |
| Pseudocode | No | The paper does not contain any explicit pseudocode blocks or sections labeled 'Algorithm'. |
| Open Source Code | Yes | Code is available at https://github.com/Wang-MLLab/multi-domain-active-learning. |
| Open Datasets | Yes | We use three real-world datasets: Office-Home (65 classes) (Venkateswara et al. 2017), Image CLEF (12 classes) (National Bureau of Statistics 2014), and Office-Caltech (10 classes) (Fernando et al. 2014). |
| Dataset Splits | No | The paper mentions splitting data into 'training and test sets' for the real-world datasets, and provides specific training and test set sizes for Rotating MNIST. However, it does not explicitly mention or provide details for a separate 'validation' dataset split for any experiment. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU specifications, or cloud computing instance types). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We run a model in R = 5 rounds plus an initial round, with three different random seeds, and report the average results over three seeds (see Sec. 1 of the Supplement for more details). In each round, we allocate a labeling budget of 200, 20, and 20, respectively for the three datasets. (A minimal sketch of this protocol follows the table.) |
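
The sketch below illustrates the evaluation protocol described in the Experiment Setup row: an initial round followed by R = 5 query rounds, repeated over three random seeds, with a fixed per-round labeling budget for each dataset and results averaged over seeds. This is only an assumed skeleton; the helper functions (`train_and_evaluate`, `select_queries`), the dummy model, and the random acquisition stand in for the authors' actual model and composite AL acquisition criterion, and are not taken from their repository.

```python
# Hypothetical sketch of the multi-round AL evaluation protocol.
# Model training and query selection are placeholders, not the paper's method.
import random
from statistics import mean

ROUNDS = 5                      # query rounds after the initial round
SEEDS = [0, 1, 2]               # three random seeds; results averaged over them
BUDGETS = {                     # per-round labeling budget for each dataset
    "Office-Home": 200,
    "ImageCLEF": 20,
    "Office-Caltech": 20,
}

def train_and_evaluate(labeled_indices, seed):
    # Placeholder: in the real experiments this trains the model on the labeled
    # pool and returns test accuracy; here we return a dummy number.
    random.seed(seed + len(labeled_indices))
    return random.random()

def select_queries(unlabeled_indices, budget):
    # Placeholder acquisition function: random sampling stands in for the
    # paper's composite active-learning criterion.
    return random.sample(unlabeled_indices, min(budget, len(unlabeled_indices)))

def run_one_seed(pool_size, budget, seed):
    """Initial round plus ROUNDS query rounds for a single random seed."""
    random.seed(seed)
    unlabeled = list(range(pool_size))
    labeled = select_queries(unlabeled, budget)        # initial labeled set
    unlabeled = [i for i in unlabeled if i not in set(labeled)]
    accs = [train_and_evaluate(labeled, seed)]         # initial round
    for _ in range(ROUNDS):
        queries = select_queries(unlabeled, budget)    # spend this round's budget
        labeled += queries
        unlabeled = [i for i in unlabeled if i not in set(queries)]
        accs.append(train_and_evaluate(labeled, seed))
    return accs

def averaged_over_seeds(dataset, pool_size=10_000):
    """Mean per-round accuracy over the three seeds, as reported in the paper."""
    runs = [run_one_seed(pool_size, BUDGETS[dataset], s) for s in SEEDS]
    return [mean(r) for r in zip(*runs)]

if __name__ == "__main__":
    print(averaged_over_seeds("ImageCLEF"))
```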