Contextual Active Model Selection
Authors: Xuefeng Liu, Fangfang Xia, Rick Stevens, Yuxin Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate the effectiveness and robustness of our approach on a variety of online model selection tasks spanning different application domains (from generic ML benchmarks such as CIFAR10 to domain-specific tasks in biomedical analysis), data scales (ranging from 80 to 10K), data modalities (i.e., tabular, image, and graph-based data), and label types (binary or multiclass labels). For the tasks evaluated, (1) CAMS outperforms all competing baselines by a significant margin. |
| Researcher Affiliation | Academia | Xuefeng Liu¹, Fangfang Xia², Rick L. Stevens¹˒², Yuxin Chen¹. ¹Department of Computer Science, University of Chicago; ²Argonne National Laboratory |
| Pseudocode | Yes | Figure 1: The Contextual Active Model Selection (CAMS) algorithm |
| Open Source Code | Yes | We provide the code and data in the supplementary material with a readme.txt for reproducing the results. Experiment details are listed in Section 6 and Appendix G, D.6. (from NeurIPS Paper Checklist, Section 5) |
| Open Datasets | Yes | Datasets. We evaluate our approach using five datasets: (1) CIFAR10 [41]... (2) DRIFT [73]... (3) VERTEBRAL [5]... (4) HIV [74]... (5) CovType [24]... |
| Dataset Splits | No | The paper mentions training and test sets but does not specify explicit validation set splits (e.g., percentages or counts) or a distinct validation phase with defined splits for hyperparameter tuning in the main experimental setup. It mentions 'randomly selected stream-size aligned data from testing-set' for online streaming. |
| Hardware Specification | Yes | We performed our experiments on a Linux server with 80 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz and total 528 Gigabyte memory. |
| Software Dependencies | No | The paper mentions software such as 'VGG', 'ResNet', 'DenseNet', and 'scikit-learn built-in models', but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | We set 100 realizations and 3000 stream-size for DRIFT, 20 realizations and 10000 stream-size for CIFAR10, 200 realizations and 4000 stream size for HIV, 300 realization and 80 stream-size for VERTEBRAL. In each realization, we randomly selected stream-size aligned data from testing-set and make it as online streaming data which is the input of each algorithm. |
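
The sampling procedure quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `sample_streams`, the `SETUP` dictionary, the seed handling, and the choice to sample without replacement within a realization are all assumptions. The test-set size is not stated in the excerpt, so it is left as a parameter.

```python
import random

# Realization counts and stream sizes quoted in the Experiment Setup row.
SETUP = {
    "DRIFT": {"realizations": 100, "stream_size": 3000},
    "CIFAR10": {"realizations": 20, "stream_size": 10000},
    "HIV": {"realizations": 200, "stream_size": 4000},
    "VERTEBRAL": {"realizations": 300, "stream_size": 80},
}


def sample_streams(test_set_size, realizations, stream_size, seed=0):
    """Yield one list of test-set indices per realization (hypothetical helper)."""
    if stream_size > test_set_size:
        raise ValueError("stream_size cannot exceed the test-set size")
    rng = random.Random(seed)
    for _ in range(realizations):
        # Assumption: each realization draws its stream without replacement.
        yield rng.sample(range(test_set_size), stream_size)


# Usage with a placeholder test-set size of 100 (not taken from the paper).
streams = list(sample_streams(test_set_size=100, **SETUP["VERTEBRAL"]))
```

Each element of `streams` is the index sequence that one realization would feed to every algorithm in the same order, so that all methods see the same online stream.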