reproducibilityindex.ai

Contextual Active Model Selection

Authors: Xuefeng Liu, Fangfang Xia, Rick Stevens, Yuxin Chen

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we demonstrate the effectiveness and robustness of our approach on a variety of online model selection tasks spanning different application domains (from generic ML benchmarks such as CIFAR10 to domain-specific tasks in biomedical analysis), data scales (ranging from 80 to 10K), data modalities (i.e., tabular, image, and graph-based data), and label types (binary or multiclass labels). For the tasks evaluated, (1) CAMS outperforms all competing baselines by a significant margin.
Researcher Affiliation	Academia	Xuefeng Liu (1), Fangfang Xia (2), Rick L. Stevens (1, 2), Yuxin Chen (1); (1) Department of Computer Science, University of Chicago; (2) Argonne National Laboratory
Pseudocode	Yes	Figure 1: The Contextual Active Model Selection (CAMS) algorithm
Open Source Code	Yes	We provide the code and data in the supplementary material with a readme.txt for reproducing the results. Experiment details are listed in Section 6 and Appendix G, D.6. (from NeurIPS Paper Checklist, Section 5)
Open Datasets	Yes	Datasets. We evaluate our approach using five datasets: (1) CIFAR10 [41]... (2) DRIFT [73]... (3) VERTEBRAL [5]... (4) HIV [74]... (5) CovType [24]...
Dataset Splits	No	The paper mentions training and test sets but does not specify explicit validation set splits (e.g., percentages or counts) or a distinct validation phase with defined splits for hyperparameter tuning in the main experimental setup. It mentions 'randomly selected stream-size aligned data from testing-set' for online streaming.
Hardware Specification	Yes	We performed our experiments on a Linux server with 80 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz and total 528 Gigabyte memory.
Software Dependencies	No	The paper mentions software like 'VGG', 'ResNet', 'DenseNet', 'scikit-learn built-in models', but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	We set 100 realizations and 3000 stream-size for DRIFT, 20 realizations and 10000 stream-size for CIFAR10, 200 realizations and 4000 stream-size for HIV, 300 realizations and 80 stream-size for VERTEBRAL. In each realization, we randomly selected stream-size aligned data from the testing set and used it as the online streaming data that is the input of each algorithm.
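
The stream construction quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_streams`, the `SETUP` dictionary, the seed handling, and the choice to sample without replacement are all assumptions, and the test-set size used in the usage line is a placeholder rather than a figure from the paper. Only the realization counts and stream sizes come from the quoted text.

```python
import random

# Realization counts and stream sizes as quoted in the Experiment Setup row.
SETUP = {
    "DRIFT": {"realizations": 100, "stream_size": 3000},
    "CIFAR10": {"realizations": 20, "stream_size": 10000},
    "HIV": {"realizations": 200, "stream_size": 4000},
    "VERTEBRAL": {"realizations": 300, "stream_size": 80},
}


def sample_streams(n_test, realizations, stream_size, seed=0):
    """Return one list of test-set indices per realization.

    Assumption: indices are drawn without replacement within a
    realization; the quoted text only says "randomly selected".
    """
    if stream_size > n_test:
        raise ValueError("stream_size cannot exceed the test-set size")
    rng = random.Random(seed)  # hypothetical seeding, for repeatability
    return [rng.sample(range(n_test), stream_size) for _ in range(realizations)]


# Usage with a placeholder test-set size of 5000 (not taken from the paper).
cfg = SETUP["DRIFT"]
streams = sample_streams(5000, cfg["realizations"], cfg["stream_size"])
```

Each inner list would then be fed, in order, to every model selection algorithm under comparison, so that all methods see the same stream within a realization.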