Bandit Limited Discrepancy Search and Application to Machine Learning Pipeline Optimization

Authors: Akihiro Kishimoto, Djallel Bouneffouf, Radu Marinescu, Parikshit Ram, Ambrish Rawat, Martin Wistuba, Paulito Palmes, Adi Botea

AAAI 2022, pages 10228-10237

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on classification benchmarks show that BLDS is superior to competing algorithms. We also combine BLDS with hyperparameter optimization, empirically showing the advantage of BLDS.
Researcher Affiliation | Collaboration | 1 IBM Research, 2 Amazon Research, 3 Eaton; Akihiro.Kishimoto@ibm.com, Djallel.Bouneffouf@ibm.com, radu.marinescu@ie.ibm.com, parikshit.ram@ibm.com, Ambrish.Rawat@ie.ibm.com, marwistu@amazon.com, paulpalmes@ie.ibm.com, adi.botea@eaton.com
Pseudocode | Yes | Algorithm 1: LDS: Limited Discrepancy Search; Algorithm 2: BLDS: Bandit Limited Discrepancy Search. (A Python sketch of the LDS backbone follows the table.)
Open Source Code | No | The paper states 'We implemented all algorithms in Python with scikit-learn' but does not provide any link or explicit statement about releasing their own implementation code.
Open Datasets | Yes | We attempt to select the largest possible datasets, ending up with 19 datasets from the OpenML repositories whose data size ranges between 10,885 and 245,057 examples. The names and ids of the datasets are ELECTRICITY (151), ADULT (179), MOZILLA4 (1046), JM1 (1053), MAGIC TELESCOPE (1120), CLICK PREDICTION SMALL (1220), BANK MARKETING (1461), NOMAO (1486), SKIN SEGMENTATION (1502), AMAZON EMPLOYEE ACCESS (4135), HIGGS (23512), NUMERAI28.6 (23517), RUN OR WALK INFORMATION (40922), APSFAILURE (41138), MINIBOONE (41150), GUILLERMO (41159), RICCARDO (41161), DEFAULT OF CREDIT CARD CLIENTS (42477), and RELEVANT IMAGES DATASET (42680). (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | Generalization performance is estimated by splitting $D$ into disjoint training and validation sets $T$ and $V$, learning a function $f$ by applying $p$ to $T$ and evaluating the predictive performance of function $f$ on $V$. Therefore, the algorithm selection problem is written as $p^* = \arg\min_{p \in \mathcal{P}} \frac{1}{k} \sum_{i=1}^{k} L(p, T^{(i)}, V^{(i)})$, where $L(p, T^{(i)}, V^{(i)})$ is the value of the black-box loss function (e.g., classification error) achieved by $p$ when trained on $T^{(i)}$ and evaluated on $V^{(i)}$. We use $k$-fold cross-validation (Kohavi 1995), which splits the training data into $k$ equal-sized partitions $V^{(1)}, \ldots, V^{(k)}$, and sets $T^{(i)} = D \setminus V^{(i)}$ for $i = 1, \ldots, k$. (A cross-validation sketch follows the table.)
Hardware Specification | Yes | We implemented all algorithms in Python with scikit-learn (Pedregosa, Varoquaux, and Gramfort 2011) and performed the experiments on a cluster of Intel Xeon CPU E5-2667 processors at 3.3GHz.
Software Dependencies | Yes | We implemented all algorithms in Python with scikit-learn (Pedregosa, Varoquaux, and Gramfort 2011)
Experiment Setup | Yes | We set up experiments similar to those in (Liu et al. 2020) and use standard evaluation metrics in the community (Feurer et al. 2015b). For each benchmark dataset, we consider a 70-30% train-validation split, and run each algorithm with a time limit of two hours per trial to perform a binary classification. We consider (1.0 − AUROC) (area under the ROC curve) as the black-box objective function that needs to be minimized. With initial experiments using different benchmarks, we set b = 100 and η = 2 for all the algorithms, and c L/δ = 1/9600 for BLDS, where L = 3072. Due to the UCB and LCB formulas, note that we do not need to optimize c and δ separately. The exploration parameter for CMAB is set to 0.3. (An evaluation sketch follows the table.)
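
The Pseudocode row lists Algorithm 1 (LDS) and Algorithm 2 (BLDS). Below is a minimal Python sketch of the LDS backbone only, probing a heuristically ordered search tree with an increasing discrepancy budget. The tree interface (`children`, `is_leaf`, `evaluate`) and the discrepancy accounting are illustrative assumptions, not the paper's implementation; BLDS additionally applies bandit-style upper/lower confidence bounds to decide which branches to expand, which is not shown here.

```python
def lds_paths(node, children, is_leaf, k):
    """Yield leaves reachable from `node` using at most k discrepancies."""
    if is_leaf(node):
        yield node
        return
    kids = children(node)            # heuristic order: kids[0] is the preferred child
    for idx, child in enumerate(kids):
        cost = 0 if idx == 0 else 1  # a non-preferred child consumes one discrepancy
        if k - cost >= 0:
            yield from lds_paths(child, children, is_leaf, k - cost)


def lds(root, children, is_leaf, evaluate, max_k):
    """Limited Discrepancy Search: probe with increasing discrepancy budgets.

    This simple variant revisits some leaves across iterations; the paper's
    Algorithm 1 may differ in such bookkeeping details.
    """
    best_leaf, best_score = None, float("inf")
    for k in range(max_k + 1):
        for leaf in lds_paths(root, children, is_leaf, k):
            score = evaluate(leaf)
            if score < best_score:
                best_leaf, best_score = leaf, score
    return best_leaf, best_score
```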
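The Open Datasets row lists OpenML dataset ids. A minimal sketch of loading one of them with scikit-learn's `fetch_openml` (the paper's stated toolkit); the choice of ADULT (id 179) is only an example, and any id from the list can be substituted.

```python
from sklearn.datasets import fetch_openml

# Download ADULT (OpenML id 179) and cache it locally.
adult = fetch_openml(data_id=179, as_frame=True)
X, y = adult.data, adult.target
print(X.shape, y.value_counts())
```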
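The Dataset Splits row describes scoring a candidate pipeline by its average black-box loss over k cross-validation folds. A hedged sketch of that estimate, where the pipeline object and the `loss` callable are illustrative stand-ins rather than the paper's interface:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold


def cv_loss(pipeline, X, y, loss, k=5, seed=0):
    """Average black-box loss of `pipeline` over k train/validation folds.

    X and y are assumed to be numpy arrays; loss(model, X_val, y_val) is an
    illustrative callable returning the validation loss (e.g., classification error).
    """
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    fold_losses = []
    for train_idx, val_idx in kf.split(X):
        model = clone(pipeline).fit(X[train_idx], y[train_idx])  # train on T^(i)
        fold_losses.append(loss(model, X[val_idx], y[val_idx]))  # evaluate on V^(i)
    return float(np.mean(fold_losses))
```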
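The Experiment Setup row uses a 70-30% train-validation split and (1.0 − AUROC) as the objective to minimize. A minimal sketch of that evaluation protocol, with `RandomForestClassifier` standing in for an arbitrary candidate pipeline; the paper's actual search space and time limits are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def one_minus_auroc(pipeline, X, y, seed=0):
    """Fit on a 70% train split and return 1 - AUROC on the 30% validation split."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    pipeline.fit(X_tr, y_tr)
    val_scores = pipeline.predict_proba(X_val)[:, 1]  # probability of the positive class
    return 1.0 - roc_auc_score(y_val, val_scores)     # smaller is better


# Example with a hypothetical candidate: one_minus_auroc(RandomForestClassifier(), X, y)
```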