Large Language Models to Enhance Bayesian Optimization

Authors: Tennison Liu, Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We empirically validate LLAMBO's efficacy on the problem of hyperparameter tuning, highlighting strong empirical performance across a range of diverse benchmarks, proprietary, and synthetic tasks. |
| Researcher Affiliation | Academia | Tennison Liu, Nicolás Astorga, Nabeel Seedat & Mihaela van der Schaar, DAMTP, University of Cambridge, Cambridge, UK. {tl522,nja46}@cam.ac.uk |
| Pseudocode | No | The paper describes methods in text and uses figures to illustrate concepts, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code to reproduce our results at https://github.com/tennisonliu/LLAMBO and the wider lab repository https://github.com/vanderschaarlab/LLAMBO. |
| Open Datasets | Yes | We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... We utilize Bayesmark [31] as a continuous HPT benchmark. ... We included HPOBench, specifically the tabular benchmarks for computationally efficient evaluations [32]. ... Additionally, we introduce 3 proprietary (SEER [72], MAGGIC [73], and CUTRACT [74]) and 3 synthetic datasets. |
| Dataset Splits | No | The paper references standard benchmarks (Bayesmark, HPOBench) that usually come with predefined splits, but it does not explicitly state the training/validation/test split percentages or methodology for its own experiments beyond mentioning testing predictions against 'unseen points' or using '5 initialization points'. |
| Hardware Specification | Yes | For context, all runtime measurements were conducted on an Intel i7-1260P (a consumer-grade laptop). |
| Software Dependencies | Yes | We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... For our experiments, we used gpt-3.5-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. ... SKOpt (GP-based) [68]: ... Version 0.9.0. GP (Deep Kernel Learning) [48]: ... (BoTorch version 0.8.5). SMAC3 [8]: ... Version 1.4.0. ... HEBO [77]: ... Version 0.3.5. Optuna [41]. ... Version 3.3.0. (A version-check sketch follows the table.) |
| Experiment Setup | Yes | Experimental setup. We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... Each search begins with 5 initialization points and proceeds for 25 trials, and we report average results over ten seeded searches. ... For our instantiation of LLAMBO, we sample M = 20 candidate points, and set the exploration hyperparameter to α = 0.1. ... For our experiments, we used gpt-3.5-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. (See the configuration sketch after the table.) |
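
As a quick cross-check against the Software Dependencies row, the sketch below compares locally installed versions of the baseline libraries against those reported in the paper. The PyPI distribution names (scikit-optimize, botorch, smac, HEBO, optuna) are assumptions inferred from the library names, not taken from the paper.

```python
# Minimal sketch (not from the paper): compare installed baseline-library
# versions against the versions reported under Software Dependencies.
from importlib.metadata import version, PackageNotFoundError

REPORTED_VERSIONS = {
    "scikit-optimize": "0.9.0",   # SKOpt (GP-based)
    "botorch": "0.8.5",           # GP with Deep Kernel Learning
    "smac": "1.4.0",              # SMAC3
    "HEBO": "0.3.5",              # HEBO
    "optuna": "3.3.0",            # Optuna
}

for dist, reported in REPORTED_VERSIONS.items():
    try:
        installed = version(dist)
    except PackageNotFoundError:
        print(f"{dist}: not installed (paper used {reported})")
        continue
    status = "matches paper" if installed == reported else f"differs from paper's {reported}"
    print(f"{dist}: {installed} ({status})")
```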
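
The Experiment Setup row can also be read as a single configuration. The dictionary below is a minimal sketch that collects the reported settings; the key names are illustrative choices, not identifiers from the LLAMBO repository.

```python
# Minimal configuration sketch (not from the released code): the search
# settings reported in the Experiment Setup row, collected in one place.
SEARCH_CONFIG = {
    "llm_model": "gpt-3.5-turbo-0301",  # OpenAI model and version
    "temperature": 0.7,                 # default sampling hyperparameters
    "top_p": 0.95,
    "n_init_points": 5,                 # initialization points per search
    "n_trials": 25,                     # optimization trials per search
    "n_seeds": 10,                      # seeded searches averaged for reported results
    "n_candidates": 20,                 # M = 20 candidate points per proposal
    "alpha": 0.1,                       # exploration hyperparameter
}

# The reported protocol repeats each search once per seed and averages the
# resulting traces, i.e. one run per value in range(SEARCH_CONFIG["n_seeds"]).
for name, value in SEARCH_CONFIG.items():
    print(f"{name}: {value}")
```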