Large Language Models to Enhance Bayesian Optimization
Authors: Tennison Liu, Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate LLAMBO's efficacy on the problem of hyperparameter tuning, highlighting strong empirical performance across a range of diverse benchmarks, proprietary, and synthetic tasks. |
| Researcher Affiliation | Academia | Tennison Liu, Nicolás Astorga, Nabeel Seedat & Mihaela van der Schaar, DAMTP, University of Cambridge, Cambridge, UK {tl522,nja46}@cam.ac.uk |
| Pseudocode | No | The paper describes methods in text and uses figures to illustrate concepts, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code to reproduce our results at https://github.com/tennisonliu/LLAMBO and the wider lab repository https://github.com/vanderschaarlab/LLAMBO. |
| Open Datasets | Yes | We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... We utilize Bayesmark [31] as a continuous HPT benchmark. ... We included HPOBench, specifically the tabular benchmarks for computationally efficient evaluations [32]. ... Additionally, we introduce 3 proprietary (SEER [72], MAGGIC [73], and CUTRACT [74]) and 3 synthetic datasets. |
| Dataset Splits | No | The paper references standard benchmarks (Bayesmark, HPOBench) that typically come with predefined splits, but does not explicitly state the training/validation/test split percentages or methodology for its own experiments beyond mentioning testing predictions against 'unseen points' or using '5 initialization points'. |
| Hardware Specification | Yes | For context, all runtime measurements were conducted on an Intel i7-1260P (a consumer-grade laptop). |
| Software Dependencies | Yes | We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... For our experiments, we used gpt-3.5-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. ... SKOpt (GP-based) [68]: ... Version 0.9.0. GP (Deep Kernel Learning) [48]: ... (BoTorch version 0.8.5). SMAC3 [8]: ... Version 1.4.0. ... HEBO [77]: ... Version 0.3.5. Optuna [41]. ... Version 3.3.0. (A version-check sketch follows the table.) |
| Experiment Setup | Yes | Experimental setup. We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... Each search begins with 5 initialization points and proceeds for 25 trials, and we report average results over ten seeded searches. ... For our instantiation of LLAMBO, we sample M = 20 candidate points, and set the exploration hyperparameter to α = 0.1. ... For our experiments, we used gpt-3.5-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. (Hedged reproduction sketches follow the table.) |
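
The setup above fixes the LLM and its decoding hyperparameters: gpt-3.5-turbo, version 0301, with temperature = 0.7 and top_p = 0.95. Below is a minimal sketch of a matching query, assuming the current OpenAI Python client; the prompt content and helper name are illustrative and not taken from the paper, and note that the 0301 snapshot has since been deprecated by OpenAI. Only the model version, temperature, and top_p come from the quoted setup.

```python
# Hedged sketch: one LLM query with the decoding settings reported above.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def query_llm(prompt: str) -> str:
    """Send one prompt with the paper's reported decoding hyperparameters."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0301",  # version 0301, as stated in the paper (now deprecated)
        temperature=0.7,             # default temperature used in the paper
        top_p=0.95,                  # default top_p used in the paper
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```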
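The search protocol is fully specified by the quoted numbers: ten seeded searches, each starting from 5 initialization points and running for 25 trials, with M = 20 candidate points sampled per trial and exploration hyperparameter α = 0.1. The following is a hedged sketch of that loop; `sample_candidates`, `score_candidate`, `evaluate`, and `init_points` are hypothetical placeholders for LLAMBO's LLM-driven components, which the paper describes in prose only (see the Pseudocode row above).

```python
# Hedged sketch of the reported protocol: 5 init points + 25 trials per search,
# M = 20 candidates per trial, exploration weight ALPHA = 0.1, 10 seeded runs.
import random

N_SEEDS, N_INIT, N_TRIALS, M, ALPHA = 10, 5, 25, 20, 0.1

def run_search(seed, init_points, sample_candidates, score_candidate, evaluate):
    rng = random.Random(seed)
    # Evaluate the 5 initialization points first.
    history = [(x, evaluate(x)) for x in init_points(rng, N_INIT)]
    for _ in range(N_TRIALS):
        # Sample M candidates, then pick the one with the best acquisition
        # score; ALPHA trades off exploration against exploitation.
        candidates = sample_candidates(history, M, rng)
        best = max(candidates, key=lambda x: score_candidate(x, history, ALPHA))
        history.append((best, evaluate(best)))
    return min(history, key=lambda pair: pair[1])  # assumes minimization

# Average results over ten seeded searches, as reported in the paper:
# results = [run_search(s, ...) for s in range(N_SEEDS)]
```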
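Finally, the pinned baseline versions in the Software Dependencies row can be checked against an installed environment. A small sketch using the standard-library `importlib.metadata`; the PyPI distribution names (e.g. `scikit-optimize` for SKOpt, `smac` for SMAC3) are assumptions, and only the version numbers come from the paper.

```python
# Hedged sketch: verify installed baseline versions against the paper's pins.
# Distribution names are assumed; version numbers are quoted from the paper.
from importlib.metadata import version, PackageNotFoundError

EXPECTED = {
    "scikit-optimize": "0.9.0",  # SKOpt (GP-based)
    "botorch": "0.8.5",          # GP (Deep Kernel Learning)
    "smac": "1.4.0",             # SMAC3
    "HEBO": "0.3.5",             # HEBO
    "optuna": "3.3.0",           # Optuna
}

for pkg, want in EXPECTED.items():
    try:
        got = version(pkg)
        status = "OK" if got == want else f"MISMATCH (found {got})"
    except PackageNotFoundError:
        status = "NOT INSTALLED"
    print(f"{pkg}=={want}: {status}")
```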