reproducibilityindex.ai

Large Language Models to Enhance Bayesian Optimization

Authors: Tennison Liu, Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically validate LLAMBO's efficacy on the problem of hyperparameter tuning, highlighting strong empirical performance across a range of diverse benchmarks, proprietary, and synthetic tasks.
Researcher Affiliation	Academia	Tennison Liu, Nicolás Astorga, Nabeel Seedat & Mihaela van der Schaar, DAMTP, University of Cambridge, Cambridge, UK {tl522,nja46}@cam.ac.uk
Pseudocode	No	The paper describes methods in text and uses figures to illustrate concepts, but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	We provide the code to reproduce our results at https://github.com/tennisonliu/LLAMBO and the wider lab repository https://github.com/vanderschaarlab/LLAMBO.
Open Datasets	Yes	We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... We utilize Bayesmark [31] as a continuous HPT benchmark. ... We included HPOBench, specifically the tabular benchmarks for computationally efficient evaluations [32]. ... Additionally, we introduce 3 proprietary (SEER [72], MAGGIC [73], and CUTRACT [74]) and 3 synthetic datasets.
Dataset Splits	No	The paper references standard benchmarks (Bayesmark, HPOBench) which usually have predefined splits, but does not explicitly state the training/validation/test split percentages or methodology for their own experiments beyond mentioning testing predictions against 'unseen points' or using '5 initialization points'.
Hardware Specification	Yes	For context, all runtime measurements were conducted on an Intel i7-1260P (a consumer-grade laptop).
Software Dependencies	Yes	We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... For our experiments, we used gpt-3.5-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95. ... SKOpt (GP-based) [68]: ... Version 0.9.0. GP (Deep Kernel Learning) [48]: ... (BoTorch version 0.8.5). SMAC3 [8]: ... Version 1.4.0. ... HEBO [77]: ... Version 0.3.5. Optuna [41]: ... Version 3.3.0.
Experiment Setup	Yes	Experimental setup. We conduct our investigations using 74 tasks extracted from Bayesmark and HPOBench [31, 32] and OpenAI's GPT-3.5 Language Model (see Appendix D for detailed experimental procedures). ... Each search begins with 5 initialization points and proceeds for 25 trials, and we report average results over ten seeded searches. ... For our instantiation of LLAMBO, we sample M = 20 candidate points, and set the exploration hyperparameter to α = 0.1. ... For our experiments, we used gpt-3.5-turbo, version 0301 with default hyperparameters temperature = 0.7 and top_p = 0.95.
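
For readers attempting a reproduction, the settings quoted in the Software Dependencies and Experiment Setup rows can be gathered into one place. The following is a minimal sketch, not code from the LLAMBO repository: the class name `LlamboSetup`, its field names, the helper `requirements_lines`, and the PyPI distribution names in `BASELINE_VERSIONS` are all assumptions made for illustration. Only the numeric values and version strings come from the rows above.

```python
from dataclasses import dataclass

# Hypothetical container for the settings quoted in the table above.
# Field names are illustrative; the values are taken from the quoted text.
@dataclass(frozen=True)
class LlamboSetup:
    n_init_points: int = 5      # "begins with 5 initialization points"
    n_trials: int = 25          # "proceeds for 25 trials"
    n_seeds: int = 10           # "average results over ten seeded searches"
    n_candidates: int = 20      # M = 20 candidate points
    alpha: float = 0.1          # exploration hyperparameter
    model: str = "gpt-3.5-turbo"
    model_version: str = "0301"
    temperature: float = 0.7
    top_p: float = 0.95


# Baseline versions as quoted. The keys are assumed PyPI distribution
# names; the table itself only gives the library names and versions.
BASELINE_VERSIONS = {
    "scikit-optimize": "0.9.0",
    "botorch": "0.8.5",
    "smac": "1.4.0",
    "HEBO": "0.3.5",
    "optuna": "3.3.0",
}


def requirements_lines(versions):
    """Format a name-to-version mapping as pip-style 'name==version' pins."""
    return [f"{name}=={version}" for name, version in sorted(versions.items())]


if __name__ == "__main__":
    print(LlamboSetup())
    print("\n".join(requirements_lines(BASELINE_VERSIONS)))
```

The pinned lines could be written to a requirements file, but whether these versions install together in one environment is not stated in the table and would need to be checked against the repository's own instructions.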