Inversion-based Latent Bayesian Optimization

Authors: Jaewon Chu, Jinyoung Park, Seunghun Lee, Hyunwoo J. Kim

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results demonstrate the effectiveness of InvBO on nine real-world benchmarks, such as molecule design and arithmetic expression fitting tasks. |
| Researcher Affiliation | Academia | Jaewon Chu, Jinyoung Park, Seunghun Lee, Hyunwoo J. Kim. Computer Science & Engineering, Korea University. {allonsy07, lpmn678, llsshh319, hyunwoojkim}@korea.ac.kr |
| Pseudocode | Yes | Algorithm 1 Inversion. Input: encoder qϕ, decoder pθ, target data x, max iteration T, distance function dX, learning rate η, reconstruction loss L |
| Open Source Code | Yes | Code is available at https://github.com/mlvlab/InvBO. |
| Open Datasets | Yes | We measure the performance of the proposed method, named InvBO, on nine different tasks from three Bayesian optimization benchmarks: Guacamol [43], DRD3, and arithmetic expression fitting tasks [6, 8, 11–13, 44]. For the Guacamol benchmark, we use seven challenging tasks: Median molecules 2 (med2), Zaleplon MPO (zale), Perindopril MPO (pdop), Amlodipine MPO (adip), Osimertinib MPO (osmb), Ranolazine MPO (rano), and Valsartan SMARTS (valt). |
| Dataset Splits | No | The paper does not explicitly provide training, validation, and test splits (e.g., percentages or sample counts) for the datasets used; it only specifies the number of initial data points and the maximum number of oracle calls. |
| Hardware Specification | Yes | We conducted experiments under the same condition for fair comparison: a single NVIDIA RTX 2080 Ti with an AMD EPYC 7742 CPU. |
| Software Dependencies | Yes | We use the PyTorch, BoTorch [48], GPyTorch [52], and Guacamol software packages. |
| Experiment Setup | Yes | The learning rate used in the inversion method is 0.1 in all tasks, as we empirically observe that it always finds the latent vector that generates the target discrete data. The maximum iteration number of the inversion method is 1,000 in all tasks. The remaining hyperparameters follow the settings used in [14]. Since the arithmetic fitting tasks and the small-budget Guacamol tasks use different numbers of initial data points in [14], we set the number of top-k data and the number of query points Nq to match the DRD3 task, as they use the same number of initial data points. |
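The Inversion procedure summarized above (Algorithm 1) amounts to gradient descent in the latent space: a latent vector z is iteratively updated to minimize a reconstruction loss between the decoded output pθ(z) and the target data x, using the reported hyperparameters (learning rate 0.1, up to 1,000 iterations). The sketch below is a minimal illustration of this idea, not the paper's implementation: the linear "decoder", squared-error loss, zero initialization, and tolerance are hypothetical stand-ins for the paper's VAE decoder and loss.

```python
import numpy as np

def invert(decode, grad_loss, z_init, x_target, lr=0.1, max_iter=1000, tol=1e-10):
    """Gradient-descent inversion: find z whose decoding matches x_target.

    lr=0.1 and max_iter=1000 mirror the hyperparameters reported in the paper;
    everything else here is an illustrative stand-in.
    """
    z = z_init.astype(float).copy()
    for _ in range(max_iter):
        z -= lr * grad_loss(z, x_target)          # descend on the reconstruction loss
        if np.sum((decode(z) - x_target) ** 2) < tol:
            break                                  # decoded output matches the target
    return z

# Toy stand-in decoder: p_theta(z) = W z, with squared-error reconstruction loss.
W = np.array([[1.0, 0.5],
              [0.0, 2.0]])
decode = lambda z: W @ z
# Gradient of ||W z - x||^2 with respect to z is 2 W^T (W z - x).
grad_loss = lambda z, x: 2.0 * W.T @ (decode(z) - x)

x_target = np.array([1.0, 2.0])
z_star = invert(decode, grad_loss, np.zeros(2), x_target)
```

In the paper's setting the decoder is a pretrained generative model over discrete data (e.g., molecules), so the loss and distance function dX operate on decoded sequences rather than on a vector difference; the sketch only conveys the optimization loop.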