PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance

Authors: Taicai Chen, Yue Duan, Dong Li, Lei Qi, Yinghuan Shi, Yang Gao

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental "The extensive experiments demonstrate that our proposed method outperforms existing VAE-BO algorithms in various optimization scenarios. Our code will be published at https://github.com/TaicaiChen/PG-LBO." In the Experiments section, the authors apply PG-LBO to three high-dimensional structured optimization tasks and compare it with several VAE-BO algorithms. As shown in Figure 2, PG-LBO consistently outperforms all baselines by the end of optimization; Table 1 and the ablation studies provide further details.
Researcher Affiliation Collaboration Taicai Chen¹, Yue Duan¹, Dong Li², Lei Qi³, Yinghuan Shi¹*, Yang Gao¹. ¹National Key Laboratory for Novel Software Technology, Nanjing University, China; ²Huawei Noah's Ark Lab, China; ³School of Computer Science and Engineering, Southeast University, China.
Pseudocode Yes Algorithm 1: Pseudo code of PG-LBO
Open Source Code Yes "Our code will be published at https://github.com/TaicaiChen/PG-LBO."
Open Datasets Yes Topology shape fitting task: the task uses 10,000 topology images from the dataset of (Sosnovik and Oseledets 2019) and a VAE with a latent space dimension of 20. Expression reconstruction task: the objective function is the distance metric f(x) = max{-7, -∫_{-10}^{10} log(1 + (x(v) - x*(v))²) dv}, where x* is the target expression. The task has access to 40,000 data points and uses the grammar VAE of (Kusner, Paige, and Hernández-Lobato 2017) with a latent space dimension of 25. Chemical design task: the task uses the ZINC250K dataset (Sterling and Irwin 2015) to synthesize chemical molecules, with the objective of maximizing the penalized water-octanol partition coefficient (penalized log P) of molecules.
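The expression-reconstruction objective above can be sketched numerically. This is a minimal illustration, not the authors' code: the function name `expression_score`, the grid-based integral approximation, and the callable inputs are all assumptions; only the formula f(x) = max{-7, -∫ log(1 + (x(v) - x*(v))²) dv} over [-10, 10] comes from the text.

```python
import numpy as np

def expression_score(x_expr, target_expr, lo=-10.0, hi=10.0, n_points=1000):
    """Hypothetical sketch of the expression-reconstruction objective:
    f(x) = max{-7, -integral of log(1 + (x(v) - x*(v))^2) dv} over [lo, hi],
    approximated on a uniform grid. `x_expr` and `target_expr` are callables
    mapping an array of evaluation points v to expression values."""
    v = np.linspace(lo, hi, n_points)
    diff = x_expr(v) - target_expr(v)
    # log1p is numerically stable for small squared differences
    integrand = np.log1p(diff ** 2)
    # simple mean-value approximation of the integral over [lo, hi]
    integral = np.mean(integrand) * (hi - lo)
    # clamp the score from below at -7, as in the stated objective
    return max(-7.0, -integral)
```

A perfect reconstruction scores 0 (the maximum), and any sufficiently poor candidate is clamped to the floor of -7, which keeps the Bayesian-optimization surrogate from being dominated by extreme outliers.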
Dataset Splits No The paper describes the use of labeled and unlabeled data for training and retraining the VAE, but it does not specify explicit train/validation/test splits (e.g., percentages or counts) for the datasets used in the experiments.
Hardware Specification No The paper does not provide any specific details regarding the hardware used to run the experiments (e.g., CPU, GPU models, or memory specifications).
Software Dependencies No The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup Yes Experimental setup: PG-LBO builds upon the foundation of LBO and uses the same VAE updating strategy and data weighting scheme during the BO process. During training on pseudo-label data, the size of the pseudo-label dataset is kept at half the labeled dataset size, i.e., N_P = N_L/2. As the BO iterations progress, the accuracy of pseudo-labels improves, so the weight of the pseudo-label loss is increased linearly over the VAE retraining rounds: for the topology shape fitting task, λ_P is increased linearly from 0.5 to 0.75; for the expression reconstruction and chemical design tasks, λ_P is increased linearly from 0.1 to 0.75. The GP guidance loss weight varies with task difficulty: λ_G = 1 for the topology shape fitting and chemical design tasks, and λ_G = 0.1 for the expression reconstruction task. The momentum decay coefficient of the pseudo-label selection threshold is λ = 0.9. The data sampling method employs noisy sampling with Gaussian noise N(0, 0.1).
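The hyperparameter schedule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: the function names (`pseudo_label_weight`, `noisy_sample`, `pseudo_dataset_size`) are hypothetical, and only the numeric settings (linear λ_P schedule, N_P = N_L/2, noise N(0, 0.1)) come from the text.

```python
import numpy as np

def pseudo_label_weight(round_idx, total_rounds, start=0.1, end=0.75):
    """Linearly increase the pseudo-label loss weight lambda_P across VAE
    retraining rounds. start/end default to the expression/chemical-task
    settings; the topology task would use start=0.5."""
    t = min(max(round_idx / max(total_rounds - 1, 1), 0.0), 1.0)
    return start + t * (end - start)

def noisy_sample(z, noise_std=0.1, rng=None):
    """Noisy sampling of latent points: perturb z with Gaussian noise
    drawn from N(0, 0.1), as described in the setup."""
    rng = np.random.default_rng(rng)
    return z + rng.normal(0.0, noise_std, size=np.shape(z))

def pseudo_dataset_size(n_labeled):
    """Pseudo-label dataset size is half the labeled set: N_P = N_L / 2."""
    return n_labeled // 2
```

Ramping λ_P rather than fixing it reflects the observation in the setup that pseudo-label accuracy improves as BO iterations accumulate, so later retraining rounds can trust the pseudo-labels more.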