Particle-based Variational Inference with Generalized Wasserstein Gradient Flow

Authors: Ziheng Cheng, Shiyue Zhang, Longlin Yu, Cheng Zhang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems.
Researcher Affiliation | Academia | Ziheng Cheng, School of Mathematical Sciences, Peking University, alex-czh@stu.pku.edu.cn; Shiyue Zhang, School of Mathematical Sciences, Peking University, zhangshiyue@stu.pku.edu.cn; Longlin Yu, School of Mathematical Sciences, Peking University, llyu@pku.edu.cn; Cheng Zhang, School of Mathematical Sciences and Center for Statistical Science, Peking University, chengzhang@math.pku.edu.cn
Pseudocode | Yes | Algorithm 1 GWG: Generalized Wasserstein Gradient Flow; Algorithm 2 Ada-GWG: Adaptive Generalized Wasserstein Gradient Flow
Open Source Code | Yes | The code is available at https://github.com/Alexczh1/GWG.
Open Datasets | Yes | We compare our algorithm with SGLD and SVGD variants on Bayesian neural networks (BNN)... The datasets are all randomly partitioned into 90% for training and 10% for testing. The mini-batch size is 100 except for Concrete, on which we use 400. The particle size is 100 and the results are averaged over 10 random trials. Table 1 shows the average test RMSE and NLL and their standard deviation. We see that Ada-GWG can achieve comparable or better results than the other methods. Figure 4 shows the test RMSE against iterations of different methods on the Boston dataset.
Dataset Splits | Yes | The datasets are all randomly partitioned into 90% for training and 10% for testing. Then, we further split the training dataset by 10% to create a validation set for hyperparameter selection, as done in (Liu & Wang, 2016).
Hardware Specification | No | The authors mention 'computational resources provided by the High-performance Computing Platform of Peking University' but do not provide specific hardware details (e.g., CPU/GPU models, memory).
Software Dependencies | No | The paper mentions using an 'SGD optimizer with Nesterov momentum' and the 'Adam optimizer', which implies standard machine learning libraries, but does not specify software versions (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | For L2-GF, PFG and Ada-GWG, we parameterize fw as 3-layer neural networks with tanh activation function. Each hidden layer has 32 neurons. The inner loop iteration is 5 and we use the SGD optimizer with Nesterov momentum (momentum 0.9) to train fw with learning rate η=1e-3. The particle step size is 0.1.
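The split protocol quoted in the Dataset Splits row above (90%/10% train/test, then a further 10% of the training portion held out as a validation set for hyperparameter selection) is simple to reproduce. Below is a minimal illustrative sketch using scikit-learn; the function name make_splits and the fixed seed are our own choices and are not taken from the GWG repository.

```python
# Minimal sketch of the split protocol described in the Dataset Splits row:
# 90% train / 10% test, then 10% of the training portion held out for validation.
# Names and seed are illustrative, not taken from the GWG repository.
from sklearn.model_selection import train_test_split

def make_splits(X, y, seed=0):
    # Random 90% / 10% train-test partition, as described in the paper.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.1, random_state=seed)
    # Hold out 10% of the training data as a validation set for hyperparameter selection.
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.1, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```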
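The Experiment Setup row above pins down how fw is parameterized (a 3-layer tanh MLP with 32 hidden units per layer), the inner loop (5 iterations of SGD with Nesterov momentum, learning rate 1e-3) and the particle step size (0.1). The sketch below wires these pieces together in PyTorch as an assumption-laden illustration, not the authors' implementation: the helper names (make_fw, divergence, gwg_step, score_fn) are ours, the inner objective is the standard L2 gradient-flow matching loss used by related methods such as L2-GF/PFG rather than the paper's generalized Wasserstein gradient objective, and the standard-Gaussian target in the usage lines is purely hypothetical.

```python
# Sketch of the setup quoted in the Experiment Setup row: fw is a 3-layer tanh MLP
# with 32 hidden units, trained for 5 inner iterations with SGD + Nesterov momentum
# (lr = 1e-3), followed by a particle move with step size 0.1.
# ASSUMPTION: the inner loss below is the generic L2 gradient-flow objective for
# matching fw to grad log p - grad log q; it is NOT necessarily the paper's
# generalized Wasserstein gradient objective (see Algorithms 1-2 and the repository).
import torch
import torch.nn as nn

def make_fw(dim, hidden=32):
    # 3-layer MLP with tanh activations and 32 neurons per hidden layer.
    return nn.Sequential(
        nn.Linear(dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, dim),
    )

def divergence(f, x):
    # Exact divergence of f at x via autograd (adequate for a low-dimensional sketch).
    x = x.requires_grad_(True)
    y = f(x)
    div = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):
        div += torch.autograd.grad(y[:, i].sum(), x, create_graph=True)[0][:, i]
    return div

def gwg_step(particles, f, opt, score_fn, inner_iters=5, step_size=0.1):
    """One outer iteration: refit fw in an inner loop, then move the particles."""
    for _ in range(inner_iters):
        opt.zero_grad()
        x = particles.detach().clone()
        s = score_fn(x)                      # grad log p(x), supplied by the target model
        v = f(x)
        # Assumed L2-type objective: E[ f . grad log p + div f - 0.5 ||f||^2 ].
        loss = -(v * s).sum(1).mean() - divergence(f, x).mean() \
               + 0.5 * (v ** 2).sum(1).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        particles += step_size * f(particles)   # particle step size 0.1
    return particles

# Usage (hypothetical target: standard Gaussian, so grad log p(x) = -x).
dim = 2
particles = torch.randn(100, dim)               # 100 particles, as in the paper
f = make_fw(dim)
opt = torch.optim.SGD(f.parameters(), lr=1e-3, momentum=0.9, nesterov=True)
for _ in range(200):
    particles = gwg_step(particles, f, opt, score_fn=lambda x: -x)
```

For the actual GWG and Ada-GWG procedures and their objectives, refer to Algorithms 1 and 2 in the paper and the code at https://github.com/Alexczh1/GWG.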