Particle-based Variational Inference with Generalized Wasserstein Gradient Flow
Authors: Ziheng Cheng, Shiyue Zhang, Longlin Yu, Cheng Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the effectiveness and efficiency of the proposed framework on both simulated and real data problems. |
| Researcher Affiliation | Academia | Ziheng Cheng, School of Mathematical Sciences, Peking University (alex-czh@stu.pku.edu.cn); Shiyue Zhang, School of Mathematical Sciences, Peking University (zhangshiyue@stu.pku.edu.cn); Longlin Yu, School of Mathematical Sciences, Peking University (llyu@pku.edu.cn); Cheng Zhang, School of Mathematical Sciences and Center for Statistical Science, Peking University (chengzhang@math.pku.edu.cn) |
| Pseudocode | Yes | Algorithm 1 GWG: Generalized Wasserstein Gradient Flow; Algorithm 2 Ada-GWG: Adaptive Generalized Wasserstein Gradient Flow |
| Open Source Code | Yes | The code is available at https://github.com/Alexczh1/GWG. |
| Open Datasets | Yes | We compare our algorithm with SGLD and SVGD variants on Bayesian neural networks (BNN)... The datasets are all randomly partitioned into 90% for training and 10% for testing. The mini-batch size is 100 except for Concrete, on which we use 400. The particle size is 100 and the results are averaged over 10 random trials. Table 1 shows the average test RMSE and NLL and their standard deviation. We see that Ada-GWG can achieve comparable or better results than the other methods. Figure 4 shows the test RMSE against iterations of different methods on the Boston dataset. |
| Dataset Splits | Yes | The datasets are all randomly partitioned into 90% for training and 10% for testing. Then, we further split the training dataset by 10% to create a validation set for hyperparameter selection as done in (Liu & Wang, 2016). (A minimal sketch of this splitting protocol is given below the table.) |
| Hardware Specification | No | The authors mention 'computational resources provided by the High-performance Computing Platform of Peking University' but do not provide specific hardware details (e.g., CPU/GPU models, memory). |
| Software Dependencies | No | The paper mentions using 'SGD optimizer with Nesterov momentum' and 'Adam optimizer', which implies standard machine learning libraries, but does not specify software versions (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | For L2-GF, PFG and Ada-GWG, we parameterize f_w as a 3-layer neural network with tanh activation function. Each hidden layer has 32 neurons. The inner loop iteration is 5 and we use an SGD optimizer with Nesterov momentum (momentum 0.9) to train f_w with learning rate η=1e-3. The particle step size is 0.1. (A hedged sketch of this setup is given below the table.) |
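
A minimal sketch of the reported splitting protocol (90% train / 10% test, with a further 10% of the training set held out for validation), assuming NumPy arrays `X` and `y` and scikit-learn's `train_test_split` rather than the authors' own code; the array shapes are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for a UCI regression dataset (e.g., Boston).
X, y = np.random.randn(506, 13), np.random.randn(506)

# 90% training / 10% testing, as reported.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# A further 10% of the training set held out for hyperparameter selection.
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=0)
```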
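The reported parameterization of f_w (a 3-layer tanh network with 32 neurons per hidden layer, trained for 5 inner-loop iterations with Nesterov-momentum SGD at learning rate 1e-3, particle step size 0.1) might look roughly like the following PyTorch sketch. The input/output dimension, the inner-loop objective, and the sign of the particle update are illustrative assumptions, not the authors' implementation (the actual GWG objective is given in the paper and repository):

```python
import torch
import torch.nn as nn

dim = 13  # particle dimension; illustrative, depends on the dataset

# 3-layer network with tanh activations and 32 neurons per hidden layer,
# mapping particles to a vector field of the same dimension.
f_w = nn.Sequential(
    nn.Linear(dim, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, dim),
)

# SGD with Nesterov momentum 0.9 and learning rate 1e-3, as in the reported setup.
optimizer = torch.optim.SGD(f_w.parameters(), lr=1e-3, momentum=0.9, nesterov=True)

particles = torch.randn(100, dim)  # 100 particles, as reported
step_size = 0.1                    # particle step size, as reported

# 5 inner-loop iterations to update f_w before each particle move.
for _ in range(5):
    optimizer.zero_grad()
    # Placeholder objective: the actual GWG training loss would go here.
    loss = f_w(particles).pow(2).mean()
    loss.backward()
    optimizer.step()

# One particle update using the learned vector field (direction is illustrative).
with torch.no_grad():
    particles = particles + step_size * f_w(particles)
```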