Fast Second Order Stochastic Backpropagation for Variational Inference
Authors: Kai Fan, Ziteng Wang, Jeff Beck, James Kwok, Katherine A. Heller
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates. |
| Researcher Affiliation | Academia | Kai Fan (Duke University, kai.fan@stat.duke.edu); Ziteng Wang (HKUST, wangzt2012@gmail.com); Jeffrey Beck (Duke University, jeff.beck@duke.edu); James T. Kwok (HKUST, jamesk@cse.ust.hk); Katherine Heller (Duke University, kheller@gmail.com) |
| Pseudocode | Yes | Algorithm 1 Hessian-free Algorithm on Stochastic Gaussian Variational Inference (HFSGVI); a hedged sketch of this style of update follows the table. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | Yes | We apply our algorithm to this variational logistic regression on three appropriate datasets: Duke Breast and Leukemia are small in size but high-dimensional for sparse logistic regression, and a9a, which is large. ... The datasets we used are images from the Frey Face, Olivetti Face and MNIST. |
| Dataset Splits | No | The paper mentions that hyperparameters were tuned and cross-validation was used, but it does not provide specific details on validation dataset splits (percentages or counts) that would be needed for reproduction, beyond total train/test counts for some datasets. |
| Hardware Specification | No | The paper mentions 'GPU' generally but does not specify any particular GPU model, CPU model, or detailed computer specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python 3.8, PyTorch 1.9) needed to replicate the experiment. |
| Experiment Setup | Yes | The experimental setting is as follows. The initial weights are randomly drawn from N(0, 0.01^2 I) or N(0, 0.001^2 I), while all bias terms are initialized as 0. The variational lower bound only introduces the regularization on the encoder parameters, so we add an L2 regularizer on decoder parameters with a shrinkage parameter of 0.001 or 0.0001. The number of hidden nodes for the encoder and decoder is the same for all auto-encoder models, which is reasonable and convenient for constructing a symmetric structure. The number is always tuned from 200 to 800 in increments of 100. The mini-batch size is 100 for L-BFGS and Ada, while a larger mini-batch is recommended for HF, meaning it should vary according to the training size. A minimal configuration sketch also appears after the table. |
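
The table above marks pseudocode as available (Algorithm 1, HFSGVI) but notes that no source code was released. The following is only a minimal, hypothetical Python sketch of the generic Hessian-free recipe that the algorithm's name points to: approximate a Newton-style direction with conjugate gradient, using Hessian-vector products in place of an explicit Hessian. Everything here is an assumption rather than the authors' implementation: `grad_fn` stands for a mini-batch gradient of the negative variational lower bound over flattened parameters, the Hessian-vector product is a finite-difference approximation (the paper derives exact second-order terms), and the damping constant and step size are placeholders.

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-4):
    """Finite-difference Hessian-vector product: H v ~ (g(theta + eps*v) - g(theta)) / eps."""
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

def conjugate_gradient(grad_fn, theta, b, damping=1e-2, max_iters=50, tol=1e-6):
    """Approximately solve (H + damping*I) x = b using only Hessian-vector products."""
    x = np.zeros_like(b)
    r = b.copy()               # residual b - A @ x, with x = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = hvp(grad_fn, theta, p) + damping * p
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def second_order_step(theta, grad_fn, lr=1.0):
    """One damped Newton-style step on the negative lower bound, solved matrix-free."""
    g = grad_fn(theta)                          # stochastic gradient on a mini-batch
    d = conjugate_gradient(grad_fn, theta, g)   # d ~ (H + damping*I)^{-1} g
    return theta - lr * d
```

A caller would repeatedly apply `second_order_step` over mini-batches; the paper's actual algorithm additionally exploits the Gaussian variational structure and second-order stochastic backpropagation, which this generic sketch does not model.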
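
To make the quoted experiment setup concrete, here is a small, hypothetical configuration sketch mirroring the reported choices (weights drawn from N(0, 0.01^2 I), zero biases, an L2 shrinkage of 0.001 on decoder parameters, 200 to 800 hidden nodes, mini-batch size 100). The variable names, the 784-dimensional input (the MNIST pixel count), and the decision to penalize only the decoder weight matrix are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, sigma=0.01):
    """Weights drawn from N(0, sigma^2 I); all biases initialized to zero."""
    W = rng.normal(0.0, sigma, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b

n_hidden = 200        # tuned from 200 to 800 in increments of 100
shrinkage = 0.001     # L2 shrinkage on decoder parameters (0.001 or 0.0001)
batch_size = 100      # mini-batch size reported for L-BFGS and Ada

enc_W, enc_b = init_layer(784, n_hidden)   # 784 = MNIST pixel count (assumed input size)
dec_W, dec_b = init_layer(n_hidden, 784)   # symmetric encoder/decoder widths

def decoder_l2(dec_W):
    """L2 regularizer applied to decoder weights only, per the reported setup."""
    return shrinkage * np.sum(dec_W ** 2)
```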