Doubly Stochastic Variational Bayes for non-Conjugate Inference

Authors: Michalis Titsias, Miguel Lázaro-Gredilla

ICML 2014

Reproducibility variables (Variable — Result, with the LLM's supporting response):
Research Type — Experimental. Evidence: "We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression." ... "In this section, we apply the DSVI algorithm to different types of non-conjugate models. In Section 3.1 we consider standard concave Bayesian logistic regression, while in Sections 3.2 and 3.3 we further elaborate on logistic regression by discussing how to deal with automatic variable selection and very large datasets. In Section 3.4 we consider DSVI for Gaussian process hyperparameter inference."
Researcher Affiliation — Academia. Evidence: "Michalis K. Titsias (MTITSIAS@AUEB.GR), Department of Informatics, Athens University of Economics and Business, Greece; Miguel Lázaro-Gredilla (MIGUEL@TSC.UC3M.ES), Dpt. Signal Processing & Communications, Universidad Carlos III de Madrid, Spain."
Pseudocode — Yes. Evidence: "Algorithm 1 Doubly stochastic variational inference."
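For context on what Algorithm 1 does, the following is a minimal sketch of the doubly stochastic update with a diagonal Gaussian variational posterior q(θ) = N(μ, diag(c)²). The function name `dsvi_diagonal` and the generic `grad_log_joint` interface are assumptions for illustration; the paper's algorithm additionally covers full covariance matrices and decaying learning-rate sequences.

```python
import numpy as np

def dsvi_diagonal(grad_log_joint, dim, rho=0.01, n_iters=5000, seed=0):
    """Sketch of doubly stochastic variational inference with
    q(theta) = N(mu, diag(c)^2).

    grad_log_joint(theta) returns the gradient of log p(y, theta);
    its exact form depends on the model (assumed supplied by the user).
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)   # variational mean
    c = np.ones(dim)     # variational standard deviations
    for _ in range(n_iters):
        z = rng.standard_normal(dim)       # sample standard normal noise
        theta = mu + c * z                 # reparameterise: theta = mu + C z
        g = grad_log_joint(theta)          # single-sample gradient estimate
        mu = mu + rho * g                  # stochastic ascent on the mean
        c = c + rho * (g * z + 1.0 / c)    # scale update; 1/c is the entropy term
        c = np.abs(c)                      # keep scales positive
    return mu, c
```

As a sanity check, applying this to a log-density whose gradient is -(θ - 3) (i.e. a unit Gaussian centred at 3) should drive μ toward 3 and c toward 1, since the exact posterior is then N(3, 1).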
Open Source Code — No. The paper neither states that the code for the described methodology is open-sourced nor links to a code repository.
Open Datasets — Yes. Evidence: "For the above simple setting and using the well-known Pima Indians diabetes data set from the UCI repository, we show on Figure 2 the convergence of our method..." "We applied DSVI-ARD for binary classification in three cancer-related data sets that are summarized in Table 1... Available from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html." "In order to demonstrate the scalability of the proposed method, we run it on three well-known large-scale binary classification datasets a9a, rcv1, and Epsilon, whose details are listed on Table 3."
Dataset Splits — No. Table 2 reports train and test errors for the cancer data sets, and "The value of λ was selected using 5-fold cross-validation", but the paper does not describe a distinct validation set, its size, or split percentages for the experiments.
Hardware Specification — No. Evidence: "Another benefit of DSVI-ARD is the low memory requirements (we needed a 32GB RAM computer to run the ℓ1-logistic regression, whereas a 4GB one was enough for DSVI-ARD)." No CPU/GPU models or other hardware details are specified.
Software Dependencies — No. The paper mentions the LIBLINEAR software but gives no version number; no other software components are listed with versions.
Experiment Setup — Yes. Evidence: "Finally, the learning rate sequences and annealing schedule when applying DSVI-ARD to all above problems was chosen as follows. The learning rate ρt is initialised to ρ0 = 0.05/#training examples and scaled every 5000 iterations by a factor of 0.95. This learning rate is used to update µ, whereas 0.1ρt is used to update c. A total of 10^5 iterations was considered. For all problems, mini-batches of size 500 are used, so this process does not ever require the whole data set to be loaded in memory."
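The annealing schedule quoted above is concrete enough to transcribe directly. The sketch below is a hedged rendering of that schedule only (the function name `lr_schedule` is invented for illustration); it yields, per iteration, the learning rate for µ and the ten-times-smaller rate for c.

```python
def lr_schedule(n_train, n_iters=100_000, decay_every=5000, decay=0.95):
    """Generate (rho_t for mu, 0.1 * rho_t for c) per iteration, following
    the paper's reported schedule: rho_0 = 0.05 / #training examples,
    multiplied by 0.95 every 5000 iterations."""
    rho = 0.05 / n_train
    for t in range(n_iters):
        if t > 0 and t % decay_every == 0:
            rho *= decay          # anneal every decay_every iterations
        yield rho, 0.1 * rho      # (rate for mu, rate for c)
```

For example, with 1000 training points the first 5000 iterations use ρ = 5e-5 for µ (and 5e-6 for c), dropping to 4.75e-5 at iteration 5000.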