Asymptotics of Alpha-Divergence Variational Inference Algorithms with Exponential Families

Authors: François Bertholom, Randal Douc, François Roueff

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | While the main focus of this paper is theoretical, we also provide empirical results on both toy examples and real data in Section 6.
Researcher Affiliation | Academia | François Bertholom, SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France, francois.bertholom@telecom-sudparis.eu; Randal Douc, SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France, randal.douc@telecom-sudparis.eu; François Roueff, LTCI, Télécom Paris, Institut Polytechnique de Paris, France, francois.roueff@telecom-paris.fr
Pseudocode | No | The paper describes the algorithms through mathematical update equations (e.g., (6), (9), (11)) but does not present them in a structured pseudocode or algorithm block. (A generic illustrative sketch of such an objective is given below the table.)
Open Source Code | Yes | The code written for our experiments is attached to the submission.
Open Datasets | Yes | We evaluate the two approaches described in Section 5 on the image datasets CIFAR10 (50 000 images of size 32×32) and CelebA (192 599 randomly chosen training images cropped to 128×128)... (See the dataset-loading sketch below the table.)
Dataset Splits | No | The paper mentions training and test set sizes, but does not explicitly state validation dataset splits (percentages or counts) or a cross-validation setup.
Hardware Specification | Yes | Training takes 30 minutes per model on CIFAR10 and a few hours on CelebA, using a single V100 GPU.
Software Dependencies | No | The paper mentions 'Adam [20]' as an optimizer but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We choose the number of samples to be K = 5. The weights are optimized with Adam [20], using learning rates of 8e-4 on CIFAR10 and 2e-4 on CelebA, and (β1, β2) = (0.9, 0.999) in both cases. We set the batch size to 256 and train for 100 epochs on CIFAR10 and 30 epochs on CelebA, for a total of roughly 20 000 iterations on both datasets. (See the training-configuration sketch below.)
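Since the paper presents its updates only as equations (see the Pseudocode row), here is a minimal generic sketch of a Monte Carlo alpha-divergence objective of the kind used in alpha-divergence variational inference, assuming the standard importance-weighted (Rényi-bound style) estimator with K samples. It is an illustration only, not the paper's updates (6), (9), or (11).

```python
import torch

def alpha_objective(log_p, log_q, alpha):
    """Monte Carlo estimate of a generic alpha-divergence VI objective.

    log_p: (K,) unnormalized log target density at samples z_k ~ q
    log_q: (K,) variational log density at the same samples
    alpha: divergence order, alpha != 1 (alpha -> 1 recovers the KL/ELBO case)

    Standard importance-weighted (Renyi-bound style) estimator -- an
    illustration only, NOT the paper's updates (6), (9), (11).
    """
    K = log_p.shape[0]
    log_w = log_p - log_q  # log importance weights
    # 1/(1-alpha) * log( (1/K) * sum_k w_k^(1-alpha) ), via logsumexp for stability
    return (torch.logsumexp((1.0 - alpha) * log_w, dim=0)
            - torch.log(torch.tensor(float(K)))) / (1.0 - alpha)

# Toy usage with K = 5 samples, matching the paper's choice of K.
log_q = torch.randn(5)
log_p = log_q + torch.randn(5)
print(alpha_objective(log_p, log_q, alpha=0.5))
```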
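For the Open Datasets row, a minimal loading sketch using the standard torchvision wrappers for CIFAR10 and CelebA. The CenterCrop choice for the 128×128 crop is an assumption; the paper does not say how the images are cropped.

```python
import torchvision
from torchvision import transforms

# CIFAR10: 50 000 training images of size 32x32, as quoted in the table.
cifar = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor(),
)

# CelebA: the paper uses 192 599 randomly chosen training images cropped
# to 128x128. CenterCrop is an assumption -- the exact crop is unspecified.
celeba = torchvision.datasets.CelebA(
    root="./data", split="train", download=True,
    transform=transforms.Compose([
        transforms.CenterCrop(128),
        transforms.ToTensor(),
    ]),
)
```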
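And for the Experiment Setup row, a sketch of the quoted optimizer configuration in PyTorch. The model, data, and loss are placeholders, since the paper's architecture and objective are not reproduced here; only the hyperparameters (lr = 8e-4 on CIFAR10 / 2e-4 on CelebA, betas = (0.9, 0.999), batch size 256, 100 / 30 epochs) come from the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model standing in for the paper's variational family.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Hyperparameters quoted in the Experiment Setup row: lr = 8e-4 on CIFAR10
# (2e-4 on CelebA), betas = (0.9, 0.999), batch size 256.
optimizer = torch.optim.Adam(model.parameters(), lr=8e-4, betas=(0.9, 0.999))

# Dummy data so the loop runs standalone; swap in the CIFAR10 loader above.
data = TensorDataset(torch.randn(512, 3, 32, 32),
                     torch.zeros(512, dtype=torch.long))
loader = DataLoader(data, batch_size=256, shuffle=True)

for epoch in range(100):  # 100 epochs on CIFAR10, 30 on CelebA
    for x, y in loader:
        optimizer.zero_grad()
        # Stand-in loss, NOT the paper's alpha-divergence objective.
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
```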