Asymptotics of Alpha-Divergence Variational Inference Algorithms with Exponential Families
Authors: François Bertholom, Randal Douc, François Roueff
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While the main focus of this paper is theoretical, we also provide empirical results on both toy examples and real data in Section 6. |
| Researcher Affiliation | Academia | François Bertholom, SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France (francois.bertholom@telecom-sudparis.eu); Randal Douc, SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris, France (randal.douc@telecom-sudparis.eu); François Roueff, LTCI, Télécom Paris, Institut Polytechnique de Paris, France (francois.roueff@telecom-paris.fr) |
| Pseudocode | No | The paper describes algorithms using mathematical equations for updates (e.g., (6), (9), (11)) but does not present them in a structured pseudocode or algorithm block. (An illustrative sketch of this style of update follows the table.) |
| Open Source Code | Yes | The code written for our experiments is attached to the submission. |
| Open Datasets | Yes | We evaluate the two approaches described in Section 5 on the image datasets CIFAR10 (50 000 images of size 32×32) and CelebA (192 599 randomly chosen training images cropped to 128×128)... (A data-loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions training and test set sizes, but does not explicitly state validation dataset splits (percentages or counts) or cross-validation setup. |
| Hardware Specification | Yes | Training takes 30 minutes per model on CIFAR10 and a few hours on CelebA, using a single V100 GPU. |
| Software Dependencies | No | The paper mentions 'Adam [20]' as an optimizer but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We choose the number of samples to be K = 5. The weights are optimized with Adam [20], using learning rates of 8e-4 on CIFAR10 and 2e-4 on CelebA, and (β1, β2) = (0.9, 0.999) in both cases. We set the batch size to 256 and train for 100 epochs on CIFAR10 and 30 epochs on CelebA, for a total of roughly 20 000 iterations on both datasets. (A training-configuration sketch follows the table.) |
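
Since the paper states its α-divergence updates only as equations (6), (9), and (11), the following is a minimal, hypothetical sketch of the general flavor of such schemes for a one-dimensional Gaussian (an exponential family): draw K samples from the current variational distribution, form self-normalized importance weights raised to the power α, and moment-match the sufficient statistics. The function name, the damping scheme, and the toy Gaussian target are illustrative assumptions, not the paper's exact updates.

```python
import numpy as np

def alpha_vi_step(mean, var, log_p, alpha=0.5, K=5, damping=0.5, rng=None):
    # One illustrative update: sample from the current Gaussian q, weight the
    # samples by (p/q)^alpha (self-normalized), then moment-match the
    # sufficient statistics T(x) = (x, x^2). NOT the paper's exact scheme.
    rng = np.random.default_rng() if rng is None else rng
    x = mean + np.sqrt(var) * rng.standard_normal(K)
    log_q = -0.5 * ((x - mean) ** 2 / var + np.log(2 * np.pi * var))
    log_w = alpha * (log_p(x) - log_q)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    tilted_mean = np.sum(w * x)
    tilted_var = np.sum(w * (x - tilted_mean) ** 2)
    # Damping keeps the iteration stable with as few as K = 5 samples.
    new_mean = (1 - damping) * mean + damping * tilted_mean
    new_var = (1 - damping) * var + damping * max(tilted_var, 1e-8)
    return new_mean, new_var

# Toy usage: fit q to an (unnormalized) Gaussian target N(2, 1.5).
log_p = lambda x: -0.5 * (x - 2.0) ** 2 / 1.5
mean, var = 0.0, 1.0
for _ in range(2000):
    mean, var = alpha_vi_step(mean, var, log_p, K=5)
```

Because the weights are self-normalized, an unnormalized log-density suffices for the target, which is the usual setting for this family of algorithms.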
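For the Open Datasets row, a plausible torchvision-based loading setup consistent with the quoted description might look like the following. The root directory, the use of `CenterCrop` for the 128×128 CelebA crop, and the `ToTensor` transform are assumptions; the paper's exact preprocessing is not quoted above.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CIFAR10: 50 000 training images of size 32x32.
cifar = datasets.CIFAR10(
    root="data", train=True, download=True, transform=transforms.ToTensor()
)

# CelebA training split; the 128x128 crop is implemented here with CenterCrop,
# which is an assumption -- the paper's exact cropping is not specified above.
celeba = datasets.CelebA(
    root="data", split="train", download=True,
    transform=transforms.Compose([
        transforms.CenterCrop(128),
        transforms.ToTensor(),
    ]),
)

# Batch size 256, as reported in the Experiment Setup row.
cifar_loader = DataLoader(cifar, batch_size=256, shuffle=True)
```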
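The Experiment Setup row pins down the optimizer configuration exactly, so a minimal PyTorch sketch can wire those reported values together; it continues from the loading sketch above. The model and loss are hypothetical placeholders, since the paper's architecture and exact α-divergence objective are not described in the table.

```python
import torch

# Placeholder model: the paper's actual architecture is not reproduced here.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))

def loss_fn(model, images):
    # Hypothetical stand-in for the paper's alpha-divergence objective.
    return model(images).pow(2).mean()

# Reported values for CIFAR10; CelebA uses lr=2e-4 and 30 epochs instead.
K = 5  # samples per update, as reported (unused by the placeholder loss)
optimizer = torch.optim.Adam(model.parameters(), lr=8e-4, betas=(0.9, 0.999))

for epoch in range(100):            # 100 epochs on CIFAR10
    for images, _ in cifar_loader:  # batch size 256 (see the loading sketch)
        optimizer.zero_grad()
        loss = loss_fn(model, images)
        loss.backward()
        optimizer.step()
```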