Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation

Authors: Sobihan Surendran, Adeline Fermanian, Antoine Godichon-Baggioni, Sylvain Le Corff

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we provide experimental results using Variational Autoencoders (VAE) and applications to several learning frameworks that illustrate our convergence results and show how the effect of bias can be reduced by appropriate hyperparameter tuning. In this section, we illustrate our theoretical results in the context of deep VAE. The experiments were conducted using PyTorch [65], and the source code can be found here². In generative models, [...] Dataset and Model. We conduct our experiments on the CIFAR-10 dataset [51] and use a Convolutional Neural Network (CNN) architecture with the Rectified Linear Unit (ReLU) activation function for both the encoder and the decoder. The latent space dimension is set to 100. We estimate the log-likelihood using VAE, IWAE, and BR-IWAE models, all of which are trained for 100 epochs." (Hedged sketches of the model and of the IWAE bound are given after the table.)
Researcher Affiliation | Collaboration | ¹Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, Paris, France; ²LOPF, Califrais Machine Learning Lab, Paris, France
Pseudocode | Yes | Algorithm 1: AMSGrad with Biased Gradients (a hedged implementation sketch follows the table)
Open Source Code | Yes | "The experiments were conducted using PyTorch [65], and the source code can be found here²."
Open Datasets | Yes | "Dataset and Model. We conduct our experiments on the CIFAR-10 dataset [51] and use a Convolutional Neural Network (CNN) architecture with the Rectified Linear Unit (ReLU) activation function for both the encoder and the decoder. The latent space dimension is set to 100. We estimate the log-likelihood using VAE, IWAE, and BR-IWAE models, all of which are trained for 100 epochs. [...] We conduct our experiments on two datasets: Fashion MNIST [76] and CIFAR-10."
Dataset Splits | Yes | "The Fashion MNIST dataset ... consists of 28×28 pixel images ... with 60,000 images in the training set and 10,000 images in the test set. CIFAR-10 consists of 32×32 pixel images categorized into 10 different classes. The dataset comprises 60,000 images, divided into 50,000 images in the training set and 10,000 images in the test set." (The loader sketch after the table reproduces these standard splits.)
Hardware Specification | Yes | "In this paper, all simulations were conducted using the Nvidia Tesla T4 GPU. The total computing hours required for the results presented in this paper are estimated to be around 100 to 200 hours of GPU usage."
Software Dependencies | No | The paper mentions PyTorch [65] as the software used for the experiments, but it does not give a version number or pin any other software dependencies to specific versions.
Experiment Setup | Yes | "For all experiments, we use Adagrad, RMSProp, and Adam with a learning rate decay given by γn = Cγ/√n, where Cγ = 0.01 for Adagrad and Cγ = 0.001 for RMSProp and Adam. The momentum parameters are set to ρ1 = 0.9 and ρ2 = 0.999, and the regularization parameter δ is fixed at 5 × 10⁻²." (A hedged configuration sketch follows the table.)
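
The paper's pseudocode (Algorithm 1) is AMSGrad run on biased gradient estimates. Below is a minimal NumPy sketch of the standard AMSGrad recursion in that setting, using the hyperparameter names from the experiment setup (ρ1, ρ2, δ, step size Cγ/√n); the `grad_oracle` interface and all default values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def amsgrad_biased(grad_oracle, theta0, n_steps, C_gamma=1e-3,
                   rho1=0.9, rho2=0.999, delta=5e-2):
    """AMSGrad driven by a (possibly biased) stochastic gradient oracle.

    grad_oracle(theta, t) returns a gradient estimate whose bias need not
    vanish; hyperparameter names follow the paper's experiment section.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)      # first-moment (momentum) estimate
    v = np.zeros_like(theta)      # second-moment estimate
    v_hat = np.zeros_like(theta)  # running max of v: the AMSGrad correction
    for t in range(1, n_steps + 1):
        g = grad_oracle(theta, t)                 # biased stochastic gradient
        m = rho1 * m + (1 - rho1) * g
        v = rho2 * v + (1 - rho2) * g ** 2
        v_hat = np.maximum(v_hat, v)              # keeps the adaptive step non-increasing
        gamma_t = C_gamma / np.sqrt(t)            # decaying learning rate gamma_n = C/sqrt(n)
        theta -= gamma_t * m / (np.sqrt(v_hat) + delta)
    return theta

# Example: minimize ||theta||^2 with a gradient oracle carrying a constant bias.
theta_hat = amsgrad_biased(
    lambda th, t: 2 * th + 0.01 + np.random.randn(*th.shape),
    theta0=np.ones(5), n_steps=10_000)
```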
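The paper describes the model only as a CNN encoder/decoder with ReLU activations and a 100-dimensional latent space trained on CIFAR-10. The PyTorch sketch below is one architecture consistent with that description; the channel widths, kernel sizes, and Sigmoid output layer are assumptions. The torchvision loaders reproduce the standard CIFAR-10 split of 50,000 training and 10,000 test images.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

class ConvVAE(nn.Module):
    """Minimal CNN VAE for 32x32 CIFAR-10 images with a 100-dim latent space."""
    def __init__(self, latent_dim=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32 -> 16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),     # 8 -> 16
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),   # 16 -> 32
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(self.fc_dec(z)), mu, logvar

# Standard splits: CIFAR-10 ships with 50,000 training and 10,000 test images.
train_set = datasets.CIFAR10("data", train=True, download=True,
                             transform=transforms.ToTensor())
test_set = datasets.CIFAR10("data", train=False, download=True,
                            transform=transforms.ToTensor())
```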
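The log-likelihood is estimated with VAE, IWAE, and BR-IWAE. The sketch below computes the importance-weighted (IWAE) bound on log p(x) for the ConvVAE sketched above, assuming a Bernoulli decoder and a standard normal prior; its bias relative to log p(x) shrinks as the number of importance samples K grows, which is the kind of controllable bias the paper analyzes. BR-IWAE's additional bias-reduction step is not sketched.

```python
import math
import torch
import torch.nn.functional as F

def iwae_bound(model, x, K=10):
    """Monte Carlo IWAE bound on log p(x); reuses the ConvVAE sketch above."""
    h = model.encoder(x)
    mu, logvar = model.fc_mu(h), model.fc_logvar(h)
    std = torch.exp(0.5 * logvar)
    log_w = []
    for _ in range(K):
        z = mu + std * torch.randn_like(std)              # z_k ~ q(z | x)
        recon = model.decoder(model.fc_dec(z))
        # log p(x | z) under a Bernoulli decoder, summed over pixels
        log_px_z = -F.binary_cross_entropy(recon, x,
                                           reduction="none").flatten(1).sum(1)
        # log p(z) for a standard normal prior and log q(z | x) for the encoder
        log_pz = (-0.5 * (z ** 2 + math.log(2 * math.pi))).sum(1)
        log_qz = (-0.5 * (((z - mu) / std) ** 2 + 2 * torch.log(std)
                          + math.log(2 * math.pi))).sum(1)
        log_w.append(log_px_z + log_pz - log_qz)
    # log of the average importance weight, computed stably per data point
    return (torch.logsumexp(torch.stack(log_w, 0), 0) - math.log(K)).mean()

# Hypothetical usage, with `vae` an instance of the ConvVAE above:
# x, _ = next(iter(torch.utils.data.DataLoader(test_set, batch_size=64)))
# ll_estimate = iwae_bound(vae, x, K=50)   # K=1 recovers a one-sample ELBO
```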
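The experiment-setup row translates into PyTorch roughly as follows. Mapping δ to the optimizers' `eps` argument and realizing γn = Cγ/√n through a `LambdaLR` multiplier are assumptions about how the quoted values correspond to PyTorch's parameterization, not the authors' training script.

```python
import torch

rho1, rho2, delta = 0.9, 0.999, 5e-2  # quoted momentum and regularization values

def make_optimizer(name, params):
    # C_gamma becomes the base learning rate; delta maps to `eps` (an assumption).
    if name == "adagrad":
        return torch.optim.Adagrad(params, lr=0.01, eps=delta)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=0.001, alpha=rho2, eps=delta)
    return torch.optim.Adam(params, lr=0.001, betas=(rho1, rho2), eps=delta)

params = [torch.nn.Parameter(torch.zeros(10))]  # stand-in for model.parameters()
optimizer = make_optimizer("adam", params)
# gamma_n = C_gamma / sqrt(n): LambdaLR scales the base lr by the returned factor;
# call scheduler.step() once per optimization step (n is 0-indexed).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda n: 1.0 / (n + 1) ** 0.5)
```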