KOALA: A Kalman Optimization Algorithm with Loss Adaptivity

Authors: Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro

AAAI 2022, pp. 6471-6479 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide convergence analysis and show experimentally that it yields parameter estimates that are on par with or better than existing state of the art optimization algorithms across several neural network architectures and machine learning tasks, such as computer vision and language modeling. In this section we ablate the following features and parameters of both KOALA-V and KOALA-M algorithms: the dynamics of the weights and velocities, the initialization of the posterior covariance matrix and the adaptivity of the state noise estimators. We evaluate KOALA-M on different tasks, including image classification (on CIFAR-10, CIFAR-100 and ImageNet (Russakovsky et al. 2015)), generative learning and language modeling.
Researcher Affiliation | Academia | Aram Davtyan, Sepehr Sameni, Llukman Cerkezi, Givi Meishvili, Adam Bielski, Paolo Favaro; Computer Vision Group, University of Bern, Switzerland; {aram.davtyan, sepehr.sameni, llukman.cerkezi, givi.meishvili, adam.bielski, paolo.favaro}@inf.unibe.ch
Pseudocode | Yes | Algorithm 1: KOALA-V (Vanilla) (an illustrative sketch of this style of update appears after the table)
Open Source Code | Yes | The project page with the code and the supplementary materials is available at https://araachie.github.io/koala/.
Open Datasets | Yes | We evaluate KOALA-M on different tasks, including image classification (on CIFAR-10, CIFAR-100 and ImageNet (Russakovsky et al. 2015)). In all the ablations, we choose the classification task on CIFAR-100 (Krizhevsky and Hinton 2009). (See the data-loading sketch after the table.)
Dataset Splits | No | The paper reports "Top-1 and Top-5 errors on the validation set" and mentions training for a specific number of epochs, but does not explicitly detail how the validation set is split from the overall dataset (percentages or counts).
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as GPU models, CPU specifications, or cloud computing instances and their configurations.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) within the main text.
Experiment Setup | Yes | We train all the models for 100 epochs and decrease the learning rate by a factor of 0.2 every 30 epochs. For SGD we set the momentum rate to 0.9, which is the default for many popular networks, and for Adam we use the default parameters β1 = 0.9, β2 = 0.999, ϵ = 10^-8. In all experiments on CIFAR-10/100, we use a batch size of 128 and basic data augmentation (random horizontal flipping and random cropping with padding by 4 pixels). For all the algorithms, we additionally use a weight decay of 0.0005.
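
The Pseudocode row points to Algorithm 1 (KOALA-V) in the paper. As a rough illustration of a Kalman-style weight update with a scalar posterior covariance, here is a minimal NumPy sketch; it reflects our hedged reading of that idea, not the authors' Algorithm 1, and the names koala_v_step, q, r, and loss_target, along with their default values, are assumptions made for illustration only.

import numpy as np

def koala_v_step(w, grad, loss, sigma2, q=0.1, r=1.0, loss_target=0.0):
    """One Kalman-style weight update with a scalar posterior covariance.

    w           : flat parameter vector (np.ndarray)
    grad        : gradient of the mini-batch loss at w
    loss        : mini-batch loss value at w
    sigma2      : scalar posterior variance (P = sigma2 * I)
    q, r        : assumed process/observation noise variances (hypothetical defaults)
    loss_target : desired loss used as the "measurement" (assumed to be 0)
    """
    # Predict: identity dynamics, covariance grows by the process noise.
    sigma2_pred = sigma2 + q

    # Innovation: difference between the target loss and the observed loss.
    innovation = loss_target - loss

    # Scalar innovation covariance H P H^T + R, with H = grad^T and P = sigma2 * I.
    s = sigma2_pred * np.dot(grad, grad) + r

    # Kalman gain restricted to the gradient direction, then the state update.
    gain = (sigma2_pred / s) * grad
    w_new = w + gain * innovation

    # Posterior variance shrinks by the information gained from the measurement.
    sigma2_new = sigma2_pred * (r / s)
    return w_new, sigma2_new

Because the innovation loss_target - loss is negative whenever the current loss exceeds the target, the update moves w along -grad, i.e. it behaves like gradient descent with a step size sigma2_pred * (loss - loss_target) / s that adapts to the loss value and the gradient magnitude.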
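
CIFAR-10 and CIFAR-100 from the Open Datasets row are available through torchvision, so the ablation data pipeline can be reproduced in a few lines. The sketch below loads CIFAR-100 with the batch size and augmentation quoted under Experiment Setup; the root path, num_workers, and normalization statistics are illustrative assumptions, not values from the paper.

import torch
from torchvision import datasets, transforms

# Augmentation described in the experiment setup: random horizontal flipping
# and random cropping with 4 pixels of padding. The normalization statistics
# are common CIFAR-100 values, not taken from the paper.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)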
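
The baseline hyperparameters in the Experiment Setup row map directly onto standard PyTorch optimizers and schedulers. The sketch below is one way to reproduce that configuration; the ResNet-18 model and the initial learning rates are placeholders, since this excerpt does not state them.

import torch
from torchvision.models import resnet18

# Placeholder model; the paper evaluates several architectures.
model = resnet18(num_classes=100)

# SGD baseline: momentum 0.9 and weight decay 5e-4, as stated in the paper.
# The initial learning rate is not given in this excerpt, so 0.1 is assumed.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# Adam baseline with its default parameters (beta1=0.9, beta2=0.999, eps=1e-8).
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
#                              betas=(0.9, 0.999), eps=1e-8, weight_decay=5e-4)

# Decrease the learning rate by a factor of 0.2 every 30 epochs; train 100 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.2)

for epoch in range(100):
    # ... one pass over train_loader with the usual forward/backward/step ...
    scheduler.step()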