Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

KOALA++: Efficient Kalman-Based Optimization with Gradient-Covariance Products

Authors: Zixuan Xia, Aram Davtyan, Paolo Favaro

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate KOALA++ on a range of vision and language tasks to demonstrate its generality, stability, and convergence behavior compared to existing optimizers. Our experiments are organized into three parts: image classification, language modeling and ablation studies.
Researcher Affiliation | Academia | Zixuan Xia EMAIL, Aram Davtyan EMAIL, Paolo Favaro EMAIL; Computer Vision Group, University of Bern
Pseudocode | Yes | Algorithm 1 KOALA++. Initialize θ_0, v_1, Q, R, and fix the learning rate schedule η_k. For k = 2 to T do: for simplicity, denote H_k = ∇L_k(θ_{k−1}); calculate α_k, λ_k, r_k from Equations (13), (8), and (16), respectively; update v_k = (α_k − λ_k) v_{k−1} + (H_k − λ_k H_{k−1}) Q + r_k H_{k−1} (19) and θ_k = θ_{k−1} − η_k · [L_k(θ_{k−1}) / (H_k v_k^⊤ + H_k Q H_k^⊤ + R)] · (v_k^⊤ + Q H_k^⊤) (20).
Open Source Code | Yes | The code is publicly available at https://github.com/Sumxiaa/KOALA_Plus_Plus.
Open Datasets | Yes | We plan to release the code upon publication, and all datasets used (CIFAR-10/100, WikiText-2) are publicly available.
Dataset Splits | Yes | We follow the experimental setup of the original KOALA paper [4] for CIFAR-10 and CIFAR-100 classification tasks, including data augmentation, model architectures, and optimization settings.
Hardware Specification | Yes | All experiments reported in this work were conducted on a server equipped with a single NVIDIA H100 GPU with 80 gigabytes of VRAM and 128 gigabytes of RAM. Unless otherwise stated, all training and evaluation tasks were executed using this configuration.
Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers (e.g., Python, PyTorch versions). It mentions algorithms and models but not their software implementations with versions.
Experiment Setup | Yes | For CIFAR-10, we initialize both σ_0 and Q to 0.1, with an initial learning rate of 1.0. For CIFAR-100, which has more classes and a richer data distribution, we adopt slightly larger values σ_0 = Q = 0.2 and increase the initial learning rate to 2.0. A weight decay of 5 × 10⁻⁴ is applied to all ResNet and other CNN models.
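
The update in the Pseudocode row can be illustrated with a minimal, standard-library Python sketch of a single iteration. It is not the authors' implementation (see the linked repository for that). It assumes that Q and R are scalars, that gradients and parameters are flat lists of floats, and that Equation (20) has the Kalman-gain form θ_k = θ_{k−1} − η_k · L_k / (H_k v_k^⊤ + H_k Q H_k^⊤ + R) · (v_k^⊤ + Q H_k^⊤), which is a reading of the extracted text rather than a confirmed formula. The coefficients α_k, λ_k, r_k come from Equations (13), (8), and (16), which are not reproduced on this page, so the hypothetical function koala_pp_step takes them as caller-supplied arguments.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def koala_pp_step(
    theta: Sequence[float],      # parameters theta_{k-1}
    v_prev: Sequence[float],     # gradient-covariance product v_{k-1}
    grad: Sequence[float],       # H_k, gradient of L_k at theta_{k-1}
    grad_prev: Sequence[float],  # H_{k-1}, gradient from the previous step
    loss: float,                 # L_k(theta_{k-1})
    lr: float,                   # eta_k from the learning rate schedule
    alpha: float,                # alpha_k (Eq. 13 in the paper, supplied by caller)
    lam: float,                  # lambda_k (Eq. 8 in the paper, supplied by caller)
    r: float,                    # r_k (Eq. 16 in the paper, supplied by caller)
    Q: float,                    # process noise, ASSUMED scalar here
    R: float,                    # measurement noise, ASSUMED scalar here
) -> Tuple[List[float], List[float]]:
    """One KOALA++-style step under the assumptions stated in the lead-in."""
    # Eq. (19): v_k = (alpha_k - lam_k) v_{k-1} + (H_k - lam_k H_{k-1}) Q + r_k H_{k-1}
    v = [
        (alpha - lam) * vp + (g - lam * gp) * Q + r * gp
        for vp, g, gp in zip(v_prev, grad, grad_prev)
    ]
    # Scalar denominator of Eq. (20) as read here: H_k v_k^T + H_k Q H_k^T + R
    denom = dot(grad, v) + Q * dot(grad, grad) + R
    scale = lr * loss / denom
    # Eq. (20) as read here: step along (v_k^T + Q H_k^T), scaled by the loss
    theta_new = [t - scale * (vi + Q * g) for t, vi, g in zip(theta, v, grad)]
    return theta_new, v
```

In a training loop, the returned v would be passed back in as v_prev on the next iteration, together with the current gradient as grad_prev. The Experiment Setup row gives Q = 0.1 with an initial learning rate of 1.0 for CIFAR-10, and Q = 0.2 with an initial learning rate of 2.0 for CIFAR-100.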