Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

KOALA++: Efficient Kalman-Based Optimization with Gradient-Covariance Products

Authors: Zixuan Xia, Aram Davtyan, Paolo Favaro

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate KOALA++ on a range of vision and language tasks to demonstrate its generality, stability, and convergence behavior compared to existing optimizers. Our experiments are organized into three parts: image classification, language modeling and ablation studies.
Researcher Affiliation	Academia	Zixuan Xia EMAIL Aram Davtyan EMAIL Paolo Favaro EMAIL University of Bern Computer Vision Group, University of Bern
Pseudocode	Yes	Algorithm 1 KOALA++. Initialize $\theta_0$, $v_1$, $Q$, $R$, and fix the learning rate schedule $\eta_k$. For $k = 2$ to $T$ do: for simplicity, denote $H_k = \nabla L_k(\theta_{k-1})$; calculate $\alpha_k$, $\lambda_k$, $r_k$ respectively from Equations (13), (8), and (16); update: $v_k = (\alpha_k - \lambda_k) v_{k-1} + (H_k - \lambda_k H_{k-1}) Q + r_k H_{k-1}$ (19), $\theta_k = \theta_{k-1} - \eta_k \frac{L_k(\theta_{k-1})}{H_k v_k^\top + H_k Q H_k^\top + R} (v_k^\top + Q H_k^\top)$ (20).
Open Source Code	Yes	The code is publicly available at https://github.com/Sumxiaa/KOALA_Plus_Plus.
Open Datasets	Yes	We plan to release the code upon publication, and all datasets used (CIFAR10/100, WikiText-2) are publicly available.
Dataset Splits	Yes	We follow the experimental setup of the original KOALA paper [4] for CIFAR-10 and CIFAR-100 classification tasks, including data augmentation, model architectures, and optimization settings.
Hardware Specification	Yes	All experiments reported in this work were conducted on a server equipped with a single NVIDIA H100 GPU with 80 gigabytes of VRAM and 128 gigabytes of RAM. Unless otherwise stated, all training and evaluation tasks were executed using this configuration.
Software Dependencies	No	The paper does not explicitly provide specific software dependencies with version numbers (e.g., Python, PyTorch versions). It mentions algorithms and models but not their software implementations with versions.
Experiment Setup	Yes	For CIFAR-10, we initialize both σ₀ and Q to 0.1, with an initial learning rate of 1.0. For CIFAR-100, which has more classes and a richer data distribution, we adopt slightly larger values σ₀ = Q = 0.2 and increase the initial learning rate to 2.0. A weight decay of 5 × 10⁻⁴ is applied to all ResNet and other CNN models.
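
For readers who want to see the shape of the update quoted in the Pseudocode row, the following is a minimal sketch of one reading of Equations (19) and (20), not the authors' implementation. It assumes that Q and R are scalars and that the denominator in (20) is a scalar built from inner products. The coefficients alpha_k, lambda_k and r_k come from Equations (13), (8) and (16), which are not reproduced on this page, so the sketch takes them as precomputed inputs. The function name `koala_pp_step`, the helper `dot`, and the value chosen for R are hypothetical; Q = 0.1 and the learning rate 1.0 follow the CIFAR-10 values quoted in the Experiment Setup row.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def koala_pp_step(
    theta: Vector,    # theta_{k-1}, current parameters
    v_prev: Vector,   # v_{k-1}
    h: Vector,        # H_k, gradient of the loss at theta_{k-1}
    h_prev: Vector,   # H_{k-1}, gradient from the previous step
    loss: float,      # L_k(theta_{k-1})
    alpha_k: float,   # assumed precomputed from Eq. (13), not shown here
    lambda_k: float,  # assumed precomputed from Eq. (8), not shown here
    r_k: float,       # assumed precomputed from Eq. (16), not shown here
    q: float,         # Q, treated as a scalar (assumption)
    noise_r: float,   # R, treated as a scalar (assumption)
    lr: float,        # eta_k, learning rate at step k
) -> Tuple[Vector, Vector]:
    """Return (theta_k, v_k) under the scalar-Q reading of Eqs. (19)-(20)."""
    # Eq. (19): v_k = (alpha_k - lambda_k) v_{k-1} + (H_k - lambda_k H_{k-1}) Q + r_k H_{k-1}
    v = [
        (alpha_k - lambda_k) * vp + (hk - lambda_k * hp) * q + r_k * hp
        for vp, hk, hp in zip(v_prev, h, h_prev)
    ]
    # Eq. (20): scalar denominator H_k.v_k + Q (H_k.H_k) + R
    denom = dot(h, v) + q * dot(h, h) + noise_r
    scale = lr * loss / denom
    theta_new = [t - scale * (vi + q * hi) for t, vi, hi in zip(theta, v, h)]
    return theta_new, v


# Toy call with made-up gradients and coefficients; only q and lr follow
# the CIFAR-10 values quoted above, and noise_r = 1.0 is a placeholder.
theta_new, v_new = koala_pp_step(
    theta=[1.0, -2.0], v_prev=[0.0, 0.0],
    h=[2.0, -4.0], h_prev=[0.0, 0.0], loss=5.0,
    alpha_k=1.0, lambda_k=0.0, r_k=0.0,
    q=0.1, noise_r=1.0, lr=1.0,
)
```

Treating Q as a scalar keeps every operation element-wise, so the step costs a few vector passes per iteration; a full matrix Q would need matrix-vector products in both equations.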