Kernelized Wasserstein Natural Gradient
Authors: M Arbel, A Gretton, W Li, G Montufar
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify its accuracy on simple examples, and show the advantage of using such an estimator in classification tasks on Cifar10 and Cifar100 empirically. This section presents an empirical evaluation of (KWNG) based on (19). Figure 3 shows the training and test accuracy at each epoch on Cifar10 in both (WC) and (IC) cases. |
| Researcher Affiliation | Academia | Michael Arbel, Arthur Gretton Gatsby Computational Neuroscience Unit University College London {michael.n.arbel,arthur.gretton}@gmail.com Wuchen Li University of California, Los Angeles wcli@math.ucla.edu Guido Montúfar University of California, Los Angeles, and Max Planck Institute for Mathematics in the Sciences montufar@mis.mpg.de |
| Pseudocode | No | The paper describes the proposed method but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for the experiments is available at https://github.com/MichaelArbel/KWNG. |
| Open Datasets | Yes | We consider a classification task on two datasets Cifar10 and Cifar100 with a Residual Network He et al. (2015). |
| Dataset Splits | No | The paper mentions training and test accuracy but does not explicitly provide details about train/validation/test dataset splits or proportions. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CUDA 11.1'). |
| Experiment Setup | Yes | For all methods, we used a batch size of 128. The optimal step-size γ was selected in {10, 1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴} for each method. For SGD with momentum, we used a momentum parameter of 0.9 and a weight decay of either 0 or 5×10⁻⁴. For KFAC and EKFAC, we used a damping coefficient of 10⁻³ and a reparametrization frequency of 100 updates. For KWNG we set M = 5 and λ = 0, while ϵ is initialized to 10⁻⁵ and adjusted using an adaptive scheme based on the Levenberg-Marquardt dynamics, as in (Martens and Grosse, 2015, Section 6.5). More precisely, after every 5 iterations of the optimizer, ϵ is updated as ϵ ← ωϵ if r > 3/4, and ϵ ← ω⁻¹ϵ if r < 1/4. Here, r is the reduction ratio, computed from the decrease of the loss L(θ_t) − L(θ_{t+1}) over the window t_{k−1} ≤ t ≤ t_k, where (t_k)_k are the times when the updates occur, and ω is the decay constant, set to ω = 0.85. |
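The adaptive damping rule quoted in the Experiment Setup row can be sketched as below. This is an illustrative reading of the Levenberg-Marquardt-style heuristic, not the authors' code: the function name `adapt_epsilon`, the threshold constants 3/4 and 1/4, and the call structure are assumptions based on the quoted description, with ω = 0.85 as reported.

```python
def adapt_epsilon(eps, reduction_ratio, omega=0.85):
    """Levenberg-Marquardt-style damping adjustment (illustrative sketch).

    Shrink eps when the loss is decreasing well (r > 3/4),
    grow it when progress stalls (r < 1/4), otherwise leave it unchanged.
    The paper applies this update every 5 optimizer iterations.
    """
    if reduction_ratio > 0.75:
        return omega * eps        # good progress: trust the step, damp less
    if reduction_ratio < 0.25:
        return eps / omega        # poor progress: damp more
    return eps                    # in-between: keep eps as is


# Usage sketch: eps starts at 1e-5 as in the paper, then is adjusted
# from the observed reduction ratio every 5 iterations.
eps = 1e-5
for r in [0.9, 0.5, 0.1]:
    eps = adapt_epsilon(eps, r)
```

The multiplicative form means ϵ drifts geometrically toward whatever damping level keeps the reduction ratio in the middle band, which is the usual behavior of Levenberg-Marquardt schedules.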