On Measuring Excess Capacity in Neural Networks

Authors: Florian Graf, Sebastian Zeng, Bastian Rieck, Marc Niethammer, Roland Kwitt

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on benchmark datasets of varying task difficulty indicate that (1) there is a substantial amount of excess capacity per task, and (2) capacity can be kept at a surprisingly similar level across tasks.
Researcher Affiliation | Academia | Florian Graf, University of Salzburg (florian.graf@plus.ac.at); Sebastian Zeng, University of Salzburg (sebastian.zeng@plus.ac.at); Bastian Rieck, Institute for AI and Health, Helmholtz Munich (bastian@rieck.me); Marc Niethammer, UNC Chapel Hill (mn@cs.unc.edu); Roland Kwitt, University of Salzburg (roland.kwitt@plus.ac.at)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Source code is available at https://github.com/rkwitt/excess_capacity.
Open Datasets | Yes | We test on three benchmark datasets: CIFAR10/100 [25] and Tiny-ImageNet-200 [24], listed in order of increasing task difficulty.
Dataset Splits | No | We adhere to the common training/testing splits of the three datasets we used, i.e., CIFAR10/100 and Tiny-ImageNet-200. (The paper does not mention a validation split or give specific split sizes or percentages.)
Hardware Specification | Yes | Section B.4 lists all hardware resources used in our experiments.
Software Dependencies | No | The paper mentions optimizers (SGD) and frameworks (PyTorch, TensorFlow, JAX) in general terms or in connection with third-party tools, but does not specify version numbers for the software used in its own experimental setup.
Experiment Setup | Yes | We minimize the cross-entropy loss using SGD with momentum (0.9) and small weight decay (1e-4) for 200 epochs with batch size 256, and follow a CIFAR-typical stepwise learning rate schedule, decaying the initial learning rate (of 3e-3) by a factor of 5 at epochs 60, 120, and 160. No data augmentation is used. When projecting onto the constraint sets, we found one alternating projection step every 15th SGD update to be sufficient to remain close to C.
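
For orientation, the quoted training setup could be sketched in PyTorch roughly as follows. This is a minimal sketch under stated assumptions: the CIFAR10 loader, the linear placeholder model, and the project_onto_constraints stub are illustrative stand-ins, not the authors' method; the actual constrained architectures and alternating projections onto C are implemented in the linked repository.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# CIFAR10 with its standard train split; the paper uses no data augmentation,
# so only tensor conversion is applied. CIFAR100 / Tiny-ImageNet-200 analogous.
train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=256, shuffle=True)

# Placeholder model; the paper trains constrained ResNet-style networks,
# which are provided in the authors' repository and not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=3e-3, momentum=0.9, weight_decay=1e-4)
# Decay the initial learning rate (3e-3) by a factor of 5 at epochs 60, 120, 160.
scheduler = MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=1 / 5)


def project_onto_constraints(module: nn.Module) -> None:
    """Hypothetical stand-in for one alternating projection step onto the
    constraint set C; the actual projections live in the authors' code."""


step = 0
for epoch in range(200):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        step += 1
        if step % 15 == 0:  # one alternating projection every 15th SGD update
            project_onto_constraints(model)
    scheduler.step()
```

The MultiStepLR schedule with gamma = 1/5 and milestones [60, 120, 160] mirrors the quoted decay-by-a-factor-of-5 schedule; everything specific to the excess-capacity constraints must come from the repository above.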