Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Measuring Excess Capacity in Neural Networks

Authors: Florian Graf, Sebastian Zeng, Bastian Rieck, Marc Niethammer, Roland Kwitt

NeurIPS 2022

Reproducibility Variable: Result (with supporting LLM response)

Research Type: Experimental
  "Experiments on benchmark datasets of varying task difficulty indicate that (1) there is a substantial amount of excess capacity per task, and (2) capacity can be kept at a surprisingly similar level across tasks."

Researcher Affiliation: Academia
  Florian Graf, University of Salzburg (EMAIL); Sebastian Zeng, University of Salzburg (EMAIL); Bastian Rieck, Institute for AI and Health, Helmholtz Munich (EMAIL); Marc Niethammer, UNC Chapel Hill (EMAIL); Roland Kwitt, University of Salzburg (EMAIL)

Pseudocode: No
  The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".

Open Source Code: Yes
  "Source code is available at https://github.com/rkwitt/excess_capacity."

Open Datasets: Yes
  "We test on three benchmark datasets: CIFAR10/100 [25], and Tiny-ImageNet-200 [24], listed in order of increasing task difficulty."

Dataset Splits: No
  "We adhere to the common training/testing splits of the three datasets we used, i.e., CIFAR10/100 and Tiny-ImageNet-200." (No validation split, and no specific split percentages or counts, are reported.)

Hardware Specification: Yes
  "Section B.4 lists all hardware resources used in our experiments."

Software Dependencies: No
  The paper mentions optimizers (SGD) and frameworks (PyTorch, TensorFlow, JAX) in general or in relation to third-party tools, but does not give version numbers for the software dependencies used in its own experimental setup.

Experiment Setup: Yes
  "We minimize the cross-entropy loss using SGD with momentum (0.9) and small weight decay (1e-4) for 200 epochs with batch size 256 and follow a CIFAR-typical stepwise learning rate schedule, decaying the initial learning rate (of 3e-3) by a factor of 5 at epochs 60, 120 & 160. No data augmentation is used. When projecting onto the constraint sets, we found one alternating projection step every 15th SGD update to be sufficient to remain close to C."
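The stepwise learning rate schedule quoted in the Experiment Setup row can be sketched as a small helper. This is an illustrative reconstruction, not code from the paper's repository: the function name is hypothetical, and the defaults are simply the reported values (initial rate 3e-3, decay factor 5, milestones at epochs 60, 120, 160).

```python
def stepwise_lr(epoch, base_lr=3e-3, milestones=(60, 120, 160), factor=5.0):
    """Learning rate at a given epoch: base_lr is divided by `factor`
    once for each milestone already reached, matching the reported
    CIFAR-typical stepwise schedule."""
    drops = sum(epoch >= m for m in milestones)
    return base_lr / factor ** drops
```

For example, `stepwise_lr(0)` returns the initial rate 3e-3, while from epoch 160 onward the rate is 3e-3 / 125. In PyTorch (which the paper mentions), the equivalent would be `MultiStepLR` with `gamma=0.2`.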