The Unreasonable Effectiveness of Fully-Connected Layers for Low-Data Regimes

Authors: Peter Kocsis, Peter Súkeník, Guillem Brasó, Matthias Nießner, Laura Leal-Taixé, Ismail Elezi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform classification experiments for a large range of network backbones and several standard datasets on supervised learning and active learning.
Researcher Affiliation | Academia | Peter Kocsis, Technical University of Munich, peter.kocsis@tum.de; Peter Súkeník, Technical University of Munich, peter.sukenik@trojsten.sk; Guillem Brasó, Technical University of Munich, guillem.braso@tum.de; Matthias Nießner, Technical University of Munich, niessner@tum.de; Laura Leal-Taixé, Technical University of Munich, leal.taixe@tum.de; Ismail Elezi, Technical University of Munich, ismail.elezi@tum.de
Pseudocode | No | The paper describes methods using figures and text but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | peter-kocsis.github.io/LowDataGeneralization/ and "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | Yes | For all experiments, we report accuracy as the primary metric and use four public datasets: CIFAR10 [18], CIFAR100 [18], Caltech101 [19], and Caltech256 [19].
Dataset Splits | Yes | We use the predefined train/test split for the CIFAR datasets, while we split the Caltech datasets into 70% training and 30% testing, maintaining the class distribution. (A stratified-split sketch follows the table.)
Hardware Specification | No | "We train each network in a single GPU." This statement is too general and does not specify the type or model of GPU used.
Software Dependencies | No | The paper mentions optimizers and losses (e.g., SGD optimizer, cross-entropy loss) but does not provide specific software names with version numbers for libraries or frameworks used.
Experiment Setup | Yes | For the CIFAR experiments, we follow the training procedure of [20]. More precisely, we train our networks for 200 epochs using the SGD optimizer with learning rate 0.1, momentum 0.9, and weight decay 5e-4, and divide the learning rate by 10 after 80% of the epochs. We use cross-entropy loss as supervision. For the more complex Caltech datasets, we start with an ImageNet-pre-trained backbone and reduce the dimensionality in our FR to only 256. We use the same training setup for full fine-tuning, except that we reduce the initial learning rate to 1e-3 and train for only 100 epochs. (A training-recipe sketch follows the table.)