Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Knowledge distillation via softmax regression representation learning

Authors: Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method is extremely simple to implement and straightforward to train and is shown to consistently outperform previous state-of-the-art methods over a large set of experimental settings including different (a) network architectures, (b) teacher-student capacities, (c) datasets, and (d) domains.
Researcher Affiliation | Collaboration | Jing Yang, University of Nottingham, Nottingham, UK; Brais Martinez, Samsung AI Center, Cambridge, UK; Adrian Bulat, Samsung AI Center, Cambridge, UK; Georgios Tzimiropoulos, Samsung AI Center, Cambridge, UK and Queen Mary University of London, London, UK
Pseudocode | Yes | Algorithm 1: Knowledge Distillation via Softmax Regression Representation Learning
Open Source Code | Yes | The code is available at https://github.com/jingyang2017/KD_SRRL.
Open Datasets | Yes | CIFAR-10 is a popular image classification dataset consisting of 50,000 training and 10,000 testing images equally distributed across 10 classes. ... For CIFAR-100 (Krizhevsky & Hinton, 2009)... ImageNet-1K (Russakovsky et al., 2015).
Dataset Splits | No | The paper provides specific counts of training and testing images for CIFAR-10 (50,000 training and 10,000 testing) and mentions training and evaluation on ImageNet, but does not explicitly describe a separate validation split with specific counts or percentages.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU models, or cloud-computing instance types, used to run the experiments.
Software Dependencies | No | The paper mentions using 'pretrained PyTorch models (Paszke et al., 2017)', which indicates the use of PyTorch, but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | The ResNet models were trained for 350 epochs using SGD. The initial learning rate was set to 0.1, and then it was reduced by a factor of 10 at epochs 150, 250, and 320. Similarly, the WRN models were trained for 200 epochs with a learning rate of 0.1 that was subsequently reduced by a factor of 5 at epochs 60, 120, and 160. In all experiments, we set the dropout rate to 0. ... Batch size was set to 128. ... We used SGD with Nesterov momentum 0.9, weight decay 1e-4, and an initial learning rate of 0.2, which was then dropped by a factor of 10 every 30 epochs, training in total for 100 epochs.
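The two step-decay learning-rate schedules quoted in the Experiment Setup row can be sketched as plain Python step functions. This is a minimal illustration under the hyperparameters stated in the paper, not the authors' code (their implementation is in the linked repository); the function names `resnet_cifar_lr` and `imagenet_lr` are hypothetical.

```python
def resnet_cifar_lr(epoch, base_lr=0.1, milestones=(150, 250, 320), gamma=0.1):
    """CIFAR ResNet schedule quoted above: start at 0.1 and divide the
    learning rate by 10 at epochs 150, 250, and 320 (350 epochs total)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr


def imagenet_lr(epoch, base_lr=0.2, step=30, gamma=0.1):
    """ImageNet schedule quoted above: start at 0.2 and drop by a factor
    of 10 every 30 epochs (100 epochs total)."""
    return base_lr * gamma ** (epoch // step)
```

In a PyTorch training loop these would typically be expressed with `torch.optim.SGD(..., momentum=0.9, nesterov=True, weight_decay=1e-4)` together with `torch.optim.lr_scheduler.MultiStepLR` or `StepLR`, matching the milestones above.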