Knowledge distillation via softmax regression representation learning

Authors: Jing Yang, Brais Martinez, Adrian Bulat, Georgios Tzimiropoulos

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method is extremely simple to implement and straightforward to train and is shown to consistently outperform previous state-of-the-art methods over a large set of experimental settings including different (a) network architectures, (b) teacher-student capacities, (c) datasets, and (d) domains.
Researcher Affiliation | Collaboration | Jing Yang, University of Nottingham, Nottingham, UK, jing.yang2@nottingham.ac.uk; Brais Martinez, Samsung AI Center, Cambridge, UK, brais.mart@gmail.com; Adrian Bulat, Samsung AI Center, Cambridge, UK, adrian@adrianbulat.com; Georgios Tzimiropoulos, Samsung AI Center, Cambridge, UK and Queen Mary University of London, London, UK, g.tzimiropoulos@qmul.ac.uk
Pseudocode | Yes | Algorithm 1: Knowledge distillation via Softmax Regression Representation Learning (see the loss sketch after the table)
Open Source Code | Yes | The code is available at https://github.com/jingyang2017/KD_SRRL.
Open Datasets | Yes | CIFAR-10 is a popular image classification dataset consisting of 50,000 training and 10,000 testing images equally distributed across 10 classes. ... For CIFAR-100 (Krizhevsky & Hinton, 2009)... ImageNet-1K (Russakovsky et al., 2015).
Dataset Splits | No | The paper provides specific counts for training and testing images for CIFAR-10 (50,000 training and 10,000 testing) and mentions training and evaluation on ImageNet, but does not explicitly detail a separate validation split with specific counts or percentages.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running experiments.
Software Dependencies | No | The paper mentions using "pretrained PyTorch models (Paszke et al., 2017)", which indicates the use of PyTorch, but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | The ResNet models were trained for 350 epochs using SGD. The initial learning rate was set to 0.1, and then it was reduced by a factor of 10 at epochs 150, 250 and 320. Similarly, the WRN models were trained for 200 epochs with a learning rate of 0.1 that was subsequently reduced by a factor of 5 at epochs 60, 120 and 160. In all experiments, we set the dropout rate to 0. ... Batch size was set to 128. ... We used SGD with Nesterov momentum 0.9, weight decay 1e-4, initial learning rate 0.2 which was then dropped by a factor of 10 every 30 epochs, training in total for 100 epochs. (See the schedule sketch after the table.)
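To give a concrete picture of what Algorithm 1 (the Pseudocode row) amounts to, below is a minimal PyTorch-style sketch of an SRRL-style distillation loss: a feature-matching term on the penultimate features plus a term that passes the student's adapted feature through the teacher's frozen classifier (the "softmax regression") and matches the teacher's predictions. This is not the authors' implementation (that is in the linked repository); the 1x1 connector, the KL formulation of the softmax-regression term, and the weights `alpha`/`beta` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SRRLLoss(nn.Module):
    """Sketch of an SRRL-style distillation loss (not the official code)."""

    def __init__(self, student_dim, teacher_dim, teacher_classifier,
                 alpha=1.0, beta=1.0):
        super().__init__()
        # 1x1 connector mapping student features to the teacher's width
        # (an assumption about how the dimensionality gap is bridged).
        self.connector = nn.Conv2d(student_dim, teacher_dim, kernel_size=1)
        self.teacher_classifier = teacher_classifier  # teacher's final FC layer
        for p in self.teacher_classifier.parameters():
            p.requires_grad = False  # the teacher head stays frozen
        self.alpha = alpha
        self.beta = beta

    def forward(self, feat_s, feat_t, logits_t):
        # feat_s, feat_t: penultimate feature maps of shape (N, C, H, W).
        feat_s_adapted = self.connector(feat_s)

        # (i) Feature-matching term: L2 between adapted student feature
        # and teacher feature.
        loss_fm = F.mse_loss(feat_s_adapted, feat_t)

        # (ii) Softmax-regression term: pool the student feature, pass it
        # through the teacher's classifier, and match the teacher's logits.
        pooled_s = F.adaptive_avg_pool2d(feat_s_adapted, 1).flatten(1)
        logits_s_via_t = self.teacher_classifier(pooled_s)
        loss_sr = F.kl_div(
            F.log_softmax(logits_s_via_t, dim=1),
            F.softmax(logits_t, dim=1),
            reduction="batchmean",
        )
        return self.alpha * loss_fm + self.beta * loss_sr


# Usage with illustrative shapes; real backbones would supply the
# penultimate feature maps and the teacher's logits.
teacher_fc = nn.Linear(2048, 1000)
criterion_kd = SRRLLoss(student_dim=512, teacher_dim=2048,
                        teacher_classifier=teacher_fc)
feat_s = torch.randn(8, 512, 7, 7)
feat_t = torch.randn(8, 2048, 7, 7)
logits_t = torch.randn(8, 1000)
loss = criterion_kd(feat_s, feat_t, logits_t)  # added to the usual CE loss
```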
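The Experiment Setup row quotes two training recipes. The sketch below wires the fully specified ImageNet recipe (SGD with Nesterov momentum 0.9, weight decay 1e-4, initial learning rate 0.2 dropped by 10x every 30 epochs, 100 epochs) and the CIFAR ResNet milestones into standard PyTorch schedulers. The placeholder `student` model and the choice of `StepLR`/`MultiStepLR` are assumptions; the CIFAR momentum and weight decay are not quoted in the row and are left at defaults.

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR, MultiStepLR

# Placeholder student; the paper's students are ResNet / WRN architectures.
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))

# ImageNet recipe as quoted: Nesterov SGD, momentum 0.9, weight decay 1e-4,
# initial lr 0.2, dropped by a factor of 10 every 30 epochs, 100 epochs.
optimizer = SGD(student.parameters(), lr=0.2, momentum=0.9,
                nesterov=True, weight_decay=1e-4)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    # ... one training epoch over the loader would run here ...
    scheduler.step()

# CIFAR ResNet recipe as quoted: 350 epochs, initial lr 0.1, divided by 10
# at epochs 150, 250 and 320, batch size 128, dropout 0.
cifar_optimizer = SGD(student.parameters(), lr=0.1)  # momentum/decay not quoted
cifar_scheduler = MultiStepLR(cifar_optimizer,
                              milestones=[150, 250, 320], gamma=0.1)
```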