Contrastive Representation Distillation
Authors: Yonglong Tian, Dilip Krishnan, Phillip Isola
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer. Our method sets a new state-of-the-art in many transfer tasks, and sometimes even outperforms the teacher network when combined with knowledge distillation. |
| Researcher Affiliation | Collaboration | Yonglong Tian MIT CSAIL yonglong@mit.edu Dilip Krishnan Google Research dilipkay@google.com Phillip Isola MIT CSAIL phillipi@mit.edu |
| Pseudocode | No | The paper does not include a distinct pseudocode block or algorithm listing. It describes its methods using mathematical equations and descriptive text. |
| Open Source Code | Yes | Code: http://github.com/HobbitLong/RepDistiller. |
| Open Datasets | Yes | Datasets (1) CIFAR-100 (Krizhevsky & Hinton, 2009) contains 50K training images... (2) ImageNet (Deng et al., 2009) provides 1.2 million images from 1K classes for training... (3) STL-10 (Coates et al., 2011) consists of a training set of 5K labeled images... (4) Tiny ImageNet (Deng et al., 2009) has 200 classes, each with 500 training images... (5) NYU-Depth V2 (Silberman et al., 2012) consists of 1449 indoor images, each labeled with dense depth image and semantic map. |
| Dataset Splits | Yes | ImageNet (Deng et al., 2009) provides 1.2 million images from 1K classes for training and 50K for validation. |
| Hardware Specification | Yes | In practice, we did not notice a significant difference in training time on ImageNet (e.g., 1.75 epochs/hour vs. 1.67 epochs/hour on two Titan V GPUs). |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with their specific versions. |
| Experiment Setup | Yes | For CIFAR-100, we initialize the learning rate as 0.05, and decay it by 0.1 every 30 epochs after the first 150 epochs until the last epoch (240). For MobileNetV2, ShuffleNetV1 and ShuffleNetV2, we use a learning rate of 0.01... Batch size is 64 for CIFAR-100 or 256 for ImageNet. We have validated different N: 16, 64, 256, 1024, 4096, 16384. We varied τ between 0.02 and 0.3. All experiments but those on ImageNet use a temperature of 0.1. For ImageNet, we use τ = 0.07. (Hedged sketches of this setup follow the table.) |
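
The paper states its contrastive objective through equations rather than pseudocode (see the Pseudocode row), parameterized by a temperature τ and a number of negatives N. The snippet below is a minimal, simplified stand-in for illustration only: an in-batch InfoNCE-style loss between student and teacher embeddings, not the authors' exact NCE critic with a memory buffer of N negatives. The linear embedding heads and the 128-dimensional shared space are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveDistillSketch(nn.Module):
    """Simplified InfoNCE-style stand-in for a contrastive distillation loss.

    The paper's objective uses an NCE critic with N buffered negatives;
    here negatives come only from the current mini-batch.
    """

    def __init__(self, s_dim, t_dim, feat_dim=128, temperature=0.1):
        super().__init__()
        # Linear heads map student/teacher features into a shared embedding space.
        self.embed_s = nn.Linear(s_dim, feat_dim)
        self.embed_t = nn.Linear(t_dim, feat_dim)
        # tau: 0.1 in the quoted CIFAR-100 setup, 0.07 for ImageNet.
        self.temperature = temperature

    def forward(self, f_s, f_t):
        # L2-normalize so the dot product is a cosine similarity.
        z_s = F.normalize(self.embed_s(f_s), dim=1)
        z_t = F.normalize(self.embed_t(f_t), dim=1)
        # Diagonal entries are positives (same input through student and teacher);
        # off-diagonal entries serve as negatives.
        logits = z_s @ z_t.t() / self.temperature
        targets = torch.arange(z_s.size(0), device=z_s.device)
        return F.cross_entropy(logits, targets)
```

With a memory buffer, the N negatives validated in the paper (16 up to 16384) would replace the in-batch negatives used in this sketch.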
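
Assuming a standard PyTorch training loop, the quoted CIFAR-100 schedule (initial learning rate 0.05, decayed by 0.1 every 30 epochs after epoch 150, 240 epochs, batch size 64) could be expressed roughly as below. The student model is a placeholder, and the SGD momentum and weight decay values are assumptions not given in this summary.

```python
import torch

# Placeholder student network; in practice this is the distilled student model.
model = torch.nn.Linear(512, 100)

# Quoted values: lr 0.05 (0.01 for MobileNetV2/ShuffleNet students),
# batch size 64 on CIFAR-100 (256 on ImageNet), 240 epochs.
# momentum and weight_decay below are assumed, not quoted.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=5e-4)

# Decay by 0.1 every 30 epochs after the first 150 epochs: 150, 180, 210.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 180, 210], gamma=0.1)

for epoch in range(240):
    # ... one training epoch over CIFAR-100 with batch size 64 ...
    scheduler.step()
```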
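
For the public datasets listed in the Open Datasets row, a minimal torchvision loading sketch is shown below; the root path and augmentation transforms are placeholders rather than the paper's exact pipeline.

```python
import torchvision
import torchvision.transforms as T

# Placeholder CIFAR-100 augmentation; not necessarily the paper's exact transforms.
cifar_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# CIFAR-100: 50K training images.
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=cifar_transform)

# STL-10: 5K labeled training images.
stl10_train = torchvision.datasets.STL10(
    root="./data", split="train", download=True, transform=T.ToTensor())
```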