Robust Active Distillation

Authors: Cenk Baykal, Khoa Trinh, Fotis Iliopoulos, Gaurav Menghani, Erik Vee

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical evaluations on popular benchmarks that demonstrate the improved distillation performance enabled by our work relative to that of state-of-the-art active learning and active distillation methods.
Researcher Affiliation | Industry | Cenk Baykal, Khoa Trinh, Fotis Iliopoulos, Gaurav Menghani, Erik Vee, Google Research, {baykalc,khoatrinh,fotisi,gmenghani,erikvee}@google.com
Pseudocode | Yes | Algorithm 1 ACTIVEDISTILLATION; Algorithm 2 DEPROUND (a generic dependent-rounding sketch is given after this table)
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code or a link to a code repository.
Open Datasets | Yes | We considered the CIFAR10/CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and ImageNet (Deng et al., 2009) data sets.
Dataset Splits | Yes | We considered the CIFAR10/CIFAR100 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011), and ImageNet (Deng et al., 2009) data sets. Unless otherwise specified, we use the Adam optimizer (Kingma & Ba, 2014) with a batch size of 128 and data set-specific learning rate schedules. We follow the active distillation setting shown in Alg. 1 with various configurations. We used a validation data set of size 1,000 for the CIFAR10, CIFAR100, and SVHN data sets, and a validation data set of size 10,000 for ImageNet, to estimate m.
Hardware Specification | Yes | We conduct our evaluations on 64 Cloud TPU v4s, each with two cores.
Software Dependencies | No | The paper states "We implemented all algorithms in Python and used the TensorFlow (Abadi et al., 2015) deep learning library" and mentions the Adam optimizer (Kingma & Ba, 2014), but it does not provide specific version numbers for Python or TensorFlow.
Experiment Setup | Yes | Unless otherwise specified, we use the Adam optimizer (Kingma & Ba, 2014) with a batch size of 128 and data set-specific learning rate schedules. We train the student model for 100 epochs using SGD with momentum (= 0.9), batch size 256, and the following learning rate schedule: for the first 5 epochs, we linearly increase the learning rate from 0 to 0.1; for the next 30 epochs we use a learning rate of 0.1; for the 30 epochs after that, 0.01; for the next 20, 0.001; and 0.0001 for the remaining epochs. We used the Adam optimizer (Kingma & Ba, 2014) with the default parameters except for the learning rate schedule, which was as follows: for a given number of epochs n_epochs ∈ {100, 200}, we used 1e-3 as the learning rate for the first (2/5)·n_epochs, then 1e-4 until (3/5)·n_epochs, 1e-5 until (4/5)·n_epochs, 1e-6 until (9/10)·n_epochs, and finally 5e-7 until the end. We rounded the epoch windows that determine the learning rate schedule to integral values whenever necessary.
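
The learning-rate schedules quoted above are described only in prose; the following is a minimal Python sketch of both, written as plain epoch-to-rate functions. The function names, the warmup interpretation (reaching 0.1 at the end of epoch 5), and the use of round() for the epoch windows are our own assumptions, not code from the paper.

```python
# Hedged sketch of the two learning-rate schedules quoted in the table above.
# Names and exact boundary handling are assumptions; the paper gives only prose.

def sgd_student_lr(epoch: int) -> float:
    """Student schedule: 100 epochs of SGD with momentum 0.9, batch size 256."""
    if epoch < 5:                 # linear warmup from 0 to 0.1 over the first 5 epochs
        return 0.1 * (epoch + 1) / 5
    if epoch < 35:                # next 30 epochs
        return 0.1
    if epoch < 65:                # next 30 epochs
        return 0.01
    if epoch < 85:                # next 20 epochs
        return 0.001
    return 0.0001                 # remaining epochs


def adam_lr(epoch: int, n_epochs: int = 100) -> float:
    """Adam schedule keyed to fractions of n_epochs (100 or 200), windows rounded."""
    if epoch < round(0.4 * n_epochs):   # first 2/5 of training
        return 1e-3
    if epoch < round(0.6 * n_epochs):   # until 3/5
        return 1e-4
    if epoch < round(0.8 * n_epochs):   # until 4/5
        return 1e-5
    if epoch < round(0.9 * n_epochs):   # until 9/10
        return 1e-6
    return 5e-7                         # until the end
```

Since the paper uses TensorFlow, either function could be wrapped in a standard Keras callback, e.g. tf.keras.callbacks.LearningRateScheduler(lambda epoch: adam_lr(epoch, n_epochs=200)).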
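
The table also notes pseudocode for Algorithm 2, DEPROUND. The paper's own pseudocode is not reproduced here; the sketch below is a generic dependent-rounding routine in its standard form (repeatedly shift probability mass between two fractional entries so that marginals are preserved and at least one entry becomes integral), offered only as a reference point. The function name depround and its interface are assumptions.

```python
import random

def depround(p, tol=1e-9):
    """Round probabilities p (summing to an integer k) to a 0/1 vector with
    exactly k ones while preserving each marginal, i.e. E[x_i] = p_i."""
    p = list(p)
    while True:
        # Entries that are still strictly fractional.
        frac = [i for i, v in enumerate(p) if tol < v < 1 - tol]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        alpha = min(1 - p[i], p[j])
        beta = min(p[i], 1 - p[j])
        # Move mass between entries i and j; the two branches are chosen with
        # probabilities that keep E[p[i]] and E[p[j]] unchanged, and each step
        # drives at least one of the two entries to 0 or 1.
        if random.random() < beta / (alpha + beta):
            p[i] += alpha
            p[j] -= alpha
        else:
            p[i] -= beta
            p[j] += beta
    return [1 if v > 0.5 else 0 for v in p]
```

In an active-distillation loop of the kind outlined in Algorithm 1, such a routine would allow sampling exactly a budget's worth of unlabeled examples whose inclusion probabilities match a given soft scoring of the pool.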