Churn Reduction via Distillation

Authors: Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh

ICLR 2022

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: "We test our method across a large number of datasets and neural network architectures." (Section 5, Experiments)

Researcher Affiliation: Industry
LLM Response: "Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh. Google Research. {heinrichj, hnarasimhan, dbahri, acotter, rostami}@google.com"

Pseudocode: Yes
LLM Response: "Algorithm 1: Distillation-based Churn Reduction." A hedged Keras sketch of this objective appears after the table.

Open Source Code: Yes
LLM Response: "Reproducibility Statement: All details of experimental setup are in the main text, along with descriptions of the baselines and what hyperparameters were swept across. Code can be found in the Appendix. All proofs are in the Appendix."

Open Datasets: Yes
LLM Response: "Datasets and architectures: The following are the datasets we use in our experiments, along with the associated model architectures: 12 OpenML datasets using fully-connected neural networks; 10 MNIST variants, SVHN, CIFAR10, and 40 CelebA tasks using convolutional networks; CIFAR10 and CIFAR100 with ResNet-50, ResNet-101, and ResNet-152; the IMDB dataset using a transformer network."

Dataset Splits: Yes
LLM Response: "For each dataset, we use the standard train/test split if available; otherwise, we fix a random train/test split with ratio 2:1. We randomly select from the training set 1000 initial examples, 100 validation examples, and a batch of 1000 examples."

Hardware Specification: Yes
LLM Response: "For each run, we used an NVIDIA V100 GPU, which took up to several days to finish all 100 trials."

Software Dependencies: No
LLM Response: The paper states that "Code for the models in Keras can be found in the Appendix," and imports such as tf.keras.Sequential are present, but specific version numbers for these software dependencies are not provided in the text.

Experiment Setup: Yes
LLM Response: "We train an initial model using the Adam optimizer with default settings on the initial set, with early stopping (i.e., stop when there is no improvement in the validation loss after 5 epochs) and default random initialization. For distillation, we tune the trade-off parameter λ across {0.1, 0.2, ..., 0.9}." A sketch of this training setup appears after the table.
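
For concreteness, the following is a minimal Keras sketch of the kind of objective referenced by Algorithm 1 (Distillation-based Churn Reduction): the label loss is mixed with a cross-entropy term that pulls the new model toward the old model's predictions, weighted by the trade-off parameter λ that the paper sweeps over {0.1, ..., 0.9}. This is an illustrative reconstruction under those assumptions, not the authors' released code; the architectures, data, and the exact form of the distillation term are placeholders.

```python
import numpy as np
import tensorflow as tf

def make_churn_reduction_loss(old_model, lam=0.5):
    """Mix the standard label loss with a distillation term toward the
    predictions of the previously trained model, weighted by lam."""
    ce = tf.keras.losses.CategoricalCrossentropy()

    def loss_fn(x, y_true, y_pred):
        # Teacher probabilities from the old model; no gradients flow into it.
        old_probs = tf.stop_gradient(old_model(x, training=False))
        return (1.0 - lam) * ce(y_true, y_pred) + lam * ce(old_probs, y_pred)

    return loss_fn

# Illustrative usage with placeholder models and data.
num_classes = 10

def make_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

old_model = make_model()   # stands in for the already-trained initial model
new_model = make_model()   # the model being retrained on updated data

x = np.random.randn(256, 32).astype("float32")
y = tf.keras.utils.to_categorical(
    np.random.randint(num_classes, size=256), num_classes).astype("float32")

loss_fn = make_churn_reduction_loss(old_model, lam=0.3)  # lam swept over {0.1, ..., 0.9}
optimizer = tf.keras.optimizers.Adam()  # Adam with default settings

for step in range(5):
    with tf.GradientTape() as tape:
        y_pred = new_model(x, training=True)
        loss = loss_fn(x, y, y_pred)
    grads = tape.gradient(loss, new_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, new_model.trainable_variables))
```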
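
The reported baseline training setup (Adam with default settings, early stopping after 5 epochs without validation-loss improvement, 1000 initial and 100 validation examples) can be sketched in tf.keras roughly as follows; the network architecture, input dimensionality, and data below are placeholders rather than the paper's actual configuration.

```python
import numpy as np
import tensorflow as tf

num_classes = 10  # placeholder; depends on the dataset

# Placeholder data standing in for the splits described above:
# 1000 initial training examples and 100 validation examples.
x_init = np.random.randn(1000, 32).astype("float32")
y_init = np.random.randint(num_classes, size=1000)
x_val = np.random.randn(100, 32).astype("float32")
y_val = np.random.randint(num_classes, size=100)

# Placeholder fully-connected network; the paper varies the architecture
# by dataset (fully-connected, convolutional, ResNet, transformer).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),  # default settings
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Early stopping: stop when the validation loss has not improved for 5 epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)

model.fit(x_init, y_init,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=[early_stop],
          verbose=0)
```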