Churn Reduction via Distillation
Authors: Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method across a large number of datasets and neural network architectures. (Section 5: Experiments) |
| Researcher Affiliation | Industry | Heinrich Jiang, Harikrishna Narasimhan, Dara Bahri, Andrew Cotter, Afshin Rostamizadeh Google Research {heinrichj, hnarasimhan, dbahri, acotter, rostami}@google.com |
| Pseudocode | Yes | Algorithm 1: Distillation-based Churn Reduction (a hedged sketch of this objective appears after the table) |
| Open Source Code | Yes | Reproducibility Statement: All details of experimental setup are in the main text, along with descriptions of the baselines and what hyperparameters were swept across. Code can be found in the Appendix. All proofs are in the Appendix. |
| Open Datasets | Yes | Datasets and architectures: The following are the datasets we use in our experiments, along with the associated model architectures: 12 OpenML datasets using fully-connected neural networks; 10 MNIST variants, SVHN, CIFAR10, and 40 CelebA tasks using convolutional networks; CIFAR10 and CIFAR100 with ResNet-50, ResNet-101, and ResNet-152; IMDB dataset using a transformer network. |
| Dataset Splits | Yes | For each dataset, we use the standard train/test split if available; otherwise, we fix a random train/test split with ratio 2:1. We randomly select from the training set 1000 initial examples, 100 validation examples, and a batch of 1000 examples. |
| Hardware Specification | Yes | For each run, we used an NVIDIA V100 GPU, which took up to several days to finish all 100 trials. |
| Software Dependencies | No | Code for the models in Keras can be found in the Appendix, and imports like tf.keras.Sequential are present, but specific version numbers for these software dependencies are not provided in the text. |
| Experiment Setup | Yes | Train an initial model using the Adam optimizer with default settings on the initial set, with early stopping (i.e., stop when there is no improvement in the validation loss after 5 epochs) and default random initialization. For distillation, we tune the trade-off parameter λ across {0.1, 0.2, ..., 0.9}. (A sketch of this training setup follows the table.) |
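
The Pseudocode row references Algorithm 1 (Distillation-based Churn Reduction) and the Experiment Setup row mentions a trade-off parameter λ. Below is a minimal TensorFlow/Keras sketch of such a combined objective, assuming the common convention of mixing the ordinary label loss with a distillation loss toward the frozen initial (teacher) model's predictions; the exact loss form and mixing convention in the paper may differ, and the function and argument names here are illustrative.

```python
import tensorflow as tf

def churn_distillation_loss(teacher, student, x, y, lam=0.5):
    """Hedged sketch: mix the label loss with a distillation loss toward the
    initial (teacher) model, weighted by the trade-off parameter lam.
    The mixing convention (lam on the label term) is an assumption."""
    ce = tf.keras.losses.SparseCategoricalCrossentropy()
    kl = tf.keras.losses.KLDivergence()
    student_probs = student(x, training=True)                      # new model's predictions
    teacher_probs = tf.stop_gradient(teacher(x, training=False))   # old model, frozen
    label_loss = ce(y, student_probs)                # fit the true labels
    distill_loss = kl(teacher_probs, student_probs)  # stay close to the old model
    return lam * label_loss + (1.0 - lam) * distill_loss
```

Keeping the distillation term small discourages prediction flips relative to the initial model, which is the mechanism by which distillation reduces churn.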
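
The Experiment Setup row describes training with the Adam optimizer at default settings, early stopping on the validation loss with a patience of 5 epochs, and a sweep of λ over {0.1, 0.2, ..., 0.9}. The sketch below shows one way to wire that up with standard Keras callbacks; `build_model` and the data arrays are placeholders, and the epoch cap is an assumption.

```python
import numpy as np
import tensorflow as tf

def train_with_early_stopping(build_model, x_train, y_train, x_val, y_val):
    """Hedged sketch of the reported setup: Adam with default settings and
    early stopping after 5 epochs without validation-loss improvement."""
    model = build_model()  # e.g. a tf.keras.Sequential classifier (placeholder)
    model.compile(optimizer=tf.keras.optimizers.Adam(),  # default settings
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=5,
                                                  restore_best_weights=True)
    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=200,            # assumed cap; early stopping ends training sooner
              callbacks=[early_stop],
              verbose=0)
    return model

# Sweep the distillation trade-off parameter over {0.1, 0.2, ..., 0.9}.
lambda_grid = np.round(np.arange(0.1, 1.0, 0.1), 1)
```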