Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Authors: Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bing Yin, Tuo Zhao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that HomoDistil achieves significant improvements on existing baselines. |
| Researcher Affiliation | Collaboration | Georgia Institute of Technology, Amazon EMAIL, EMAIL |
| Pseudocode | Yes | The complete algorithm is shown in Alg. 1. Algorithm 1 HomoDistil: Homotopic Distillation |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology or a link to a repository for HomoDistil. |
| Open Datasets | Yes | We distill the student using the open-domain corpus for BERT pre-training (Devlin et al., 2018), i.e., Wikipedia, an English Wikipedia corpus containing 2500M words, and Toronto BookCorpus (Zhu et al., 2015), containing 800M words. |
| Dataset Splits | Yes | Table 16: Summary of the GLUE benchmark. Columns: Corpus, Task, #Train, #Dev, #Test, #Label, Metrics ... MNLI: NLI, 393k train, 20k dev, 20k test, 3 labels, Accuracy |
| Hardware Specification | Yes | The continual pre-training experiment runs for around 13 hours on 8 Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorch's profiler package and the Adam optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all experiments, we use a max sequence length of 128 and a batch size of 4k. We train the student model for T = 28k steps (3 epochs). We use Adam (Kingma & Ba, 2014) as the optimizer with $\beta = (0.9, 0.999)$, $\epsilon = 1 \times 10^{-6}$. We use a learning rate of $3 \times 10^{-4}$ for HomoBERT-base and $6 \times 10^{-4}$ for HomoBERT-small/xsmall/tiny. |
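
The Experiment Setup row can be captured as a small configuration sketch. This is not the authors' code (none was released); the names `DistillSetup`, `LEARNING_RATES`, and `adam_kwargs` are hypothetical and exist only for illustration. The batch size is left out as a number because the quoted "4k" does not say whether 4000 or 4096 is meant.

```python
from dataclasses import dataclass

# Values below are copied from the quoted Experiment Setup row.
# All names here are hypothetical, not from the paper.


@dataclass(frozen=True)
class DistillSetup:
    max_seq_length: int = 128      # "max sequence length of 128"
    total_steps: int = 28_000      # "T = 28k steps (3 epochs)"
    betas: tuple = (0.9, 0.999)    # Adam beta coefficients
    eps: float = 1e-6              # Adam epsilon
    # Batch size is quoted as "4k"; the exact integer is not stated,
    # so it is deliberately not encoded here.


# Learning rate per student model, as quoted in the row.
LEARNING_RATES = {
    "HomoBERT-base": 3e-4,
    "HomoBERT-small": 6e-4,
    "HomoBERT-xsmall": 6e-4,
    "HomoBERT-tiny": 6e-4,
}


def adam_kwargs(model_name: str, setup: DistillSetup = DistillSetup()) -> dict:
    """Return Adam hyperparameters for one of the student models."""
    if model_name not in LEARNING_RATES:
        raise ValueError(f"unknown student model: {model_name!r}")
    return {
        "lr": LEARNING_RATES[model_name],
        "betas": setup.betas,
        "eps": setup.eps,
    }


if __name__ == "__main__":
    for name in LEARNING_RATES:
        print(name, adam_kwargs(name))
```

The returned dictionary uses the keyword names `lr`, `betas`, and `eps`, so it could be passed to an optimizer such as `torch.optim.Adam(model.parameters(), **adam_kwargs("HomoBERT-base"))`, assuming PyTorch is the framework in use.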