LEMON: Lossless model expansion
Authors: Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch. |
| Researcher Affiliation | Collaboration | Yite Wang (1), Jiahao Su (2), Hanlin Lu (2), Cong Xie (2), Tianyi Liu (2), Jianbo Yuan (2), Haibin Lin (2), Ruoyu Sun (3,4), Hongxia Yang (2). Affiliations: 1 University of Illinois Urbana-Champaign, USA; 2 ByteDance Inc.; 3 The Chinese University of Hong Kong, Shenzhen, China; 4 Shenzhen Research Institute of Big Data |
| Pseudocode | No | The paper describes the expansion procedure in textual form across several sections (e.g., Section 4, C.1) but does not include formal pseudocode or algorithm blocks. (A generic, illustrative expansion sketch follows the table.) |
| Open Source Code | No | The paper does not include a direct statement or link for the open-sourcing of its own code. |
| Open Datasets | Yes | We train Vision Transformers on the ImageNet-1k (Deng et al., 2009) dataset... The model is trained on the English Wiki corpus as per the methods in Tan & Bansal (2020)... For the fine-tuning task of BERT on the GLUE (Wang et al., 2018) dataset |
| Dataset Splits | Yes | We train Vision Transformers on the ImageNet-1k (Deng et al., 2009) dataset... The model is trained on the English Wiki corpus as per the methods in Tan & Bansal (2020)... For the fine-tuning task of BERT on the GLUE (Wang et al., 2018) dataset |
| Hardware Specification | Yes | We conduct all experiments with NVIDIA-V100 and NVIDIA-A100 GPUs. |
| Software Dependencies | No | The paper mentions using "Huggingface's Transformers package (Wolf et al., 2019)", the "official code base of DeiT (Touvron et al., 2021)", and "PyTorch (Paszke et al., 2019)", but it does not specify exact version numbers for these software dependencies. |
| Experiment Setup | Yes | When training these models from scratch, we apply a default maximum learning rate of 1 × 10⁻³ and run the training for 300 epochs with a batch size of 1024. (A config sketch of these quoted hyperparameters follows the table.) |
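Since the paper provides no formal pseudocode for its expansion procedure, the following is a minimal sketch of the general idea of function-preserving (lossless) width expansion for a two-layer MLP, in the spirit of Net2Net-style neuron duplication. It is not the paper's exact LEMON algorithm (which also covers LayerNorm, attention heads, and depth expansion); it only illustrates the "lossless" property LEMON targets: the expanded network computes the same function as the small one. All function and variable names here are illustrative.

```python
import numpy as np

def expand_mlp(W1, b1, W2, new_hidden):
    """Function-preserving width expansion of y = W2 @ relu(W1 @ x + b1).

    W1: (d_hidden, d_in), b1: (d_hidden,), W2: (d_out, d_hidden).
    Returns expanded weights with hidden width new_hidden >= d_hidden.
    """
    d_hidden = W1.shape[0]
    assert new_hidden >= d_hidden
    # Each new hidden unit copies an old one (round-robin mapping).
    idx = np.arange(new_hidden) % d_hidden
    counts = np.bincount(idx, minlength=d_hidden).astype(W2.dtype)
    # Incoming weights and bias are duplicated, so every copy of a unit
    # produces exactly the same activation as the original unit.
    W1_new, b1_new = W1[idx, :], b1[idx]
    # Outgoing weights are divided by the duplication count, so the summed
    # contribution of each group of copies equals the original unit's output.
    W2_new = W2[:, idx] / counts[idx]
    return W1_new, b1_new, W2_new

# Quick check that the expansion is lossless on a random input.
rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
W1e, b1e, W2e = expand_mlp(W1, b1, W2, new_hidden=7)
x = rng.normal(size=3)
small = W2 @ np.maximum(W1 @ x + b1, 0)
large = W2e @ np.maximum(W1e @ x + b1e, 0)
assert np.allclose(small, large)
```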
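For the Experiment Setup row, the sketch below records only the three hyperparameters quoted above (maximum learning rate, epochs, batch size) for from-scratch ViT training; everything else about the training recipe (optimizer, schedule, augmentation) is not specified in the quote and is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class ScratchTrainingConfig:
    """From-scratch ViT training hyperparameters quoted in the paper."""
    max_lr: float = 1e-3   # "default maximum learning rate of 1 × 10⁻³"
    epochs: int = 300      # "run the training for 300 epochs"
    batch_size: int = 1024 # "with a batch size of 1024"

cfg = ScratchTrainingConfig()
print(cfg)
```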