LEMON: Lossless model expansion
Authors: Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch. |
| Researcher Affiliation | Collaboration | Yite Wang (1), Jiahao Su (2), Hanlin Lu (2), Cong Xie (2), Tianyi Liu (2), Jianbo Yuan (2), Haibin Lin (2), Ruoyu Sun (3,4), Hongxia Yang (2). Affiliations: 1 University of Illinois Urbana-Champaign, USA; 2 ByteDance Inc.; 3 The Chinese University of Hong Kong, Shenzhen, China; 4 Shenzhen Research Institute of Big Data |
| Pseudocode | No | The paper describes the expansion procedure in textual form across several sections (e.g., Section 4, C.1) but does not include formal pseudocode or algorithm blocks. (A generic, illustrative expansion sketch follows the table.) |
| Open Source Code | No | The paper does not include a direct statement or link for the open-sourcing of its own code. |
| Open Datasets | Yes | We train Vision Transformers on the ImageNet-1k (Deng et al., 2009) dataset... The model is trained on the English Wiki corpus as per the methods in Tan & Bansal (2020)... For the fine-tuning task of BERT on the GLUE (Wang et al., 2018) dataset |
| Dataset Splits | Yes | We train Vision Transformers on the ImageNet-1k (Deng et al., 2009) dataset... The model is trained on the English Wiki corpus as per the methods in Tan & Bansal (2020)... For the fine-tuning task of BERT on the GLUE (Wang et al., 2018) dataset |
| Hardware Specification | Yes | We conduct all experiments with NVIDIA-V100 and NVIDIA-A100 GPUs. |
| Software Dependencies | No | The paper mentions using "Huggingface's Transformers package (Wolf et al., 2019)", the "official code base of DeiT (Touvron et al., 2021)", and "PyTorch (Paszke et al., 2019)", but it does not specify exact version numbers for these software dependencies. |
| Experiment Setup | Yes | When training these models from scratch, we apply a default maximum learning rate of 1 × 10⁻³ and run the training for 300 epochs with a batch size of 1024. (A config sketch of these quoted hyperparameters follows the table.) |
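Since the paper provides no formal pseudocode for its expansion procedure, the following is a minimal sketch of the general idea of function-preserving (lossless) width expansion for a two-layer MLP, in the spirit of Net2Net-style neuron duplication. It is not the paper's exact LEMON algorithm (which also covers LayerNorm, attention heads, and depth expansion); it only illustrates the "lossless" property LEMON targets: the expanded network computes the same function as the small one. All function and variable names here are illustrative.

```python
import numpy as np

def expand_mlp(W1, b1, W2, new_hidden):
    """Function-preserving width expansion of y = W2 @ relu(W1 @ x + b1).

    W1: (d_hidden, d_in), b1: (d_hidden,), W2: (d_out, d_hidden).
    Returns expanded weights with hidden width new_hidden >= d_hidden.
    """
    d_hidden = W1.shape[0]
    assert new_hidden >= d_hidden
    # Each new hidden unit copies an old one (round-robin mapping).
    idx = np.arange(new_hidden) % d_hidden
    counts = np.bincount(idx, minlength=d_hidden).astype(W2.dtype)
    # Incoming weights and bias are duplicated, so every copy of a unit
    # produces exactly the same activation as the original unit.
    W1_new, b1_new = W1[idx, :], b1[idx]
    # Outgoing weights are divided by the duplication count, so the summed
    # contribution of each group of copies equals the original unit's output.
    W2_new = W2[:, idx] / counts[idx]
    return W1_new, b1_new, W2_new

# Quick check that the expansion is lossless on a random input.
rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
W1e, b1e, W2e = expand_mlp(W1, b1, W2, new_hidden=7)
x = rng.normal(size=3)
small = W2 @ np.maximum(W1 @ x + b1, 0)
large = W2e @ np.maximum(W1e @ x + b1e, 0)
assert np.allclose(small, large)
```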
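For the Experiment Setup row, the sketch below records only the three hyperparameters quoted above (maximum learning rate, epochs, batch size) for from-scratch ViT training; everything else about the training recipe (optimizer, schedule, augmentation) is not specified in the quote and is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class ScratchTrainingConfig:
    """From-scratch ViT training hyperparameters quoted in the paper."""
    max_lr: float = 1e-3   # "default maximum learning rate of 1 × 10⁻³"
    epochs: int = 300      # "run the training for 300 epochs"
    batch_size: int = 1024 # "with a batch size of 1024"

cfg = ScratchTrainingConfig()
print(cfg)
```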