Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation

Authors: Kai Huang, Hanyun Yin, Heng Huang, Wei Gao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results show that GreenTrainer can save up to 64% training FLOPs compared to full fine-tuning, without any noticeable accuracy loss.
Researcher Affiliation | Academia | University of Pittsburgh; University of Maryland, College Park; University of Science and Technology of China
Pseudocode | No | The paper contains diagrams and explanations but does not include any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | Our experiments are mainly conducted using the following two datasets of abstractive summarization: SciTLDR (Cachola et al., 2020) and DialogSum (Chen et al., 2021). We also perform generative QA tasks on WebQuestions (Berant et al., 2013) and PIQA (Bisk et al., 2020) datasets in Appendix A.4.
Dataset Splits | No | The paper mentions using 'test data' but does not provide specific percentages or counts for train/validation/test splits, nor does it refer to standard predefined splits in sufficient detail for reproduction.
Hardware Specification | No | The paper mentions 'A100-80GB GPUs' in an example scenario in the introduction and refers to 'GPUs we use' in the appendix, but it does not specify the exact GPU models, CPUs, or other hardware details used in the authors' own experiments.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not provide version numbers for PyTorch or any other software dependencies needed for replication.
Experiment Setup | Yes | In all experiments, we use a batch size of 4 and fine-tune the model for 5 epochs. We use the AdamW optimizer (Loshchilov and Hutter, 2017) at a learning rate of 2×10⁻⁵ with a linear schedule and weight decay of 10⁻².
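
The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch training configuration. The sketch below only encodes those stated values (batch size 4, 5 epochs, AdamW, learning rate 2×10⁻⁵, linear decay, weight decay 10⁻²); since the paper releases no code, the model and dataset objects are hypothetical placeholders, and this does not implement GreenTrainer's adaptive backpropagation itself.

```python
# Minimal sketch of the reported fine-tuning setup, not the authors' code.
# `model` and `train_set` are placeholders supplied by the caller.
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader

def build_training_setup(model, train_set, epochs=5, batch_size=4,
                         lr=2e-5, weight_decay=1e-2):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    total_steps = epochs * len(loader)
    # "Linear schedule": decay the learning rate linearly from lr to 0
    # over the full course of fine-tuning.
    scheduler = LambdaLR(optimizer,
                         lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps))
    return loader, optimizer, scheduler
```

Equivalent behavior is commonly obtained with `get_linear_schedule_with_warmup` from Hugging Face Transformers (with zero warmup steps), but since the paper only names PyTorch, the sketch stays with the core library.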