Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation
Authors: Kai Huang, Hanyun Yin, Heng Huang, Wei Gao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results show that GreenTrainer can save up to 64% training FLOPs compared to full fine-tuning, without any noticeable accuracy loss. |
| Researcher Affiliation | Academia | University of Pittsburgh; University of Maryland, College Park; University of Science and Technology of China |
| Pseudocode | No | The paper contains diagrams and explanations but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | Our experiments are mainly conducted using the following two datasets of abstractive summarization: SciTLDR (Cachola et al., 2020) and DialogSum (Chen et al., 2021). We also perform generative QA tasks on WebQuestions (Berant et al., 2013) and PIQA (Bisk et al., 2020) datasets in Appendix A.4. |
| Dataset Splits | No | The paper mentions using 'test data' but does not provide specific percentages or counts for train/validation/test splits, nor does it refer to standard predefined splits with sufficient detail for reproduction. |
| Hardware Specification | No | The paper mentions 'A100-80GB GPUs' in an example scenario in the introduction, and refers to 'GPUs we use' in the appendix, but it does not specify the exact GPU models, CPUs, or detailed hardware configurations used for *their* experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not provide specific version numbers for PyTorch or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | In all experiments, we use a batch size of 4 and fine-tune the model for 5 epochs. We use the AdamW optimizer (Loshchilov and Hutter, 2017) at a learning rate of 2×10⁻⁵ with a linear schedule and weight decay of 10⁻². |
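
For reference, the summarization and QA datasets named in the Open Datasets row are publicly hosted on the Hugging Face Hub. The sketch below shows one way to load the two main datasets; the Hub identifiers and configuration names are assumptions on our part, since the paper does not describe how the data was obtained or preprocessed.

```python
# Hypothetical data-loading sketch using the Hugging Face `datasets` library.
# Hub identifiers ("allenai/scitldr", "knkarthick/dialogsum") and the
# "Abstract" config name are assumptions, not taken from the paper.
from datasets import load_dataset

scitldr = load_dataset("allenai/scitldr", "Abstract")   # SciTLDR (Cachola et al., 2020)
dialogsum = load_dataset("knkarthick/dialogsum")        # DialogSum (Chen et al., 2021)

print(scitldr)    # DatasetDict with train/validation/test splits
print(dialogsum)
```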
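
The hyperparameters reported in the Experiment Setup row map onto a standard fine-tuning configuration. Below is a minimal sketch using the Hugging Face Transformers `Trainer` API; it reproduces only the stated hyperparameters, not the authors' GreenTrainer method, and the backbone model and tokenized training set are placeholders not specified here by the paper.

```python
# Minimal fine-tuning sketch matching the reported hyperparameters
# (batch size 4, 5 epochs, AdamW, lr 2e-5, linear schedule, weight decay 1e-2).
# Assumes the Hugging Face Transformers Trainer API; `train_dataset` is a
# placeholder tokenized dataset and the backbone is illustrative only.
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "google/flan-t5-base"   # placeholder backbone; the paper evaluates several LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="ft-out",
    per_device_train_batch_size=4,   # batch size of 4 (as reported)
    num_train_epochs=5,              # 5 epochs (as reported)
    learning_rate=2e-5,              # AdamW at 2e-5 (as reported)
    weight_decay=1e-2,               # weight decay of 1e-2 (as reported)
    lr_scheduler_type="linear",      # linear schedule (as reported)
    optim="adamw_torch",             # AdamW optimizer
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,     # placeholder: a tokenized training split
)
trainer.train()
```

Note that this sketch performs full backpropagation; the paper's adaptive-backpropagation FLOPs savings come from GreenTrainer's tensor-selection logic, for which no open-source code is reported.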