LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models
Authors: Ahmad Faiz, Sotaro Kaneda, Ruhan Wang, Rita Chukwunyere Osi, Prateek Sharma, Fan Chen, Lei Jiang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When validated against Google's published LLM carbon footprints, the results generated by LLMCarbon exhibit differences of only 8.2%, and thus are more accurate than those of mlco2. We employ LLMCarbon to compute the operational footprints of five LLMs, including dense and MoE architectures, developed by Google, OpenAI, and Meta during their training phases. We also compute the operational footprint of another LLM, Noor (Lakim et al., 2022), during its storage phase. To validate the predictions of LLMCarbon, we compare our calculated operational footprint values with the previously published data for these LLMs. |
| Researcher Affiliation | Academia | Ahmad Faiz, Sotaro Kaneda, Ruhan Wang, Rita Osi, Prateek Sharma, Fan Chen, Lei Jiang — Indiana University; Jackson State University. {afaiz,skaneda,ruhwang,prateeks,fc7,jiang60}@iu.edu, j00967039@students.jsums.edu |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly provided or labeled in the paper. |
| Open Source Code | Yes | The source code is released at https://github.com/SotaroKaneda/MLCarbon. |
| Open Datasets | Yes | The inputs on the parameters of LLMs, hardware, and data centers, and the actual training operational carbon footprint values of these LLMs were collected from (Patterson et al., 2021) and (Wu et al., 2022). |
| Dataset Splits | No | The paper validates LLMCarbon against published operational and embodied carbon footprint data from other research, such as Google's published LLM carbon footprints and Meta's XLM, rather than defining explicit training, validation, and test splits for its own experimental setup. |
| Hardware Specification | Yes | Table 4 (validation on the operational carbon footprints of various LLMs): T5 — TPUv3, device TDP 450 W, avg. system power 310 W, peak 123 TFLOPs/s, achieved 45.6 TFLOPs/s, 37% hardware efficiency, 512 devices; GPT-3 — V100, 300 W, 330 W, 125, 24.6, 19.7%, 10K devices; GShard — TPUv3, 450 W, 288 W, 123, 48, 39%, 1K devices; Switch — TPUv3, 450 W, 245 W, 123, 34.4, 28%, 1K devices; XLM — V100, 300 W, 342 W, 125, 26.5, 21.2%, 512 devices. Table 5 (embodied carbon footprint validation against Meta's XLM; columns: hardware, number, kg CO2eq per chip, fraction of lifetime, t CO2eq embodied): GPU — 512, 9.78, 1.12%, 0.056; CPU — 64, 1.47, 1.12%, 0.0018; SSD — 64, 576, 1.12%, 0.412; DRAM — 64, 102.4, 1.12%, 0.073; others — 64, 148.2, 1.12%, 0.096; predicted sum 0.64 t CO2eq vs. actual 0.66 t CO2eq (3.05% difference). |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, or specific library versions) are mentioned in the paper. |
| Experiment Setup | Yes | Table 4 presents the validation results of LLMCarbon's predictions on the training operational carbon footprint. To validate the training operational carbon footprint estimations yielded by LLMCarbon, we selected five LLMs: T5 (Raffel et al., 2020), GPT-3 (Brown et al., 2020), GShard (Lepikhin et al., 2021), Switch (Fedus et al., 2022), and XLM (Conneau et al., 2020). We list the inputs and outputs of LLMCarbon in Table 4. Within the table, device TDP (W) indicates the chip thermal design power of a computing device, while avg. system power (W) conveys the average system power per computing device, including TPU/GPU, host CPU, DRAM, and network interface. The inputs on the parameters of LLMs, hardware, and data centers, and the actual training operational carbon footprint values of these LLMs were collected from (Patterson et al., 2021) and (Wu et al., 2022). |
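The column relationships in the quoted Tables 4 and 5 can be checked with two small formulas: hardware efficiency is achieved throughput divided by peak throughput, and the per-component embodied footprint is device count times per-chip CO2eq times the fraction of hardware lifetime consumed by the run. The sketch below reproduces that arithmetic; the function names are ours for illustration, not identifiers from the LLMCarbon codebase.

```python
def hardware_efficiency(achieved_tflops: float, peak_tflops: float) -> float:
    """Fraction of peak throughput actually achieved (Table 4 columns)."""
    return achieved_tflops / peak_tflops


def embodied_co2_t(device_count: int, kg_co2eq_per_chip: float,
                   lifetime_fraction: float) -> float:
    """Embodied footprint in t CO2eq attributed to a run (Table 5 columns)."""
    return device_count * kg_co2eq_per_chip * lifetime_fraction / 1000.0


# T5 on TPUv3: 45.6 achieved vs. 123 peak TFLOPs/s -> ~37% (matches Table 4)
eff_t5 = hardware_efficiency(45.6, 123)

# Meta XLM GPUs: 512 devices, 9.78 kg CO2eq per chip, 1.12% of lifetime
# -> ~0.056 t CO2eq (matches the GPU row of Table 5)
gpu_emb = embodied_co2_t(512, 9.78, 0.0112)

print(f"T5 hardware efficiency: {eff_t5:.1%}")
print(f"XLM GPU embodied footprint: {gpu_emb:.3f} t CO2eq")
```

The same two formulas reproduce the other rows of the quoted tables to within rounding of the published figures.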