Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Collage: Light-Weight Low-Precision Strategy for LLM Training
Authors: Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith R Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that pre-training using COLLAGE removes the requirement of using 32-bit floating-point copies of the model and attains similar/better training performance compared to (16, 32)-bit mixed-precision strategy, with up to 3.7× speedup and 15% to 23% less memory usage in practice. |
| Researcher Affiliation | Collaboration | 1Cornell University, Ithaca, NY 2AWS AI Labs, Santa Clara, CA 3AWS Annapurna Labs, Cupertino, CA 4AWS Sagemaker, Santa Clara, CA 5AWS AI Research and Education, Santa Clara, CA. |
| Pseudocode | Yes | Algorithm 1 Grow; Algorithm 2 COLLAGE: Bfloat16 MCF AdamW Optimization |
| Open Source Code | Yes | The code is available at https://github.com/amazon-science/collage. |
| Open Datasets | Yes | We first pre-train the BERT-base-uncased, BERT-large-uncased, and RoBERTa-base model with Hugging Face (HF) (Wolf et al., 2019) configuration on the Wikipedia-en corpus (Attardi, 2015), preprocessed with BERT Wordpiece tokenizer. |
| Dataset Splits | Yes | We split the dataset into train/val/test with the split ratio 980 : 10 : 10. |
| Hardware Specification | Yes | We use aws.p4.24xlarge compute instances for all of our experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'Hugging Face' libraries (e.g., 'PyTorch BFloat16 Tensor', 'Hugging Face'), but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Additional training and hyperparameter details are described in Appendix E.2. Table 10. Pre-training hyperparameters used for BERT and RoBERTa. Table 11. Some configs and hyper-parameters of GPT models and OpenLLaMA-7B. |
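
The 980 : 10 : 10 train/val/test ratio quoted in the Dataset Splits row can be illustrated with a short sketch. This is not the authors' code: the helper name `split_by_ratio`, the random shuffle, the seed, and the example corpus size are all assumptions made for illustration, since the quoted text does not say how documents were assigned to each split.

```python
import random


def split_by_ratio(items, ratios=(980, 10, 10), seed=0):
    """Shuffle items and cut them into consecutive parts proportional to ratios.

    Hypothetical helper: the shuffle and the seed are assumptions, not
    details taken from the paper.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable

    total = sum(ratios)
    n = len(items)

    # Cumulative cut points; integer arithmetic avoids float rounding drift.
    bounds = []
    acc = 0
    for r in ratios[:-1]:
        acc += r
        bounds.append(n * acc // total)

    starts = [0] + bounds
    ends = bounds + [n]
    return [items[s:e] for s, e in zip(starts, ends)]


# Example with an assumed corpus of 100,000 document ids.
train, val, test = split_by_ratio(range(100_000))
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three subsets stable across runs, which matters when validation and test perplexities are compared between precision strategies.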