LoQT: Low-Rank Adapters for Quantized Pretraining
Authors: Sebastian Loeschcke, Mads Toftrup, Michael Kastoryano, Serge Belongie, Vésteinn Snæbjarnarson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LoQT on language model pretraining by training LLaMA-based [15] language models on the C4 dataset [16]. |
| Researcher Affiliation | Academia | Sebastian Loeschcke, University of Copenhagen, sbl@di.ku.dk; Mads Toftrup, Aarhus University, toftrup@cs.au.dk; Michael J. Kastoryano, University of Copenhagen, mika@di.ku.dk; Serge Belongie, University of Copenhagen, s.belongie@di.ku.dk; Vésteinn Snæbjarnarson, University of Copenhagen, vesn@di.ku.dk |
| Pseudocode | Yes | Figure 3: Pseudo-code for LoQT. Algorithm 1 LoQT: Low Rank Adapters for Quantized Training (a minimal illustrative sketch of this scheme is given after the table). |
| Open Source Code | Yes | https://github.com/sebulo/LoQT |
| Open Datasets | Yes | We evaluate LoQT on language model pretraining by training LLaMA-based [15] language models on the C4 dataset [16], a collection of text in English that was extracted from the Common Crawl web-scrapes [16]. |
| Dataset Splits | Yes | Table 1: Comparison of low-rank pre-training methods for LLaMA2-style language models on the C4 dataset. The table shows validation perplexity, memory estimates, and quantization states for LoQT. |
| Hardware Specification | Yes | Runs were conducted on up to 4x 40GB NVIDIA A100s, 2x 80GB NVIDIA H100s, or a single 24GB NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions numerical formats and methods such as the 'BF16 format', 'NF4 precision', and the 'Adam optimizer', but does not provide version numbers for these or for any other software libraries. |
| Experiment Setup | Yes | We keep hyperparameters consistent across model sizes, with experiments conducted in BF16 format for memory efficiency. All models are trained with a maximum sequence length of 256, a total token batch size of 131K tokens, and a learning rate warmup for the first 10% of the training steps, followed by cosine annealing to 10% of the initial learning rate. Full experimental details, including the specific hyperparameters for each task, are provided in Appendix B. |
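
The Pseudocode row above only quotes the caption of Algorithm 1, so as a reading aid here is a minimal, heavily simplified sketch of the general scheme the paper builds on: a frozen, low-precision base weight plus a low-rank update of which only one factor is trained, with the update periodically merged back into the base. The module name, the fake quantizer, and the `merge_into_base` method are illustrative stand-ins and not the authors' implementation (which, per the table, uses NF4 precision and BF16 training).

```python
import torch
import torch.nn as nn


def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Stand-in for NF4: symmetric per-tensor round-to-nearest quantization.
    # It only illustrates that the base weights are kept in low precision.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale


class LowRankQuantizedLinear(nn.Module):
    """Frozen (simulated) quantized base weight plus a low-rank update P @ B.

    Only B receives gradients; the base and the projection P stay fixed
    between periodic merges. All names here are illustrative.
    """

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        w = torch.randn(out_features, in_features) * 0.02
        self.register_buffer("w_q", fake_quantize(w))                      # frozen low-precision base
        self.register_buffer("P", torch.randn(out_features, rank) * 0.02)  # fixed projection factor
        self.B = nn.Parameter(torch.zeros(rank, in_features))              # trainable low-rank factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_eff = self.w_q + self.P @ self.B  # effective weight: base + low-rank update
        return x @ w_eff.T

    @torch.no_grad()
    def merge_into_base(self) -> None:
        # Periodically fold the accumulated low-rank update into the
        # re-quantized base and reset B, so training continues with a
        # fresh low-rank adapter.
        self.w_q.copy_(fake_quantize(self.w_q + self.P @ self.B))
        self.B.zero_()
```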
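
The learning-rate schedule quoted in the Experiment Setup row (warmup over the first 10% of steps, then cosine annealing to 10% of the initial rate) can be written down compactly. The sketch below assumes the warmup is linear, which the quoted text does not state explicitly; the function name and defaults are illustrative.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float,
               warmup_frac: float = 0.1, final_frac: float = 0.1) -> float:
    """Warmup over the first `warmup_frac` of steps (assumed linear here),
    then cosine annealing down to `final_frac` of the initial rate."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    min_lr = final_frac * base_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For instance, with `total_steps=10_000` and `base_lr=1e-3`, the rate ramps up over the first 1,000 steps and decays toward `1e-4` by the final step.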