XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient

Authors: Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We perform a very comprehensive systematic study to measure the impact of many key hyperparameters and training strategies from previous works." |
| Researcher Affiliation | Industry | Microsoft: {xiaoxiawu, zheweiyao, minjiaz, conglong.li, yuxhe}@microsoft.com |
| Pseudocode | No | The paper describes its methods in prose and figures but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | "Code is released as part of https://github.com/microsoft/DeepSpeed" |
| Open Datasets | Yes | "All these evaluations are performed with the General Language Understanding Evaluation (GLUE) benchmark [51]" |
| Dataset Splits | Yes | "We report results on the development sets after compressing a pre-trained model (e.g., BERTbase and TinyBERT) using the corresponding single-task training data." |
| Hardware Specification | No | The reproducibility checklist answers "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Those are in main text." However, specific hardware details such as GPU/CPU models or cloud instance types are not found in the main text. |
| Software Dependencies | No | The paper does not pin specific software dependency versions (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) in the main text. |
| Experiment Setup | Yes | "We consider three budgets listed in Table 1, which cover the practical scenarios of short, standard, and long training time... Meanwhile, we also perform a grid search of peak learning rates {2e-5, 1e-4, 5e-4}. For more training details on iterations and batch size per iteration, please see Table C.1." |
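For the Open Source Code row: the released XTC code lives in DeepSpeed's compression module. Below is a minimal sketch of how that module is typically invoked, following the entry points `init_compression` and `redundancy_clean` from the DeepSpeed Compression tutorial; the model choice and the config file name `ds_config.json` are placeholder assumptions, not details from the paper, so check the repository docs for the exact API.

```python
# Hedged sketch: wrapping a pre-trained model with DeepSpeed compression.
# `init_compression` / `redundancy_clean` follow DeepSpeed's compression
# tutorial; "ds_config.json" is an assumed config file name.
from transformers import AutoModelForSequenceClassification
from deepspeed.compression.compress import init_compression, redundancy_clean

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Inject compression (e.g., extreme quantization) per the JSON config.
model = init_compression(model, "ds_config.json")

# ... fine-tune / distill the compressed model on the task data here ...

# Fold the learned compression back into the weights for deployment.
model = redundancy_clean(model, "ds_config.json")
```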
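For the Open Datasets and Dataset Splits rows: the paper reports results on GLUE development sets after training on the corresponding single-task training data. A minimal loading sketch using the Hugging Face `datasets` library is shown below; this library is an assumed tooling choice, as the paper does not specify its data-loading stack.

```python
# Hedged sketch: fetching a GLUE task and its train / development splits.
from datasets import load_dataset

dataset = load_dataset("glue", "rte")  # any GLUE task: "sst2", "mnli", "qqp", ...

train_split = dataset["train"]       # single-task training data used during compression
dev_split = dataset["validation"]    # development set on which results are reported
                                     # (MNLI instead exposes validation_matched/mismatched)

print(len(train_split), len(dev_split))
```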
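For the Experiment Setup row: the sweep crosses three training budgets with a grid of peak learning rates {2e-5, 1e-4, 5e-4}. The sketch below illustrates that loop structure only; the budget iteration counts and the `finetune` helper are hypothetical stand-ins, since the real values live in the paper's Tables 1 and C.1.

```python
# Hedged sketch of the budget x learning-rate grid search.
from itertools import product

peak_learning_rates = [2e-5, 1e-4, 5e-4]  # grid reported in the paper

# Assumed iteration counts for illustration only; see Table 1 / Table C.1.
budgets = {"short": 1_000, "standard": 10_000, "long": 100_000}


def finetune(iterations: int, peak_lr: float) -> float:
    # Hypothetical placeholder: a real run would train the compressed model
    # for `iterations` steps at `peak_lr` and return its GLUE dev-set score.
    return 0.0


best = None
for (budget_name, iterations), lr in product(budgets.items(), peak_learning_rates):
    score = finetune(iterations=iterations, peak_lr=lr)
    if best is None or score > best[0]:
        best = (score, budget_name, lr)

print(f"best dev score {best[0]:.2f} with budget={best[1]}, peak_lr={best[2]}")
```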