AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Authors: Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate AdaBERT on several NLP tasks, and the results demonstrate that those task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in terms of parameter size, while comparable performance is maintained. |
| Researcher Affiliation | Industry | Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou (Alibaba Group) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for their methodology. |
| Open Datasets | Yes | We evaluate the proposed AdaBERT method on six datasets from GLUE [Wang et al., 2019a] benchmark. |
| Dataset Splits | Yes | We evaluate the proposed AdaBERT method on six datasets from GLUE [Wang et al., 2019a] benchmark. |
| Hardware Specification | No | The paper states that "The inference time is tested with a batch size of 128 over 50,000 samples," but does not specify any hardware details such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions optimizers used (SGD, Adam) and their parameters, but does not specify software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | For AdaBERT, we set γ = 0.8, β = 4, T = 1, inner node N = 3 and search layer K_max = 8. We search Pα for 80 epochs and derive the searched structure with its trained operation weights. |
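
The "Open Datasets" and "Dataset Splits" rows above refer to the public GLUE benchmark. The following is a minimal sketch of how GLUE tasks and their standard splits can be obtained, assuming the Hugging Face `datasets` library and an illustrative six-task subset; the paper quote names neither the specific tasks nor a loading toolchain, so both are assumptions here.

```python
# Minimal sketch (assumption: Hugging Face `datasets`; the paper does not say
# how the GLUE data were obtained). Loads each task and prints the sizes of
# its standard splits, which is what the "Dataset Splits" row refers to.
from datasets import load_dataset

# Illustrative six-task subset; the paper quote does not list the tasks.
glue_tasks = ["sst2", "mrpc", "qqp", "mnli", "qnli", "rte"]

for task in glue_tasks:
    ds = load_dataset("glue", task)
    print(task, {split: len(ds[split]) for split in ds})
```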
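The hyperparameters quoted in the "Experiment Setup" row can be gathered into a single configuration sketch. The key names and comments below are illustrative assumptions for readability, not the authors' code; only the numeric values come from the paper quote.

```python
# Hypothetical configuration mirroring the quoted search settings.
search_config = {
    "gamma": 0.8,        # γ: loss trade-off weight, as quoted
    "beta": 4,           # β: loss trade-off weight, as quoted
    "temperature": 1.0,  # T: temperature used during the differentiable search
    "inner_nodes": 3,    # N: inner nodes per searched cell
    "max_layers": 8,     # K_max: maximum number of searched layers
    "search_epochs": 80, # epochs used to search the architecture parameters Pα
}

print(search_config)
```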