ShareBERT: Embeddings Are Capable of Learning Hidden Layers
Authors: Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that we achieve 95.5% of BERT Base performances using only 5M parameters (21.9× fewer parameters) and, most importantly, without the help of any transfer learning techniques. |
| Researcher Affiliation | Academia | Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi, University of Modena and Reggio Emilia, via G. Campi 213/b, 41125 Modena, Italy; jiachenghu@unimore.it, roberto.cavicchioli@unimore.it, giulia.berardinelli@unimore.it, alessandro.capotondi@unimore.it |
| Pseudocode | No | The paper describes its methods verbally and mathematically but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at https://github.com/jchenghu/sharebert. |
| Open Datasets | Yes | we perform MLM training on the 2022 English Wikipedia and Book Corpus (Zhu et al. 2015); we use the same sub-word tokenization of BERT (Devlin et al. 2018) in the uncased instance. All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. |
| Dataset Splits | Yes | All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. Table 2: Performance and the number of parameters comparison between ShareBERT variants and BERT evaluated on the GLUE dev set. |
| Hardware Specification | No | The paper discusses 'memory-limited devices' as the problem domain but does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'FP16 mixed precision is adopted' and refers to BERT's tokenization but does not specify software dependencies like programming languages, machine learning frameworks, or libraries with version numbers. |
| Experiment Setup | Yes | All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. FP16 mixed precision is adopted. |
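
A quick sanity check on the headline parameter figure quoted above, assuming the commonly reported size of roughly 110M parameters for BERT Base: 110M / 21.9 ≈ 5.0M, which is consistent with the paper's claim of matching 95.5% of BERT Base performance with a 5M-parameter model.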
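
To make the reported setup concrete, below is a minimal sketch of the pre-training recipe quoted in the table: MLM on English Wikipedia and BookCorpus with BERT's uncased tokenizer, 23000 steps, an effective batch of 4000 sequences, and FP16 mixed precision. It uses Hugging Face `datasets` and `transformers` with a plain `BertForMaskedLM` as a stand-in, not the authors' ShareBERT implementation (see https://github.com/jchenghu/sharebert); the dataset identifiers, sequence length, and batch/accumulation split are assumptions, not details given in the paper.

```python
# Hedged sketch of the quoted pre-training setup; the model is a vanilla
# BertForMaskedLM stand-in, NOT the authors' ShareBERT architecture.
from datasets import concatenate_datasets, load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# "the same sub-word tokenization of BERT ... in the uncased instance"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Corpora named in the paper; the Wikipedia snapshot id is an assumption.
wiki = load_dataset("wikipedia", "20220301.en", split="train")
wiki = wiki.remove_columns([c for c in wiki.column_names if c != "text"])
books = load_dataset("bookcorpus", split="train")
corpus = concatenate_datasets([wiki, books])

def tokenize(batch):
    # 128-token sequences are an assumption, not stated in the table.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Stand-in encoder; ShareBERT itself shares weights between the embeddings
# and the hidden layers, which this plain configuration does not reproduce.
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="mlm_pretrain",
    max_steps=23_000,                 # "trained for 23000 steps"
    per_device_train_batch_size=125,
    gradient_accumulation_steps=32,   # 125 * 32 = 4000 sequences per step
    fp16=True,                        # "FP16 mixed precision is adopted"
    logging_steps=100,
    save_steps=5_000,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

Fine-tuning on the GLUE dev sets would follow the same `Trainer` pattern with task-specific heads; that step is omitted here for brevity.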