Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ShareBERT: Embeddings Are Capable of Learning Hidden Layers
Authors: Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that we achieve 95.5% of BERT Base performances using only 5M parameters (21.9× fewer parameters) and, most importantly, without the help of any transfer learning techniques. |
| Researcher Affiliation | Academia | Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi, University of Modena and Reggio Emilia, via G. Campi 213/b, 41125 Modena, Italy. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methods verbally and mathematically but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at https://github.com/jchenghu/sharebert. |
| Open Datasets | Yes | we perform MLM training on the 2022 English Wikipedia and Book Corpus (Zhu et al. 2015), we use the same sub-word tokenization of BERT (Devlin et al. 2018) in the uncased instance. All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. |
| Dataset Splits | Yes | All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. Table 2: Performance and the number of parameters comparison between ShareBERT variants and BERT evaluated on the GLUE dev set. |
| Hardware Specification | No | The paper discusses 'memory-limited devices' as the problem domain but does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'FP16 mixed precision is adopted' and refers to BERT's tokenization but does not specify software dependencies like programming languages, machine learning frameworks, or libraries with version numbers. |
| Experiment Setup | Yes | All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. FP16 mixed precision is adopted. |
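The rows above quote a few numbers that can be cross-checked with simple arithmetic. The sketch below is only a sanity check under stated assumptions: the roughly 110M parameter count for BERT Base comes from Devlin et al. (2018), not from this report, and treating "batch size of 4000" as a count of sequences (rather than tokens) is an assumption. The constant and function names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the table.
# ASSUMPTIONS (not stated in the table): BERT Base has ~110M parameters
# (Devlin et al. 2018), and the batch size of 4000 counts sequences, not tokens.

BERT_BASE_PARAMS = 110_000_000   # approximate, from the BERT paper
SHAREBERT_PARAMS = 5_000_000     # "only 5M parameters" (rounded in the source)
TRAIN_STEPS = 23_000             # "trained for 23000 steps"
BATCH_SIZE = 4_000               # "batch size of 4000" (assumed: sequences)


def reduction_factor(baseline: int, compact: int) -> float:
    """How many times fewer parameters the compact model has."""
    return baseline / compact


def sequences_seen(steps: int, batch_size: int) -> int:
    """Total training examples processed, assuming one batch per step."""
    return steps * batch_size


if __name__ == "__main__":
    print(f"reduction ~ {reduction_factor(BERT_BASE_PARAMS, SHAREBERT_PARAMS):.1f}x")
    print(f"sequences seen: {sequences_seen(TRAIN_STEPS, BATCH_SIZE):,}")
```

Dividing 110M by 5M gives 22×, so the reported 21.9× is consistent with the rounded figures (it implies a model of about 110M / 21.9 ≈ 5.02M parameters). Because the sequence length is not reported in the table, the total number of training tokens cannot be derived from these values alone.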