ShareBERT: Embeddings Are Capable of Learning Hidden Layers

Authors: Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that we achieve 95.5% of BERT Base performances using only 5M parameters (21.9× fewer parameters) and, most importantly, without the help of any transfer learning techniques.
Researcher Affiliation | Academia | Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro Capotondi; University of Modena and Reggio Emilia, via G. Campi 213/b, 41125 Modena, Italy; jiachenghu@unimore.it, roberto.cavicchioli@unimore.it, giulia.berardinelli@unimore.it, alessandro.capotondi@unimore.it
Pseudocode | No | The paper describes its methods verbally and mathematically but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code will be available at https://github.com/jchenghu/sharebert.
Open Datasets | Yes | We perform MLM training on the 2022 English Wikipedia and Book Corpus (Zhu et al. 2015); we use the same sub-word tokenization of BERT (Devlin et al. 2018) in the uncased instance. All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks.
Dataset Splits | Yes | All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. Table 2: Performance and number-of-parameters comparison between ShareBERT variants and BERT evaluated on the GLUE dev set.
Hardware Specification | No | The paper discusses 'memory-limited devices' as the problem domain but does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions 'FP16 mixed precision is adopted' and refers to BERT's tokenization but does not specify software dependencies such as programming languages, machine learning frameworks, or libraries with version numbers.
Experiment Setup | Yes | All models are trained for 23000 steps, batch size of 4000 and fine-tuned on GLUE tasks. FP16 mixed precision is adopted.
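The Experiment Setup row pins down only a few concrete training details: 23000 MLM steps, a batch size of 4000, FP16 mixed precision, BERT's uncased sub-word tokenization, and fine-tuning on GLUE. As a reading aid, the minimal sketch below shows what such a pre-training run could look like with the Hugging Face transformers/datasets stack. The stack choice, the small stand-in encoder, the 1% Wikipedia slice, and the 125 × 32 decomposition of the 4000-sequence batch are all assumptions, and none of this reproduces ShareBERT's embedding/hidden-layer weight sharing (see the authors' repository for that).

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Same sub-word vocabulary as BERT, uncased, as stated in the paper.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# English Wikipedia (the paper uses the 2022 dump plus BookCorpus); this
# particular Hub config and the 1% slice are illustrative assumptions.
raw = load_dataset("wikipedia", "20220301.en", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_set = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

# Stand-in small BERT-style masked LM; ShareBERT's embedding/hidden-layer
# weight sharing is NOT reproduced here.
model = BertForMaskedLM(
    BertConfig(hidden_size=256, num_hidden_layers=4,
               num_attention_heads=4, intermediate_size=1024)
)

# Standard 15% MLM masking with dynamic padding.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

# The paper states 23000 steps, a batch size of 4000, and FP16 mixed precision.
# Splitting 4000 sequences into 125 per device x 32 accumulation steps is an
# assumption; the paper only gives the total.
args = TrainingArguments(
    output_dir="sharebert-mlm-sketch",
    max_steps=23_000,
    per_device_train_batch_size=125,
    gradient_accumulation_steps=32,
    fp16=True,
    logging_steps=500,
    save_steps=5_000,
)

Trainer(model=model, args=args, train_dataset=train_set,
        data_collator=collator).train()
```

Fine-tuning on the GLUE dev tasks (Table 2 of the paper) would then start from the resulting checkpoint with a per-task classification head; that step is omitted from the sketch.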