Representation Deficiency in Masked Language Modeling
Authors: Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that MAE-LM improves the utilization of model dimensions for real token representations, and MAE-LM consistently outperforms MLM-pretrained models on the GLUE and SQuAD benchmarks. (Section 4.2, Overall Results:) Table 1 shows the results under the two base model pretraining settings on the GLUE and SQuAD 2.0 benchmarks. |
| Researcher Affiliation | Collaboration | 1: University of Illinois Urbana-Champaign, 2: Meta AI. 1: {yumeng5, hanj}@illinois.edu, 2: {jitinkrishnan, sinongwang, wqfcr, yuningm, hanfang, ghazvini, lsz}@meta.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at https://github.com/yumeng5/MAE-LM. |
| Open Datasets | Yes | We evaluate the pretrained models on the GLUE (Wang et al., 2018) and SQuAD 2.0 (Rajpurkar et al., 2018) benchmarks. The base setting uses 16GB training corpus following BERT (Devlin et al., 2019) while the base++ setting uses 160GB training corpus following RoBERTa (Liu et al., 2019). |
| Dataset Splits | Yes | All reported fine-tuning results are the medians of five random seeds on GLUE and SQuAD, following previous studies (Liu et al., 2019). The hyperparameter search space for fine-tuning can be found in Appendix D. |
| Hardware Specification | Yes | The experiments in this paper are conducted on 64 A100 GPUs. |
| Software Dependencies | No | The paper mentions software components but does not provide specific version numbers; e.g., "We train both absolute and relative position embeddings (Raffel et al., 2019) in the encoder. The vocabulary is constructed with BPE (Sennrich et al., 2015)" does not specify the version of the BPE library or of the general software environment (e.g., Python/PyTorch). |
| Experiment Setup | Yes | Pretraining Settings. We evaluate MAE-LM mainly under the base model scale for two pretraining settings: base and base++. Both settings pretrain 12-layer Transformers with 768 model dimensions. The base setting uses 16GB training corpus following BERT (Devlin et al., 2019) while the base++ setting uses 160GB training corpus following RoBERTa (Liu et al., 2019). The details can be found in Appendix D (Table 3: Hyperparameters used in pretraining). |
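
The snippet below is a minimal sketch summarizing the pretraining settings and the median-over-five-seeds reporting protocol quoted in the table. All field and function names are illustrative assumptions for readability; they are not taken from the released MAE-LM code at https://github.com/yumeng5/MAE-LM.

```python
import statistics

# Illustrative summary of the "base" pretraining setting described in the paper
# (field names are assumptions, not the authors' config schema).
BASE_CONFIG = {
    "num_layers": 12,          # 12-layer Transformer encoder
    "hidden_size": 768,        # 768 model dimensions
    "corpus_size_gb": 16,      # base: 16GB corpus, following BERT
    "position_embeddings": ["absolute", "relative"],
    "tokenizer": "BPE",
}

# base++: same architecture, 160GB corpus following RoBERTa.
BASE_PLUSPLUS_CONFIG = {**BASE_CONFIG, "corpus_size_gb": 160}

def report_fine_tuning_score(seed_scores):
    """Report the median over five random seeds, as in the fine-tuning protocol."""
    assert len(seed_scores) == 5
    return statistics.median(seed_scores)

# Hypothetical example scores from five fine-tuning seeds (not real results).
print(report_fine_tuning_score([88.1, 88.4, 88.3, 88.6, 88.2]))  # -> 88.3
```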