AMOM: Adaptive Masking over Masking for Conditional Masked Language Model
Authors: Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, Min Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model. |
| Researcher Affiliation | Collaboration | Yisheng Xiao¹, Ruiyang Xu¹, Lijun Wu², Juntao Li¹*, Tao Qin², Tie-Yan Liu², Min Zhang¹; ¹Institute of Computer Science and Technology, Soochow University; ²Microsoft Research Asia; {ysxiaoo, ryxu1}@stu.suda.edu.cn, {ljt, minzhang}@suda.edu.cn, {lijuwu, taoqin, tyliu}@microsoft.com |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found. |
| Open Source Code | Yes | Our code is available on GitHub: https://github.com/amom-nar/AMOM |
| Open Datasets | Yes | For machine translation, we conduct experiments on both IWSLT and WMT datasets, which are widely used for NMT tasks. The datasets from the IWSLT competitions contain 4 language pairs (170k pairs); see details in Table 2. For the WMT datasets, we choose two language pairs widely used in non-autoregressive machine translation: WMT16 English-Romanian (0.6M pairs) and WMT14 English-German (4.5M pairs). [...] For the summarization task, we use the XSUM dataset (Narayan, Cohen, and Lapata 2018)... For the code generation task, we use the Py150 dataset (Raychev, Bielik, and Vechev 2016) and the GitHub-Java dataset (Allamanis and Sutton 2013). |
| Dataset Splits | Yes | For summarization task, we use the XSUM dataset (Narayan, Cohen, and Lapata 2018) which contains 204,045/11,332/11,334 online articles and single sentence summary pairs from the British Broadcasting Corporation for training/validation/test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. It only mentions that 'All experiments are done using the Fairseq library (Ott et al. 2019)'. |
| Software Dependencies | No | The paper mentions using the 'Fairseq library (Ott et al. 2019)', the 'Python official library tokenizer', and 'Javalang', but it does not specify concrete version numbers for these software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | All experiments are done using the Fairseq library (Ott et al. 2019). Following previous settings (Ghazvininejad et al. 2019), we use the standard Transformer-base configuration on WMT datasets and the standard Transformer-small configuration on IWSLT datasets for both autoregressive and non-autoregressive experiments. During AMOM training, we follow the hyper-parameters in CMLMC (Huang, Perez, and Volkovs 2022) for WMT14 En-De and follow the hyper-parameters of the CMLM implementation in Fairseq for the other datasets. During inference, we average the 5 best checkpoints chosen by validation BLEU scores as our final model and set the length beam to 3/5 for IWSLT/WMT datasets. [...] For all datasets, we set the limit ratios for adaptive masking of X to 10%-30% and of Y to 20%-80%, and select a linear mapping function to decide the masking ratios. |
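The adaptive masking ratios reported in the Experiment Setup row can be made concrete with a short sketch. This is a minimal, hypothetical illustration of a linear mapping from a correctness signal to a masking ratio: the bounds (10%-30% for the encoder input X, 20%-80% for the decoder input Y) come from the paper, but the driving signal (`correct_frac`), function names, and masking routine below are assumptions for illustration, not the authors' exact implementation.

```python
import torch


def linear_ratio(signal: float, low: float, high: float) -> float:
    """Linearly map a signal in [0, 1] to a masking ratio in [low, high]."""
    signal = min(max(signal, 0.0), 1.0)
    return low + (high - low) * signal


def adaptive_mask(tokens: torch.Tensor, ratio: float, mask_id: int, pad_id: int) -> torch.Tensor:
    """Randomly mask `ratio` of the non-pad positions in each sequence of a batch."""
    masked = tokens.clone()
    for i in range(tokens.size(0)):
        positions = (tokens[i] != pad_id).nonzero(as_tuple=True)[0]
        n_mask = max(1, int(len(positions) * ratio))
        chosen = positions[torch.randperm(len(positions))[:n_mask]]
        masked[i, chosen] = mask_id
    return masked


# Hypothetical usage: `correct_frac` stands in for whatever correctness signal
# AMOM actually uses; the ratio bounds follow the limits reported in the paper.
correct_frac = 0.6
ratio_x = linear_ratio(correct_frac, 0.10, 0.30)  # encoder-side masking ratio
ratio_y = linear_ratio(correct_frac, 0.20, 0.80)  # decoder-side masking ratio
```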
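The same row mentions averaging the 5 best checkpoints selected by validation BLEU before inference. A minimal stand-in for this step (Fairseq ships its own `scripts/average_checkpoints.py`; the sketch below only assumes each checkpoint stores its parameters under a `"model"` key, which is the Fairseq convention):

```python
import torch


def average_checkpoints(paths):
    """Average model parameters across several checkpoint files."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}
```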