AMOM: Adaptive Masking over Masking for Conditional Masked Language Model
Authors: Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, Min Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model. |
| Researcher Affiliation | Collaboration | Yisheng Xiao¹, Ruiyang Xu¹, Lijun Wu², Juntao Li¹*, Tao Qin², Tie-Yan Liu², Min Zhang¹; ¹Institute of Computer Science and Technology, Soochow University; ²Microsoft Research Asia; {ysxiaoo, ryxu1}@stu.suda.edu.cn, {ljt, minzhang}@suda.edu.cn, {lijuwu, taoqin, tyliu}@microsoft.com |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found. |
| Open Source Code | Yes | Our code is available on GitHub: https://github.com/amom-nar/AMOM |
| Open Datasets | Yes | For machine translation, we conduct experiments on both IWSLT and WMT datasets, which are widely used for NMT tasks. The datasets from the IWSLT competitions contain 4 language pairs (170k pairs); see details in Table 2. For the WMT datasets, we choose two language pairs widely used in non-autoregressive machine translation: WMT16 English-Romanian (0.6M pairs) and WMT14 English-German (4.5M pairs). [...] For the summarization task, we use the XSUM dataset (Narayan, Cohen, and Lapata 2018)... For the code generation task, we use the Py150 dataset (Raychev, Bielik, and Vechev 2016) and the GitHub-Java dataset (Allamanis and Sutton 2013). |
| Dataset Splits | Yes | For summarization task, we use the XSUM dataset (Narayan, Cohen, and Lapata 2018) which contains 204,045/11,332/11,334 online articles and single sentence summary pairs from the British Broadcasting Corporation for training/validation/test. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. It only mentions that 'All experiments are done using the Fairseq library (Ott et al. 2019)'. |
| Software Dependencies | No | The paper mentions using the 'Fairseq library (Ott et al. 2019)', the 'Python official library tokenizer', and 'Javalang', but it does not specify concrete version numbers for these software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | All experiments are done using the Fairseq library (Ott et al. 2019). Following previous settings (Ghazvininejad et al. 2019), we use the standard Transformer-base configuration on WMT datasets and the standard Transformer-small configuration on IWSLT datasets for both autoregressive and non-autoregressive experiments. During AMOM training, we follow the hyper-parameters in CMLMC (Huang, Perez, and Volkovs 2022) for WMT14 En-De and follow the hyper-parameters of the CMLM implementation in Fairseq for the other datasets. During inference, we average the 5 best checkpoints chosen by validation BLEU scores as our final model and set the length beam to 3/5 for IWSLT/WMT datasets. [...] For all datasets, we set the limit ratios for adaptive masking of X to 10%-30% and of Y to 20%-80%, and select a linear mapping function to decide the masking ratios. |
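The adaptive masking ratios reported in the Experiment Setup row can be made concrete with a short sketch. This is a minimal, hypothetical illustration of a linear mapping from a correctness signal to a masking ratio: the bounds (10%-30% for the encoder input X, 20%-80% for the decoder input Y) come from the paper, but the driving signal (`correct_frac`), function names, and masking routine below are assumptions for illustration, not the authors' exact implementation.

```python
import torch


def linear_ratio(signal: float, low: float, high: float) -> float:
    """Linearly map a signal in [0, 1] to a masking ratio in [low, high]."""
    signal = min(max(signal, 0.0), 1.0)
    return low + (high - low) * signal


def adaptive_mask(tokens: torch.Tensor, ratio: float, mask_id: int, pad_id: int) -> torch.Tensor:
    """Randomly mask `ratio` of the non-pad positions in each sequence of a batch."""
    masked = tokens.clone()
    for i in range(tokens.size(0)):
        positions = (tokens[i] != pad_id).nonzero(as_tuple=True)[0]
        n_mask = max(1, int(len(positions) * ratio))
        chosen = positions[torch.randperm(len(positions))[:n_mask]]
        masked[i, chosen] = mask_id
    return masked


# Hypothetical usage: `correct_frac` stands in for whatever correctness signal
# AMOM actually uses; the ratio bounds follow the limits reported in the paper.
correct_frac = 0.6
ratio_x = linear_ratio(correct_frac, 0.10, 0.30)  # encoder-side masking ratio
ratio_y = linear_ratio(correct_frac, 0.20, 0.80)  # decoder-side masking ratio
```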
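The same row mentions averaging the 5 best checkpoints selected by validation BLEU before inference. A minimal stand-in for this step (Fairseq ships its own `scripts/average_checkpoints.py`; the sketch below only assumes each checkpoint stores its parameters under a `"model"` key, which is the Fairseq convention):

```python
import torch


def average_checkpoints(paths):
    """Average model parameters across several checkpoint files."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")["model"]
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}
```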