Multilingual Code Snippets Training for Program Translation

Authors: Ming Zhu, Karthik Suresh, Chandan K. Reddy

AAAI 2022, pages 11783-11790

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the multilingual snippet training is effective in improving program translation performance, especially for low-resource languages. Moreover, our training method shows good generalizability and consistently improves the translation performance of a number of baseline models. The proposed model outperforms the baselines on both snippet-level and program-level translation, and achieves state-of-the-art performance on the CodeXGLUE translation task.
Researcher Affiliation | Academia | Ming Zhu, Karthik Suresh, Chandan K. Reddy, Department of Computer Science, Virginia Tech, Arlington, VA 22203. mingzhu@vt.edu, karthiks@vt.edu, reddy@cs.vt.edu
Pseudocode | No | The paper includes mathematical formulations for its objective functions but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code, data, and appendix for this paper can be found at https://github.com/reddy-lab-code-research/MuST-CoST.
Open Datasets | Yes | We used the monolingual snippets to do the multilingual snippet DAE training, and the pairwise snippets to do the multilingual snippet translation (MuST) training... We used the pairwise program data to fine-tune the model for program translation. CodeXGLUE Translation Dataset... We used the translation dataset (Java-C#) from CodeXGLUE for evaluation.
Dataset Splits | Yes | The train-validation-test data is split at the problem level, to ensure no overlapping snippets between the splits in any of the languages. The statistics of the split in each language can be found in the Appendix.
Hardware Specification | Yes | The model was trained with 4 RTX 8000 GPUs with 48GB memory on each GPU.
Software Dependencies | No | The paper mentions the "Adam optimizer (Kingma and Ba 2014)", the "Transformer (Vaswani et al. 2017)" learning rate scheduler, and initializes the model with "dobf_plus_denoising.pth" from the DOBF model (Roziere et al. 2021). However, it does not specify version numbers for general software dependencies such as Python or PyTorch.
Experiment Setup | Yes | In our model, the encoder and decoder consist of 12 and 6 transformer layers, respectively. The transformer units have a model dimension of 768 and 12 attention heads. The weight of the multilingual snippet DAE objective λ was set to 1.0 at the beginning, decayed linearly to 0.1 over 30K steps, and then to 0 by 100K steps... Float16 operations were used to speed up the training. The model was trained using the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0001, and the same learning rate scheduler was used as in the Transformer (Vaswani et al. 2017). We used a batch size of 128 on all the 42 language pairs.
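
The λ schedule quoted in the Experiment Setup row is concrete enough to sketch in code. Below is a minimal Python illustration, assuming both decay segments are linear; the function name `dae_weight` and the driver loop are illustrative and not taken from the released implementation.

```python
# Minimal sketch (not the authors' code) of the DAE-objective weight schedule:
# lambda starts at 1.0, decays linearly to 0.1 over the first 30K steps,
# then to 0 by step 100K. The piecewise-linear reading of the second segment
# is an assumption for illustration.

def dae_weight(step: int) -> float:
    """Return the multilingual snippet DAE loss weight at a given training step."""
    if step <= 30_000:
        # Segment 1: 1.0 -> 0.1 linearly over steps 0..30K
        return 1.0 - 0.9 * (step / 30_000)
    if step <= 100_000:
        # Segment 2: 0.1 -> 0.0 linearly over steps 30K..100K
        return 0.1 * (1.0 - (step - 30_000) / 70_000)
    # Past 100K steps the DAE term is dropped entirely
    return 0.0


if __name__ == "__main__":
    for s in (0, 15_000, 30_000, 65_000, 100_000, 150_000):
        print(f"step {s}: lambda = {dae_weight(s):.3f}")
```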