Code Translation with Compiler Representations
Authors: Marc Szafraniec, Baptiste Roziere, Hugh James Leather, Patrick Labatut, Francois Charton, Gabriel Synnaeve
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method improves upon the state of the art for unsupervised code translation, increasing the number of correct translations by 11% on average, and up to 79% for the Java → Rust pair with greedy decoding. With beam search, it increases the number of correct translations by 5.5% on average. We extend previous test sets for code translation by adding hundreds of Go and Rust functions. Additionally, we train models with high performance on the problem of IR decompilation, generating programming source code from IR, and study using IRs as an intermediary pivot for translation. |
| Researcher Affiliation | Industry | Marc Szafraniec, Baptiste Rozière*, Hugh Leather, François Charton, Patrick Labatut, Gabriel Synnaeve (Meta AI) {mszafraniec,broz}@meta.com |
| Pseudocode | No | The paper describes methodologies and objectives in prose (e.g., in Section 3 "TRAINING OBJECTIVES") and presents figures illustrating concepts (e.g., Figure 3 "IR for code representation objectives"), but it does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement that the authors are releasing the source code for their method, nor does it provide a direct link to a code repository for their implementation. It mentions open-source compilers and tools that they used (e.g., "clang++", "JLang", "Gollvm", "rustc", "RetDec") but not their own code for the proposed methodology. |
| Open Datasets | Yes | Our training data was extracted with Google BigQuery, which indexes over 2.8 million open source repositories from GitHub. We selected projects whose license explicitly permits re-distribution of parts, and extracted all individual C++, Java, Rust and Go functions. To learn to decompile IRs, we also used the CodeNet dataset (Puri et al., 2021), a repository of 14 million competitive programming solutions in 55 languages. Our models work at function level: this reduces compilation failures over missing dependencies, while keeping sequence lengths short. We extend the parallel evaluation dataset of 852 functions in C++, Java and Python from Roziere et al. (2020) with 343 more functions in Go and 280 more in Rust, along with corresponding test cases. |
| Dataset Splits | No | The paper discusses training data and test sets, but it does not explicitly provide specific training/validation/test dataset splits, such as exact percentages or absolute sample counts for each split, nor does it mention a dedicated validation set. |
| Hardware Specification | Yes | The translation models presented in Tables 2 and 3 were trained for a week on 32 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions "Our models are implemented in PyTorch using mixed-precision floats" but does not specify version numbers for PyTorch or any other key software dependency (e.g., "PyTorch 1.9") needed for replication. |
| Experiment Setup | Yes | For TransCoder, we consider a sequence-to-sequence (seq2seq) transformer model (Vaswani et al., 2017) with attention (Bahdanau et al., 2015; Sutskever et al., 2014) and the same architecture as Roziere et al. (2020). Our model has 12 layers (6 in the encoder and 6 in the decoder), 8 attention heads, and a dimension of 1024. For the objectives that add noise and masks to the input sentence, such as MLM, TLM, AE, and TAE, we choose the masked tokens and noise randomly on the fly at each epoch. We mask 15% of the tokens in MLM and TLM. In AE and TAE, we mask 20% of the tokens. MLM is trained on streams of data, while the other objectives are trained at function level. We use the Adam optimizer (Kingma and Ba, 2015) and an inverse square-root learning rate scheduler, with an initial learning rate of 10⁻⁵ in most of our experiments. Our models are implemented in PyTorch using mixed-precision floats. |
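
The quoted experiment setup fixes the model size, the masking rates for the denoising objectives, and the optimizer schedule. The snippet below is a minimal, hypothetical PyTorch sketch of that configuration, not the authors' released code: the warmup length and all function names are assumptions, while the layer counts, head count, model dimension, masking probabilities, and initial learning rate are taken from the quoted setup.

```python
# Illustrative sketch only: hyperparameters follow the quoted setup; names and
# the warmup length (WARMUP_STEPS) are assumptions made for this example.
import math
import random

import torch
import torch.nn as nn

# Reported architecture: 6 encoder + 6 decoder layers, 8 heads, dimension 1024.
model = nn.Transformer(
    d_model=1024,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    batch_first=True,
)

# Adam with an initial learning rate of 1e-5, as in the quoted setup.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

WARMUP_STEPS = 4000  # assumption: the warmup length is not stated in the quote


def inv_sqrt_lr_factor(step: int) -> float:
    """Linear warmup followed by decay proportional to 1/sqrt(step)."""
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return math.sqrt(WARMUP_STEPS / step)


# Inverse square-root learning rate schedule applied on top of the base rate.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=inv_sqrt_lr_factor)


def mask_tokens(token_ids: list[int], mask_id: int, mask_prob: float) -> list[int]:
    """Randomly replace a fraction of tokens with a mask id, drawn anew each epoch.

    The quoted setup masks 15% of tokens for MLM/TLM and 20% for AE/TAE.
    """
    return [mask_id if random.random() < mask_prob else t for t in token_ids]


# Example: corrupt an input sequence for the AE/TAE objectives (20% masking).
corrupted = mask_tokens([5, 17, 42, 8, 99], mask_id=3, mask_prob=0.20)
```

The `LambdaLR` factor multiplies the base learning rate, so the peak rate after warmup matches the quoted 10⁻⁵ before decaying with the inverse square root of the step count.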