Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming
Authors: Ali Tehrani, Arijit Bhattacharjee, Le Chen, Nesreen K. Ahmed, Amir Yazdanbakhsh, Ali Jannesari
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | CodeRosetta is evaluated on C++ ↔ CUDA and Fortran ↔ C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. |
| Researcher Affiliation | Collaboration | Ali Tehrani Jamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K. Ahmed, Amir Yazdanbakhsh, Ali Jannesari; Iowa State University, Ames, Iowa, USA; Cisco Outshift, San Jose, CA, USA; Google DeepMind, Mountain View, CA, USA |
| Pseudocode | No | The paper uses figures to illustrate its processes (e.g., Masked Language Modeling, AST Entity Recognition, Denoising Auto-Encoding, Back Translation) but does not present structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://coderosetta.com |
| Open Datasets | Yes | For the C++ to CUDA translation task, we use the dataset from Babel Tower [46]... We extract the C++ and Fortran subsets from the Stack V2 dataset [25]... For fine-tuning, we use the small paired C++-Fortran dataset introduced by Bin et al. [19]. |
| Dataset Splits | Yes | Paired validation and test sets: The validation set consists of 184 pairs, and the test set has 180 pairs of C++ and CUDA source code files. For fine-tuning, we use the small paired C++-Fortran dataset introduced by Bin et al. [19]. This set is also used for validation. |
| Hardware Specification | Yes | The experiments were run on a single node with four Nvidia A100 SXM4 GPUs, each with 80GB of memory. |
| Software Dependencies | Yes | We implement CodeRosetta using the Hugging Face Transformers library v4.40.1 [47]. |
| Experiment Setup | Yes | The model is a 12-layer encoder-decoder transformer, with each layer having 12 attention heads and a hidden dimension of 1,536... The training was conducted using the AdamW optimizer [24] and a batch size of 16, using gradient accumulation over two steps. For Masked Language Modeling (MLM) training, we use a learning rate of 8 × 10⁻⁵ and train for 100 epochs with 15% masking... For Abstract Syntax Tree (AST) entity recognition, we use a learning rate of 5 × 10⁻⁶ and train for ten epochs... For Denoising Auto-Encoding and Back Translation, we use a learning rate of 5 × 10⁻⁵ and train for 20 epochs. For Denoising Auto-Encoding, we set the masking to 15%, token dropping to 25%, and token insertion to 15%, with a denoising ratio increasing by 2.5% per epoch. Finally, for fine-tuning, we use a learning rate of 5 × 10⁻⁵ for ten epochs. |
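The staged training schedule quoted in the Experiment Setup row can be summarized as a small configuration sketch. It is illustrative only, under stated assumptions: the stage labels, the helper names `effective_batch_size` and `denoising_ratio`, the reading of the batch size as per-device with plain data parallelism, and the additive interpretation of the 2.5%-per-epoch denoising increase are ours, not the paper's. The paper does not report a starting denoising ratio, so `base` is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Learning rate and epoch count for one training stage, as reported."""
    name: str
    learning_rate: float
    epochs: int


# Values quoted in the Experiment Setup row; the stage labels are our own.
STAGES = [
    StageConfig("masked_language_modeling", 8e-5, 100),
    StageConfig("ast_entity_recognition", 5e-6, 10),
    StageConfig("denoising_and_back_translation", 5e-5, 20),
    StageConfig("fine_tuning", 5e-5, 10),
]

MICRO_BATCH_SIZE = 16  # reported batch size
GRAD_ACCUM_STEPS = 2   # reported gradient accumulation steps
NUM_GPUS = 4           # one node with four A100 GPUs


def effective_batch_size(micro_batch: int, accum_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step.

    ASSUMPTION: the batch size is per device and devices use plain data parallelism.
    """
    return micro_batch * accum_steps * num_devices


def denoising_ratio(epoch: int, base: float, step: float = 0.025) -> float:
    """ASSUMPTION: '2.5% per epoch' is read as additive; `base` is not reported."""
    return base + step * epoch


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.name:32s} lr={stage.learning_rate:.0e} epochs={stage.epochs}")
    print(effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS))            # 32
    print(effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_GPUS))  # 128
```

Under this reading, one optimizer step sees 32 sequences per GPU, or 128 across the four GPUs; if the reported 16 is already a global batch size, the per-step figure is 32.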
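The Denoising Auto-Encoding corruption rates reported above (15% masking, 25% token dropping, 15% token insertion) can be illustrated with a minimal noise function. This is a sketch under stated assumptions, not the authors' implementation: the mask token `<mask>`, the helper name `add_noise`, sampling each token independently, applying masking only to tokens that survive dropping, and drawing inserted tokens uniformly from a caller-supplied vocabulary are all hypothetical choices.

```python
import random
from typing import List, Sequence

MASK = "<mask>"  # hypothetical mask token; the real tokenizer's token may differ


def add_noise(
    tokens: Sequence[str],
    vocab: Sequence[str],
    rng: random.Random,
    mask_p: float = 0.15,
    drop_p: float = 0.25,
    insert_p: float = 0.15,
) -> List[str]:
    """Corrupt a token sequence for a denoising auto-encoding objective.

    ASSUMPTION: every token is handled independently; the paper reports only the rates.
    """
    noisy: List[str] = []
    for tok in tokens:
        if rng.random() < drop_p:
            continue  # token dropping: the token disappears from the input
        # token masking, applied only to tokens that survived dropping
        noisy.append(MASK if rng.random() < mask_p else tok)
        if rng.random() < insert_p:
            noisy.append(rng.choice(vocab))  # token insertion: a random vocabulary token
    return noisy


if __name__ == "__main__":
    src = "for ( int i = 0 ; i < n ; i ++ ) y [ i ] = a * x [ i ] + y [ i ] ;".split()
    noisy = add_noise(src, vocab=src, rng=random.Random(0))
    print(" ".join(noisy))  # a corrupted input; the reconstruction target is `src`
```

A denoising training pair is then `(add_noise(src, ...), src)`: the model sees the corrupted sequence and is trained to reconstruct the original code.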