Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Guess & Sketch: Language Model Guided Transpilation
Authors: Celine Lee, Abdulrahman Mahmoud, Michal Kurek, Simone Campanoni, David Brooks, Stephen Chong, Gu-Yeon Wei, Alexander M Rush
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test GUESS & SKETCH on three different test sets of assembly transpilation tasks, varying in difficulty, and show that it successfully transpiles 57.6% more examples than GPT-4 and 39.6% more examples than an engineered transpiler. |
| Researcher Affiliation | Academia | Cornell University, Harvard University, Northwestern University |
| Pseudocode | Yes | Algorithm 1 GUESS & SKETCH Pseudocode |
| Open Source Code | No | The paper states "The resulting dataset is shared on Hugging Face" and "All resulting models are shared on Huggingface", and links to a baseline's code, but it does not provide an explicit statement or link for the source code of the GUESS & SKETCH methodology. |
| Open Datasets | Yes | Training data is composed of 307,916 ARMv8 and RISC-V assembly file pairs compiled from C code files from The Stack (Kocetkov et al., 2022). The resulting dataset is shared on Hugging Face [3]. [3] https://huggingface.co/datasets/celinelee/paired arm risc |
| Dataset Splits | No | The paper describes the training dataset and separate test datasets, but it does not provide specific train/validation/test splits for the main training data or how the models were validated during training. |
| Hardware Specification | Yes | All language models are trained on one NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions using "Rosette (Torlak & Bodik, 2013)" and "Z3 (de Moura & Bjørner, 2008) SMT solver" but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | "We use confidence threshold γ = 0.9", and "Table 4: Training details for language models used.", which includes L.R., Batch, No. Steps, LoRA r, LoRA Modules, and Quant. values. |