Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Coda: An End-to-End Neural Program Decompiler
Authors: Cheng Fu, Huili Chen, Haolan Liu, Xinyun Chen, Yuandong Tian, Farinaz Koushanfar, Jishen Zhao
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess Coda's performance with extensive experiments on various benchmarks. Evaluation results show that Coda achieves an average of 82% program recovery accuracy on unseen binary samples, where the state-of-the-art decompilers yield 0% accuracy. Furthermore, Coda outperforms the sequence-to-sequence model with attention by a margin of 70% program accuracy. |
| Researcher Affiliation | Collaboration | Cheng Fu, Huili Chen, Haolan Liu UC San Diego EMAIL Xinyun Chen UC Berkeley EMAIL Yuandong Tian Facebook EMAIL Farinaz Koushanfar, Jishen Zhao UC San Diego EMAIL |
| Pseudocode | Yes | Algorithm 1 Workflow of iterative EC Machine. |
| Open Source Code | No | The paper mentions using open-source disassemblers (mipt-mips, REDasm) but does not state that the code for Coda itself is open-source or provide a link. |
| Open Datasets | No | To build the training dataset for stage 1, we randomly generate 50,000 pairs of high-level programs with the corresponding assembly code for each task. The training dataset for the error correction stage is constructed by injecting various types of errors into the high-level code. The paper generated its own dataset and does not provide public access information. |
| Dataset Splits | No | The paper does not provide explicit details about a validation dataset split (e.g., percentages or counts). |
| Hardware Specification | No | The paper mentions "limited GPU memory" as a challenge for long programs but does not specify any particular GPU model, CPU, or other hardware used for the experiments. |
| Software Dependencies | No | The paper mentions using `clang` for compilation and `mipt-mips` and `REDasm` for disassembling, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set S_max = 30 and c_max = 10 for EC machine in Algorithm 1. In our experiments, we inject 10-20% token errors whose locations are sampled from a uniform random distribution. To address the class imbalance problem during EP training, we mask 35% of the tokens with error status 0 (i.e., no error occurs) when computing the loss. The program is compiled using clang with configuration `-O0` which disables all optimizations. |
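
The Experiment Setup row describes two data-preparation steps: injecting 10-20% token errors at uniformly sampled locations, and masking 35% of the error-free tokens when computing the loss. The following is a minimal Python sketch of those two steps, not the authors' implementation. The function names `inject_token_errors` and `loss_mask`, the binary 0/1 error status, and the choice of token replacement as the only error type are all assumptions made for illustration; the paper mentions "various types of errors" without this page specifying them.

```python
import random


def inject_token_errors(tokens, vocab, rate_range=(0.10, 0.20), rng=None):
    """Corrupt a fraction of tokens (assumed: by replacement) at uniform random positions.

    Returns (corrupted_tokens, error_status), where error_status[i] is 1 if
    position i was corrupted and 0 otherwise (binary status is an assumption).
    """
    rng = rng or random.Random()
    rate = rng.uniform(*rate_range)  # error rate drawn from the 10-20% range
    n_errors = min(len(tokens), max(1, round(rate * len(tokens)))) if tokens else 0
    # Sample distinct positions uniformly at random.
    positions = rng.sample(range(len(tokens)), n_errors)
    corrupted = list(tokens)
    status = [0] * len(tokens)
    for i in positions:
        # Replace with any vocabulary token that differs from the original.
        candidates = [t for t in vocab if t != tokens[i]]
        corrupted[i] = rng.choice(candidates)
        status[i] = 1
    return corrupted, status


def loss_mask(status, mask_fraction=0.35, rng=None):
    """Build per-token loss weights that drop a fraction of error-free tokens.

    Tokens with status 0 are masked (weight 0.0) with the given fraction;
    all erroneous tokens keep weight 1.0.
    """
    rng = rng or random.Random()
    clean = [i for i, s in enumerate(status) if s == 0]
    n_mask = round(mask_fraction * len(clean))
    masked = set(rng.sample(clean, n_mask))
    return [0.0 if i in masked else 1.0 for i in range(len(status))]
```

In a training loop, the weights from `loss_mask` would multiply the per-token loss before averaging, so that the abundant "no error" class contributes less to the gradient.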