StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation
Authors: Weike Fang, Zhejian Zhou, Junzhou He, Weihang Wang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code snippets generated by StackSight have significantly higher win rates and enable a better grasp of code semantics. We perform comprehensive experiments on StackSight. Results show that StackSight increases the number of functionally correct decompiled code snippets by 70% and produces code that is notably more favorable from the perspective of human developers. (A hypothetical correctness-harness sketch appears below the table.) |
| Researcher Affiliation | Academia | Department of Computer Science, University of Southern California, Los Angeles, CA, United States. |
| Pseudocode | No | The paper describes algorithms and steps in text and figures (e.g., Stack Visualization algorithm, CoT prompting steps) but does not include formal pseudocode blocks or sections labeled "Algorithm". |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the StackSight methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | HumanEval-X (Zheng et al., 2023) is a widely used Natural Language to Code dataset which has a split in C++. It is adapted from HumanEval (Chen et al., 2021) by human experts. MBXP (Athiwaratkun et al., 2023) is also a Natural Language to Code dataset that contains a C++ split. |
| Dataset Splits | No | The paper references the use of the HumanEval-X and MBXP datasets for evaluation and testing, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility beyond using test cases for correctness evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | Yes | We use Emscripten (The Emscripten project, 2024) (version 3.1.46) to compile C++ code to WebAssembly. We leverage the wasm2wat (version 1.0.33) tool from the WebAssembly Binary Toolkit repository (WebAssembly, 2024) to convert binary instructions (.wasm) into text format (.wat). We conduct experiments on three large language models: gpt-3.5-turbo-1106 (OpenAI, 2024a), gpt-4-0125-preview (OpenAI, 2024b), and CodeLlama-7b-Instruct (Rozière et al., 2023). |
| Experiment Setup | Yes | We use Emscripten (The Emscripten project, 2024) (version 3.1.46) to compile C++ code to WebAssembly. We leverage the wasm2wat (version 1.0.33) tool from the WebAssembly Binary Toolkit repository (WebAssembly, 2024) to convert binary instructions (.wasm) into text format (.wat). We include one shot in StackSight. As the annotated code is twice as large, a 2-shot or 3-shot ICL baseline will match the prompt size of StackSight. To compare our method against stronger baselines, we randomly select examples from both datasets and conduct the same set of experiments for 3-shot, 5-shot, and 10-shot ICL in our evaluation. Appendix C gives the compilation command: `emcc "$target_cpp" -o "$output_wasm" --profiling-funcs -Wl,--demangle -s SIDE_MODULE=1 -Oz` (keep function names, demangle symbols, compile as a side module, optimize for size), followed by `wasm2wat "$output_wasm" -o "$output_wat"`. The full pipeline is reproduced as a script below the table. |
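
The Experiment Setup row quotes the compilation command from the paper's Appendix C. Below is a minimal sketch that wraps those exact commands in a runnable script, assuming Emscripten 3.1.46 (`emcc`) and WABT 1.0.33 (`wasm2wat`) are on `PATH`; the input argument and derived file names are placeholders, not from the paper.

```bash
#!/usr/bin/env bash
# Sketch of the compile-to-text pipeline quoted from the paper's Appendix C.
# Assumes emcc 3.1.46 (Emscripten) and wasm2wat 1.0.33 (WABT) are on PATH.
set -euo pipefail

target_cpp="$1"                        # e.g. samples/foo.cpp (placeholder)
output_wasm="${target_cpp%.cpp}.wasm"
output_wat="${target_cpp%.cpp}.wat"

# Compile C++ to a WebAssembly side module, keeping demangled function
# names so the textual output stays readable, and optimizing for size.
emcc "$target_cpp" -o "$output_wasm" \
  --profiling-funcs \
  -Wl,--demangle \
  -s SIDE_MODULE=1 \
  -Oz

# Lower the binary (.wasm) to the WebAssembly text format (.wat).
wasm2wat "$output_wasm" -o "$output_wat"

echo "Wrote $output_wat"
```

Invoked once per benchmark file, e.g. `bash compile_to_wat.sh samples/foo.cpp`, this yields the .wat input that StackSight decompiles.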
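The Research Type row cites a 70% increase in functionally correct decompiled code, which the paper measures by running dataset test cases against the decompiled output. The authors do not release their evaluation harness, so the following is a hypothetical sketch of how such a pass-rate check could work: each decompiled candidate is compiled together with a matching test driver and counted as correct only if the binary exits cleanly. The `decompiled/` and `tests/` layout and the 10-second timeout are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical pass-rate harness (not the authors' released tooling).
# Assumes one decompiled candidate per task in decompiled/<task>.cpp and a
# matching test driver (main() with assertions) in tests/<task>_test.cpp.
set -u

total=0
passed=0
for candidate in decompiled/*.cpp; do
  task="$(basename "$candidate" .cpp)"
  test_file="tests/${task}_test.cpp"
  [ -f "$test_file" ] || continue
  total=$((total + 1))

  # A candidate counts as functionally correct only if it compiles
  # together with its test driver and all assertions pass (exit code 0).
  if g++ -std=c++17 -o "/tmp/bin_${task}" "$candidate" "$test_file" 2>/dev/null \
     && timeout 10 "/tmp/bin_${task}" >/dev/null 2>&1; then
    passed=$((passed + 1))
  fi
done

echo "pass rate: $passed / $total"
```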