StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation
Authors: Weike Fang, Zhejian Zhou, Junzhou He, Weihang Wang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code snippets generated by StackSight have significantly higher win rates and enable a better grasp of code semantics. We perform comprehensive experiments on StackSight. Results show that StackSight increases the number of functionally correct decompiled code snippets by 70% and produces code that is notably more favorable from the perspective of human developers. (A hypothetical correctness-harness sketch appears below the table.) |
| Researcher Affiliation | Academia | Department of Computer Science, University of Southern California, Los Angeles, CA, United States. |
| Pseudocode | No | The paper describes algorithms and steps in text and figures (e.g., Stack Visualization algorithm, CoT prompting steps) but does not include formal pseudocode blocks or sections labeled "Algorithm". |
| Open Source Code | No | The paper does not include an explicit statement about releasing the source code for the StackSight methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | HumanEval-X (Zheng et al., 2023) is a widely used Natural Language to Code dataset which has a split in C++. It is adapted from HumanEval (Chen et al., 2021) by human experts. MBXP (Athiwaratkun et al., 2023) is also a Natural Language to Code dataset that contains a C++ split. |
| Dataset Splits | No | The paper references the use of the HumanEval-X and MBXP datasets for evaluation and testing, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility beyond using test cases for correctness evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | Yes | We use Emscripten (The Emscripten project, 2024) (version 3.1.46) to compile C++ code to WebAssembly. We leverage the wasm2wat (version 1.0.33) tool from the WebAssembly Binary Toolkit repository (WebAssembly, 2024) to convert binary instructions (.wasm) into text format (.wat). We conduct experiments on three large language models: gpt-3.5-turbo-1106 (OpenAI, 2024a), gpt-4-0125-preview (OpenAI, 2024b), and CodeLlama-7b-Instruct (Rozière et al., 2023). |
| Experiment Setup | Yes | We use Emscripten (The Emscripten project, 2024) (version 3.1.46) to compile C++ code to WebAssembly. We leverage the wasm2wat (version 1.0.33) tool from the WebAssembly Binary Toolkit repository (WebAssembly, 2024) to convert binary instructions (.wasm) into text format (.wat). We include one shot in StackSight. As the annotated code is twice as large, a 2-shot or 3-shot ICL baseline will match the prompt size of StackSight. To compare our method against stronger baselines, we randomly select examples from both datasets and conduct the same set of experiments for 3-shot, 5-shot, and 10-shot ICL in our evaluation. Appendix C gives the compilation command: `emcc "$target_cpp" -o "$output_wasm" --profiling-funcs -Wl,--demangle -s SIDE_MODULE=1 -Oz` (keep function names, demangle symbols, compile as a side module, optimize for size), followed by `wasm2wat "$output_wasm" -o "$output_wat"`. The full pipeline is reproduced as a script below the table. |
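
The Experiment Setup row quotes the compilation command from the paper's Appendix C. Below is a minimal sketch that wraps those exact commands in a runnable script, assuming Emscripten 3.1.46 (`emcc`) and WABT 1.0.33 (`wasm2wat`) are on `PATH`; the input argument and derived file names are placeholders, not from the paper.

```bash
#!/usr/bin/env bash
# Sketch of the compile-to-text pipeline quoted from the paper's Appendix C.
# Assumes emcc 3.1.46 (Emscripten) and wasm2wat 1.0.33 (WABT) are on PATH.
set -euo pipefail

target_cpp="$1"                        # e.g. samples/foo.cpp (placeholder)
output_wasm="${target_cpp%.cpp}.wasm"
output_wat="${target_cpp%.cpp}.wat"

# Compile C++ to a WebAssembly side module, keeping demangled function
# names so the textual output stays readable, and optimizing for size.
emcc "$target_cpp" -o "$output_wasm" \
  --profiling-funcs \
  -Wl,--demangle \
  -s SIDE_MODULE=1 \
  -Oz

# Lower the binary (.wasm) to the WebAssembly text format (.wat).
wasm2wat "$output_wasm" -o "$output_wat"

echo "Wrote $output_wat"
```

Invoked once per benchmark file, e.g. `bash compile_to_wat.sh samples/foo.cpp`, this yields the .wat input that StackSight decompiles.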
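The Research Type row cites a 70% increase in functionally correct decompiled code, which the paper measures by running dataset test cases against the decompiled output. The authors do not release their evaluation harness, so the following is a hypothetical sketch of how such a pass-rate check could work: each decompiled candidate is compiled together with a matching test driver and counted as correct only if the binary exits cleanly. The `decompiled/` and `tests/` layout and the 10-second timeout are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical pass-rate harness (not the authors' released tooling).
# Assumes one decompiled candidate per task in decompiled/<task>.cpp and a
# matching test driver (main() with assertions) in tests/<task>_test.cpp.
set -u

total=0
passed=0
for candidate in decompiled/*.cpp; do
  task="$(basename "$candidate" .cpp)"
  test_file="tests/${task}_test.cpp"
  [ -f "$test_file" ] || continue
  total=$((total + 1))

  # A candidate counts as functionally correct only if it compiles
  # together with its test driver and all assertions pass (exit code 0).
  if g++ -std=c++17 -o "/tmp/bin_${task}" "$candidate" "$test_file" 2>/dev/null \
     && timeout 10 "/tmp/bin_${task}" >/dev/null 2>&1; then
    passed=$((passed + 1))
  fi
done

echo "pass rate: $passed / $total"
```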