Exploiting Code Symmetries for Learning Program Semantics
Authors: Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our solution, SYMC, develops a novel variant of self-attention that is provably equivariant to code symmetries from the permutation group defined over the program dependence graph. SYMC obtains superior performance on five program analysis tasks, outperforming state-of-the-art code models, including GPT-4, without any pre-training. Our results suggest that code LLMs that encode the code structural prior via the code symmetry group generalize better and faster. (A minimal equivariance sketch follows the table.) |
| Researcher Affiliation | Academia | 1 Columbia University, 2 The University of Chicago, 3 University of Michigan, 4 Huazhong University of Science and Technology, 5 University of Washington, 6 University College London. Correspondence to: Kexin Pei <kpei@cs.uchicago.edu>, Suman Jana <suman@cs.columbia.edu>. |
| Pseudocode | No | The paper describes algorithms and operations using text and mathematical notation, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the SYMC methodology or a link to a code repository. |
| Open Datasets | Yes | We use the Java dataset collected by Allamanis et al. (2016) to evaluate the function name prediction. The dataset includes 11 Java projects, such as Hadoop, Gradle, etc., totaling 707K methods and 5.6M statements. We fix Hadoop as our test set and use the other projects for training, to ensure the two sets do not overlap. For defect prediction, we obtain the dataset from Defects4J (Just et al., 2014). We collect and compile 27 open-source projects, such as OpenSSL, ImageMagick, Coreutils, SQLite, etc. |
| Dataset Splits | No | The paper mentions '14K/6K training/testing samples', which indicates a train/test split. However, it does not explicitly provide details for a separate validation split or a complete three-way (train/validation/test) split. |
| Hardware Specification | Yes | We conduct all the experiments on three Linux servers with Ubuntu 20.04 LTS, each featuring an AMD EPYC 7502 processor, 128 virtual cores, and 256GB RAM, with 12 Nvidia RTX 3090 GPUs in total. |
| Software Dependencies | No | The paper states, 'We implement SYMC using Fairseq (Ott et al., 2019) and PyTorch (Paszke et al., 2019)' and mentions using 'Ghidra'. While specific software is named with citations, explicit version numbers for PyTorch, Fairseq, or Ghidra are not provided in the text. |
| Experiment Setup | Yes | We use SYMC with 8 attention layers, 12 attention heads, and a maximum input length of 512. For training, we use 10 epochs, a batch size of 64, and 14K/6K training/testing samples (strictly non-overlapping) unless stated otherwise. We employ 16-bit weight parameters for SYMC to optimize for memory efficiency. |
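
To make the equivariance claim in the Research Type row concrete, here is a minimal, hypothetical sketch (ours, not the authors' implementation): single-head self-attention whose only positional signal is a bias matrix derived from the program dependence graph (PDG). Under that assumption, permuting the statements and conjugating the bias by the same permutation permutes the outputs identically, which is the equivariance property the paper builds on. The names `pdg_equivariant_attention` and `pdg_bias` are our own.

```python
# Minimal sketch (ours, not the authors' code) of self-attention that is
# equivariant to statement permutations, assuming the ONLY positional
# signal is a bias matrix derived from the PDG.
import torch
import torch.nn.functional as F

def pdg_equivariant_attention(x, pdg_bias):
    """x: (n, d) statement embeddings; pdg_bias: (n, n) PDG-derived bias."""
    d = x.size(-1)
    # Single head; learned per-token projections are omitted for brevity
    # (pointwise projections would preserve the equivariance argument).
    scores = x @ x.T / d ** 0.5 + pdg_bias
    return F.softmax(scores, dim=-1) @ x

# Equivariance check: permuting the statements and conjugating the
# PDG bias by the same permutation permutes the outputs identically.
n, d = 6, 8
x = torch.randn(n, d)
bias = torch.randn(n, n)            # stand-in for a real PDG-derived bias
perm = torch.randperm(n)
out = pdg_equivariant_attention(x, bias)
out_perm = pdg_equivariant_attention(x[perm], bias[perm][:, perm])
assert torch.allclose(out[perm], out_perm, atol=1e-5)
```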
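
For quick reference, the reported experiment setup can be collected into a plain configuration dict. The key names are our own shorthand; only the values come from the Experiment Setup row above.

```python
# Hedged summary of the reported setup; key names are ours.
symc_setup = {
    "attention_layers": 8,
    "attention_heads": 12,
    "max_input_length": 512,
    "epochs": 10,
    "batch_size": 64,
    "train_samples": 14_000,     # "14K/6K training/testing samples"
    "test_samples": 6_000,
    "weight_precision": "fp16",  # 16-bit weights for memory efficiency
}
```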