How could Neural Networks understand Programs?

Authors: Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of OSCAR on several semantic understanding tasks for programs in this section. We first evaluate our model on a practical and important software engineering task, i.e., binary diffing. After that, we evaluate the performance of OSCAR for high-level PL understanding on the algorithm classification task. Furthermore, as a pre-training method, we investigate the performance of OSCAR in zero-shot learning, where the parameters of OSCAR are fixed. Finally, we analyze the components of our model in the ablation study.
Researcher Affiliation | Collaboration | University of Science and Technology of China and Microsoft Research Asia.
Pseudocode | No | The paper describes methods and steps but does not contain a clearly labeled pseudocode block or algorithm figure.
Open Source Code | Yes | Code and models are released at https://github.com/pdlan/OSCAR.
Open Datasets | Yes | We conduct the experiments on the POJ-104 dataset (Mou et al., 2016), which contains 104 algorithm problems that were submitted to an online judge system. We conduct the pre-training of OSCAR on a large corpus of real-world programs from publicly available open-source GitHub repositories, which covers a broad range of disciplines from operating systems and compilers to machine learning systems and linear algebra subprograms (details in Appendix F.1).
Dataset Splits | No | The paper uses the POJ-104 dataset and follows the experimental setting of prior work (Cummins et al., 2020a;b), but it does not explicitly state the training/validation/test splits (percentages or sample counts) for its own experiments, nor for the large pre-training corpus.
Hardware Specification | No | The paper does not report the hardware used for its experiments (GPU/CPU models, processor types, or memory amounts).
Software Dependencies | No | The paper mentions LLVM IR and that programs are compiled with GCC 7.5.0, but it does not give version numbers for other key software components, libraries, or frameworks used to implement and run the models (e.g., Python, PyTorch, or TensorFlow versions). (A sketch of one way to produce LLVM IR from source files appears after the table.)
Experiment Setup | Yes | Unless otherwise specified, all experiments are conducted on a 12-layer OSCAR model composed sequentially of three token-level encoder layers, six instruction-level encoder layers, and three token-level encoder layers. We follow RoBERTa-base (Liu et al., 2019) for other model configurations (details in Appendix B), e.g., the dimensionality of the hidden representation d is set to 768. The total sequence length of the Inst. encoder is set to 512, with the IR and Env. encoders each accounting for 256 instructions. We set K = 4 in our experiments. (A configuration sketch follows the table.)
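
To make the reported layout concrete, below is a minimal PyTorch sketch of a 12-layer hierarchy with the stated shape: three token-level layers, six instruction-level layers, then three more token-level layers, all at RoBERTa-base sizes (d = 768, 12 heads, FFN 3072). Every class and parameter name here is hypothetical, and the mean-pooling aggregation between levels is an assumption; the released code at https://github.com/pdlan/OSCAR is the authoritative implementation.

```python
import torch
import torch.nn as nn

def make_stack(num_layers: int, d_model: int = 768, n_heads: int = 12) -> nn.TransformerEncoder:
    """Stack of vanilla Transformer encoder layers at RoBERTa-base sizes."""
    layer = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, dim_feedforward=3072, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)

class OscarEncoderSketch(nn.Module):
    """Hypothetical 12-layer hierarchy: 3 token-level + 6 instruction-level + 3 token-level."""

    def __init__(self):
        super().__init__()
        self.token_lower = make_stack(3)  # token-level, before aggregation
        self.inst_level = make_stack(6)   # instruction-level (256 IR + 256 Env. = 512)
        self.token_upper = make_stack(3)  # token-level, after aggregation

    def forward(self, token_embeds: torch.Tensor, inst_index: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, num_tokens, 768)
        # inst_index:   (batch, num_tokens) long tensor mapping each token to
        #               one of up to 512 instructions.
        h = self.token_lower(token_embeds)

        # Mean-pool token states into one vector per instruction (an assumed
        # aggregation; the paper's exact scheme may differ).
        num_inst = int(inst_index.max().item()) + 1
        idx = inst_index.unsqueeze(-1).expand_as(h)
        inst_h = torch.zeros(h.size(0), num_inst, h.size(-1), device=h.device)
        counts = torch.zeros(h.size(0), num_inst, 1, device=h.device)
        inst_h.scatter_add_(1, idx, h)
        counts.scatter_add_(1, inst_index.unsqueeze(-1), torch.ones_like(h[..., :1]))
        inst_h = self.inst_level(inst_h / counts.clamp_min(1.0))

        # Broadcast instruction-level context back to tokens and refine.
        return self.token_upper(h + inst_h.gather(1, idx))
```

A forward pass with random inputs, e.g. `OscarEncoderSketch()(torch.randn(2, 128, 768), torch.randint(0, 512, (2, 128)))`, returns a (2, 128, 768) tensor, matching the token-level input shape.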
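
The pre-training corpus consists of LLVM IR, and the rows above pin down only GCC 7.5.0 as a tool version. As an illustration of how such a corpus could be produced, here is a sketch that lowers C/C++ sources (e.g., POJ-104 submissions) to textual IR with clang; the flags, optimization level, and directory layout are all assumptions, not the paper's documented pipeline.

```python
import subprocess
from pathlib import Path

def compile_to_ir(src: Path, out_dir: Path) -> Path | None:
    """Emit human-readable LLVM IR (.ll) for one source file; None on failure."""
    # Prefix with the parent directory name to avoid stem collisions
    # across problem subdirectories.
    out = out_dir / f"{src.parent.name}_{src.stem}.ll"
    result = subprocess.run(
        ["clang", "-S", "-emit-llvm", "-O0", "-o", str(out), str(src)],
        capture_output=True,
    )
    return out if result.returncode == 0 else None

if __name__ == "__main__":
    out_dir = Path("ir_corpus")
    out_dir.mkdir(exist_ok=True)
    # Hypothetical layout: one subdirectory per POJ-104 problem class.
    for src in Path("poj104").rglob("*.cpp"):
        compile_to_ir(src, out_dir)  # silently skip sources that fail to build
```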