How could Neural Networks understand Programs?

Authors: Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of OSCAR on several semantic understanding tasks for programs in this section. We first evaluate our model on a practical and important software engineering task, i.e., binary diffing. After that, we evaluate the performance of OSCAR for high-level PL understanding on the algorithm classification task. Furthermore, as a pre-training method, we investigate the performance of OSCAR in zero-shot learning, where the parameters of OSCAR are fixed. Finally, we analyze the components of our model in the ablation study.
Researcher Affiliation | Collaboration | University of Science and Technology of China and Microsoft Research Asia.
Pseudocode | No | The paper describes methods and steps but does not contain a clearly labeled pseudocode block or algorithm figure.
Open Source Code | Yes | Code and models are released at https://github.com/pdlan/OSCAR.
Open Datasets | Yes | We conduct the experiments on the POJ-104 dataset (Mou et al., 2016), which contains 104 algorithm problems that were submitted to an online judge system. We conduct the pre-training of OSCAR on a large corpus of real-world programs from publicly available open-source GitHub repositories, which covers a broad range of disciplines from operating systems and compilers to machine learning systems and linear algebra subprograms (details in Appendix F.1).
Dataset Splits | No | The paper uses the POJ-104 dataset and follows the experimental setting of prior work (Cummins et al., 2020a;b), but it does not explicitly state the training/validation/test splits (percentages or sample counts) for its own experiments, nor for the large pre-training corpus.
Hardware Specification | No | The paper does not report the hardware used for its experiments (GPU/CPU models, processor types, or memory amounts).
Software Dependencies | No | The paper mentions LLVM IR and that programs are compiled with GCC 7.5.0, but it does not give version numbers for other key software components, libraries, or frameworks used to implement and run the models (e.g., Python, PyTorch, or TensorFlow versions). (A sketch of one way to produce LLVM IR from source files appears after the table.)
Experiment Setup | Yes | Unless otherwise specified, all experiments are conducted on a 12-layer OSCAR model composed sequentially of three token-level encoder layers, six instruction-level encoder layers, and three token-level encoder layers. We follow RoBERTa-base (Liu et al., 2019) for other model configurations (details in Appendix B), e.g., the dimensionality of the hidden representation d is set to 768. The total sequence length of the Inst. encoder is set to 512, with the IR and Env. encoders each accounting for 256 instructions. We set K = 4 in our experiments. (A configuration sketch follows the table.)
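
To make the reported layout concrete, below is a minimal PyTorch sketch of a 12-layer hierarchy with the stated shape: three token-level layers, six instruction-level layers, then three more token-level layers, all at RoBERTa-base sizes (d = 768, 12 heads, FFN 3072). Every class and parameter name here is hypothetical, and the mean-pooling aggregation between levels is an assumption; the released code at https://github.com/pdlan/OSCAR is the authoritative implementation.

```python
import torch
import torch.nn as nn

def make_stack(num_layers: int, d_model: int = 768, n_heads: int = 12) -> nn.TransformerEncoder:
    """Stack of vanilla Transformer encoder layers at RoBERTa-base sizes."""
    layer = nn.TransformerEncoderLayer(
        d_model=d_model, nhead=n_heads, dim_feedforward=3072, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)

class OscarEncoderSketch(nn.Module):
    """Hypothetical 12-layer hierarchy: 3 token-level + 6 instruction-level + 3 token-level."""

    def __init__(self):
        super().__init__()
        self.token_lower = make_stack(3)  # token-level, before aggregation
        self.inst_level = make_stack(6)   # instruction-level (256 IR + 256 Env. = 512)
        self.token_upper = make_stack(3)  # token-level, after aggregation

    def forward(self, token_embeds: torch.Tensor, inst_index: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, num_tokens, 768)
        # inst_index:   (batch, num_tokens) long tensor mapping each token to
        #               one of up to 512 instructions.
        h = self.token_lower(token_embeds)

        # Mean-pool token states into one vector per instruction (an assumed
        # aggregation; the paper's exact scheme may differ).
        num_inst = int(inst_index.max().item()) + 1
        idx = inst_index.unsqueeze(-1).expand_as(h)
        inst_h = torch.zeros(h.size(0), num_inst, h.size(-1), device=h.device)
        counts = torch.zeros(h.size(0), num_inst, 1, device=h.device)
        inst_h.scatter_add_(1, idx, h)
        counts.scatter_add_(1, inst_index.unsqueeze(-1), torch.ones_like(h[..., :1]))
        inst_h = self.inst_level(inst_h / counts.clamp_min(1.0))

        # Broadcast instruction-level context back to tokens and refine.
        return self.token_upper(h + inst_h.gather(1, idx))
```

A forward pass with random inputs, e.g. `OscarEncoderSketch()(torch.randn(2, 128, 768), torch.randint(0, 512, (2, 128)))`, returns a (2, 128, 768) tensor, matching the token-level input shape.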
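
The pre-training corpus consists of LLVM IR, and the rows above pin down only GCC 7.5.0 as a tool version. As an illustration of how such a corpus could be produced, here is a sketch that lowers C/C++ sources (e.g., POJ-104 submissions) to textual IR with clang; the flags, optimization level, and directory layout are all assumptions, not the paper's documented pipeline.

```python
import subprocess
from pathlib import Path

def compile_to_ir(src: Path, out_dir: Path) -> Path | None:
    """Emit human-readable LLVM IR (.ll) for one source file; None on failure."""
    # Prefix with the parent directory name to avoid stem collisions
    # across problem subdirectories.
    out = out_dir / f"{src.parent.name}_{src.stem}.ll"
    result = subprocess.run(
        ["clang", "-S", "-emit-llvm", "-O0", "-o", str(out), str(src)],
        capture_output=True,
    )
    return out if result.returncode == 0 else None

if __name__ == "__main__":
    out_dir = Path("ir_corpus")
    out_dir.mkdir(exist_ok=True)
    # Hypothetical layout: one subdirectory per POJ-104 problem class.
    for src in Path("poj104").rglob("*.cpp"):
        compile_to_ir(src, out_dir)  # silently skip sources that fail to build
```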