TreeCaps: Tree-Based Capsule Networks for Source Code Processing

Authors: Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code, in terms of both accuracy and robustness for program comprehension tasks such as code functionality classification and function name prediction. Our empirical evaluation shows that TreeCaps achieves better classification accuracy and better F1 score in prediction compared to other code learning techniques such as Code2vec, Code2seq, ASTNN, TBCNN, GGNN, GREAT and GNN-FiLM. We have also applied three types of semantic-preserving transformations (Rabin et al. 2020; Zhang et al. 2020; Wang and Su 2019) that transform programs into syntactically different but semantically equivalent code to attack the models. Evaluations also show that our TreeCaps models are the most robust, able to preserve their predictions for transformed programs more than other learning techniques.
Researcher Affiliation | Collaboration | Nghi D. Q. Bui (1, 3), Yijun Yu (1, 2), Lingxiao Jiang (3); affiliations: (1) Trustworthy Open-Source Software Engineering Lab, Huawei Research Centre, Ireland; (2) School of Computing & Communications, The Open University, UK; (3) School of Computing & Information Systems, Singapore Management University
Pseudocode | Yes | Algorithm 1 Dynamic Routing; Algorithm 2 Variable-to-Static Capsule Routing. See the routing sketch after the table.
Open Source Code | Yes | Our implementation is publicly available at: https://github.com/bdqnghi/treecaps.
Open Datasets | Yes | The first Sorting Algorithms (SA) dataset is from Nghi, Yu, and Jiang (2019), which contains 10 algorithm classes of 1000 sorting programs written in Java. The second OJ dataset is from Mou et al. (2016), which contains 52000 C programs of 104 classes. We use the datasets from Code2seq (Alon et al. 2019a) containing three sets of Java programs: Java-Small (700k samples), Java-Med (4M samples), and Java-Large (16M samples).
Dataset Splits | Yes | We split each dataset into training, testing, and validation sets by the ratios of 70/20/10. These datasets have been split into training/testing/validation by projects. See the split sketch after the table.
Hardware Specification | Yes | To train the models, we use the Rectified Adam (RAdam) optimizer (Liu et al. 2019) with an initial learning rate of 0.001 subjected to decay on an Nvidia Tesla P100 GPU.
Software Dependencies | No | The paper mentions 'Tensorflow libraries' but does not specify a version number for TensorFlow or any other software dependencies, which is required for reproducibility.
Experiment Setup | Yes | For the parameters in our TBCNN layer, we follow Mou et al. (2016) to set the size of type embeddings to 128, the size of text embeddings to 128, and the number of convolutional steps m to 8. For the capsule layers, we set N_sc = 100, D_sc = 16, D_cc = 16 and routing iterations r = 3. We use Tensorflow libraries to implement TreeCaps. To train the models, we use the Rectified Adam (RAdam) optimizer (Liu et al. 2019) with an initial learning rate of 0.001 subjected to decay on an Nvidia Tesla P100 GPU. See the configuration sketch after the table.
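
The Pseudocode row lists Algorithm 1 (Dynamic Routing). For readers unfamiliar with capsule networks, the following is a minimal NumPy sketch of the standard routing-by-agreement loop that such an algorithm follows; the function names, tensor shapes, and toy sizes are illustrative assumptions, not the authors' TreeCaps implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-1, eps=1e-8):
    """Capsule non-linearity: preserves direction, maps the norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Routing-by-agreement between two capsule layers.

    u_hat: prediction vectors of shape (num_in, num_out, dim_out).
    Returns the output capsule vectors, shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits
    for _ in range(num_iterations):
        c = softmax(b, axis=1)                     # coupling coefficients per input capsule
        s = (c[..., None] * u_hat).sum(axis=0)     # weighted sum for each output capsule
        v = squash(s)                              # squashed output capsules
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # agreement update
    return v

# Toy usage: route 100 input capsules to 10 output capsules of dimension 16.
v = dynamic_routing(np.random.randn(100, 10, 16), num_iterations=3)
print(v.shape)  # (10, 16)
```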
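
The Dataset Splits row quotes a 70/20/10 train/test/validation split. A minimal sketch of such a per-sample random split is shown below; the function name and the fixed seed are assumptions. Note that, per the quoted text, the Code2seq Java datasets are instead split by project, so a random per-sample split like this would only apply to datasets such as SA and OJ.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle a list of samples and split it into train/test/validation
    sets by the given ratios (70/20/10 as quoted in the paper)."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    n = len(samples)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = samples[:n_train]
    test = samples[n_train:n_train + n_test]
    valid = samples[n_train + n_test:]
    return train, test, valid

train, test, valid = split_dataset(range(1000))
print(len(train), len(test), len(valid))  # 700 200 100
```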
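
The Experiment Setup row collects the reported hyperparameters. The sketch below gathers them into one configuration object; the dataclass and its field names are assumptions, and the RAdam wiring in the comment uses the TensorFlow Addons implementation, since the paper names only "Rectified Adam (RAdam)" without a library or version.

```python
# Hyperparameters quoted in the Experiment Setup row, gathered into one
# configuration object. The dataclass and field names are illustrative
# assumptions, not the authors' code.
from dataclasses import dataclass

@dataclass
class TreeCapsConfig:
    type_embedding_size: int = 128       # size of node-type embeddings
    text_embedding_size: int = 128       # size of node-text embeddings
    num_conv_steps: int = 8              # TBCNN convolutional steps m
    num_static_capsules: int = 100       # N_sc
    static_capsule_dim: int = 16         # D_sc
    code_capsule_dim: int = 16           # D_cc
    routing_iterations: int = 3          # r
    initial_learning_rate: float = 1e-3  # RAdam initial LR, decayed during training

config = TreeCapsConfig()

# Optimizer sketch: the paper does not name a specific RAdam library; the
# TensorFlow Addons implementation is one possible choice.
# import tensorflow_addons as tfa
# optimizer = tfa.optimizers.RectifiedAdam(learning_rate=config.initial_learning_rate)
```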