Convolutional Neural Networks over Tree Structures for Programming Language Processing

Authors: Lili Mou, Ge Li, Lu Zhang, Tao Wang, Zhi Jin

AAAI 2016

Reproducibility assessment. Each variable below lists the result, followed by the LLM response quoting the paper.
Research Type: Experimental
LLM Response: "our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP."
Researcher Affiliation: Academia
LLM Response: "Lili Mou (1), Ge Li (1), Lu Zhang (1), Tao Wang (2), Zhi Jin (1); (1) Software Institute, Peking University; (2) Stanford University, twangcat@stanford.edu. Corresponding authors: doublepower.mou@gmail.com, {lige,zhanglu,zhijin}@sei.pku.edu.cn"
Pseudocode: No
LLM Response: The paper describes computational steps for the model components but does not provide structured pseudocode or algorithm blocks.
Open Source Code: Yes
LLM Response: "We make our source code and the collected dataset available through our website (https://sites.google.com/site/treebasedcnn/)."
Open Datasets: Yes
LLM Response: "The dataset of our experiments comes from a pedagogical programming open judge (OJ) system (http://programming.grids.cn). There are a large number of programming problems on the OJ system. Students submit their source code as the solution to a certain problem; the OJ system automatically judges the validity of submitted source code by running the program. We downloaded the source code and the corresponding programming problems (represented as IDs) as our dataset."
Dataset Splits: Yes
LLM Response: "We randomly chose exactly 500 programs in each class, and thus 52,000 samples in total, which were further randomly split by 3:1:1 for training, validation, and testing."
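The quoted 3:1:1 split can be sketched as below. This is an illustrative reconstruction, not the authors' code; `split_dataset` and the fixed seed are my own choices, but the arithmetic matches the paper's numbers (52,000 samples yield 31,200 / 10,400 / 10,400).

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split samples 3:1:1 into train/validation/test sets."""
    rng = random.Random(seed)  # fixed seed for a reproducible shuffle
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 3 // 5       # 3 parts of 5
    n_val = n // 5             # 1 part of 5
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(52000)))
# len(train), len(val), len(test) == 31200, 10400, 10400
```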
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No
LLM Response: The paper mentions 'pycparser' in a footnote for parsing ASTs, but it does not provide version numbers for this or any other software dependency, such as libraries or frameworks.
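For context, pycparser is the tool the paper's footnote names for obtaining the C ASTs that TBCNN consumes. A minimal sketch of parsing a snippet and walking the resulting tree follows; the `count_nodes` helper is my own illustration, not from the paper.

```python
from pycparser import c_parser

# Parse a self-contained C snippet (no preprocessor directives needed).
parser = c_parser.CParser()
source = "int main(void) { int x = 1 + 2; return x; }"
tree = parser.parse(source)  # returns a FileAST node

# Each pycparser node exposes children() as (name, node) pairs,
# so the whole AST can be traversed recursively.
def count_nodes(node):
    return 1 + sum(count_nodes(child) for _, child in node.children())
```

A tree-based model would recurse over exactly this structure, replacing the node count with per-node vector computations.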
Experiment Setup: Yes
LLM Response (hyperparameter table from the paper; value and how it was chosen):
  Initial learning rate: 0.3 (by validation)
  Embedding dimension: 30 (empirically)
  Convolutional layers' dim.: 600 (by validation)
  Penultimate layer's dim.: 600 (same as conv layers)
  l2 penalty: None (empirically)
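The reported hyperparameters can be collected into a single configuration mapping for a re-implementation. The dictionary keys below are my own naming; the values are the ones quoted from the paper.

```python
# Hyperparameters reported in the paper; key names are illustrative.
HYPERPARAMS = {
    "initial_learning_rate": 0.3,  # chosen by validation
    "embedding_dim": 30,           # chosen empirically
    "conv_layer_dim": 600,         # chosen by validation
    "penultimate_layer_dim": 600,  # same as the convolutional layers
    "l2_penalty": None,            # no l2 penalty applied (empirically)
}
```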