Generating Adversarial Examples for Holding Robustness of Source Code Processing Models

Authors: Huangzhao Zhang, Zhuo Li, Ge Li, Lei Ma, Yang Liu, Zhi Jin (pp. 1169-1176)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of MHM in generating adversarial examples of source code. The higher robustness and performance enhanced through our adversarial training with MHM further confirms the usefulness of DL models-based method for future fully automated source code processing." The Experiments section adds: "In this section, we perform in-depth evaluation to demonstrate the usefulness of our proposed technique. We first introduce our experimental setups, and then present the experimental results of MHM on adversarial attack and adversarial training, respectively."
Researcher Affiliation | Academia | Huangzhao Zhang¹, Zhuo Li¹, Ge Li¹, Lei Ma², Yang Liu³, Zhi Jin¹ — ¹Key Lab of High Confidence Software Technologies (Peking University, China), Ministry of Education; ²Kyushu University, Japan; ³Nanyang Technological University, Singapore
Pseudocode | Yes | "Algorithm 1: Metropolis-Hastings Modifier algorithm."
Open Source Code | Yes | "Our tool and data are open-sourced and publicly available": https://github.com/Metropolis-Hastings-Modifier/MHM
Open Datasets | Yes | "Dataset. We choose the Open Judge (OJ) dataset, the benchmark dataset in source code classification, as the study subject, which is proposed by Mou et al. (2016)."
Dataset Splits | Yes | "Finally, we split the filtered dataset (4 : 1), resulting in a training set with the size of 38,924 and a test set with the size of 9,718... During training, we randomly extract 20% code files from the training set, forming the validation set."
Hardware Specification | No | The paper notes that "GA experiments take several days, while MHM only takes about several hours", indicating computational cost, but it gives no specific hardware details (GPU model, CPU, or memory) used to run the experiments.
Software Dependencies | No | The paper mentions a "C++ (ver. 11) parser" for filtering, the pycparser tool for AST generation, and the Adam and AdaMax optimizers in Table 1. However, it provides no version numbers for pycparser or the optimizers, nor for the underlying languages and frameworks (e.g. Python, PyTorch, or TensorFlow) that are typically listed as pinned software dependencies.
Experiment Setup | Yes | Table 1 ("Hyper-parameters of the subject models") explicitly lists detailed hyperparameters for the LSTM and ASTNN models, including vocabulary size, embedding size, hidden size, layers, dropout, batch size, optimizer, and learning rate.
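The Metropolis-Hastings Modifier listed under Pseudocode iterates identifier-renaming proposals on the source code and keeps or discards each one with a Metropolis-Hastings acceptance test. A minimal, illustrative sketch of those two pieces (the function and variable names here are ours, not the paper's; in the paper the score would come from the victim classifier):

```python
import random
import re


def rename_identifier(code, old, new):
    """Propose a semantics-preserving transformation: rename one
    identifier everywhere in the snippet (word-boundary match only,
    so substrings of longer names are left alone)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def mh_accept(score_old, score_new, rng=random):
    """Metropolis-Hastings acceptance: always keep an improving
    proposal; otherwise keep it with probability score_new/score_old."""
    if score_new >= score_old:
        return True
    return rng.random() < score_new / max(score_old, 1e-12)
```

In the attack setting, `score_*` would be the model's probability of the adversarial target label before and after the proposed renaming.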
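The split protocol quoted under Dataset Splits (a 4 : 1 train/test split, then 20% of the training set held out as validation) can be sketched as follows; the helper name and seed handling are ours:

```python
import random


def split_dataset(samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle, hold out test_frac as the test set (the 4:1 split),
    then carve val_frac of the remaining training set off as validation."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    cut = int(len(samples) * (1 - test_frac))
    train, test = samples[:cut], samples[cut:]
    val_cut = int(len(train) * val_frac)
    val, train = train[:val_cut], train[val_cut:]
    return train, val, test
```

On the paper's 48,642 filtered samples this yields roughly the reported 38,924 training / 9,718 test sizes before the validation carve-out.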
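The Table 1 fields cited under Experiment Setup can be captured as a small configuration object. The field names follow the table; the default values below are illustrative placeholders, not the paper's actual settings:

```python
from dataclasses import dataclass


@dataclass
class SubjectModelConfig:
    """Hyper-parameter fields named after Table 1 of the paper.
    All values here are placeholders for illustration only."""
    vocabulary_size: int = 5000
    embedding_size: int = 300
    hidden_size: int = 256
    layers: int = 1
    dropout: float = 0.5
    batch_size: int = 32
    optimizer: str = "Adam"
    learning_rate: float = 1e-3


# The two subject models differ (per Table 1) in optimizer, among others.
lstm_cfg = SubjectModelConfig(optimizer="Adam")
astnn_cfg = SubjectModelConfig(optimizer="AdaMax")
```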