Generating Adversarial Examples for Holding Robustness of Source Code Processing Models
Authors: Huangzhao Zhang, Zhuo Li, Ge Li, Lei Ma, Yang Liu, Zhi Jin (pp. 1169–1176)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of MHM in generating adversarial examples of source code. The higher robustness and performance achieved through our adversarial training with MHM further confirm the usefulness of the method for future fully automated DL-based source code processing. From the Experiments section: In this section, we perform an in-depth evaluation to demonstrate the usefulness of our proposed technique. We first introduce our experimental setups, and then present the experimental results of MHM on adversarial attack and adversarial training, respectively. |
| Researcher Affiliation | Academia | Huangzhao Zhang,1 Zhuo Li,1 Ge Li,1 Lei Ma,2 Yang Liu,3 Zhi Jin1 — 1Key Lab of High Confidence Software Technologies (Peking University, China), Ministry of Education; 2Kyushu University, Japan; 3Nanyang Technological University, Singapore |
| Pseudocode | Yes | Algorithm 1 Metropolis-Hastings Modifier algorithm. |
| Open Source Code | Yes | Our tool and data are open-sourced and publicly available1. 1https://github.com/Metropolis-Hastings-Modifier/MHM |
| Open Datasets | Yes | Dataset. We choose the Open Judge (OJ) dataset, the benchmark dataset in source code classification, as the study subject, which was proposed by Mou et al. (2016). |
| Dataset Splits | Yes | Finally, we split the filtered dataset (4 : 1), resulting in a training set with the size of 38,924 and a test set with the size of 9,718... During training, we randomly extract 20% code files from the training set, forming the validation set. |
| Hardware Specification | No | The paper mentions that 'GA experiments take several days, while MHM only takes about several hours', indicating computational time, but it does not provide any specific hardware details such as GPU models, CPU specifications, or memory, that were used to run the experiments. |
| Software Dependencies | No | The paper mentions using a 'C++ (ver.11) parser' for filtering and the 'pycparser tool2' for AST generation, and the optimizers 'Adam' and 'AdaMax' in Table 1. However, it does not provide version numbers for pycparser or the optimizers, nor for general programming languages or frameworks such as Python, PyTorch, or TensorFlow, which are typically listed as key software dependencies with specific versions. |
| Experiment Setup | Yes | Table 1: Hyper-parameters of the subject models. This table explicitly lists detailed hyperparameters for LSTM and ASTNN models including 'Vocabulary size', 'Embedding size', 'Hidden size', 'Layers', 'Dropout', 'Batch size', 'Optimizer', and 'Learning rate'. |
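The reported dataset split (a 4:1 train/test split, then 20% of the training files held out as a validation set) can be sketched as below. This is an illustrative reconstruction only; the function name, seeding, and shuffling strategy are assumptions, not taken from the paper's released code.

```python
import random

def split_dataset(items, test_ratio=0.2, val_ratio=0.2, seed=0):
    """Outline of the paper's splitting scheme: hold out 1/5 of the
    corpus as the test set (a 4:1 split), then carve 20% of the
    remaining training files into a validation set.

    Illustrative sketch -- not the authors' actual preprocessing code.
    """
    rng = random.Random(seed)
    items = items[:]           # avoid mutating the caller's list
    rng.shuffle(items)
    n_test = int(len(items) * test_ratio)
    test, train_full = items[:n_test], items[n_test:]
    n_val = int(len(train_full) * val_ratio)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test
```

With the paper's filtered OJ corpus this scheme yields a training set of 38,924 files and a test set of 9,718 files, as quoted in the Dataset Splits row above.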
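The pseudocode row above refers to Algorithm 1, the Metropolis-Hastings Modifier. At its core is the standard Metropolis-Hastings acceptance rule, sketched below in textbook form; this is not the paper's exact acceptance criterion for identifier-renaming proposals, and the function names and scoring interface are assumptions for illustration.

```python
import random

def mh_accept(score_current, score_proposal, rng=random.random):
    """Generic Metropolis-Hastings acceptance step: accept the
    proposed state with probability min(1, p(proposal)/p(current)).

    Textbook form only -- the paper's MHM defines its own proposal
    distribution over identifier renamings and its own scores.
    """
    ratio = score_proposal / score_current
    return rng() < min(1.0, ratio)
```

A proposal that raises the (unnormalized) score is always accepted; a score-lowering proposal is still accepted with probability equal to the score ratio, which is what lets the sampler escape local optima during adversarial search.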