Generating Adversarial Computer Programs using Optimized Obfuscations
Authors: Shashank Srikant, Sijia Liu, Tamara Mitrovska, Shiyu Chang, Quanfu Fan, Gaoyuan Zhang, Una-May O'Reilly
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our work on Python and Java programs on the problem of program summarization. We show that our best attack proposal achieves a 52% improvement over a state-of-the-art attack generation approach for programs trained on a SEQ2SEQ model. |
| Researcher Affiliation | Collaboration | Shashank Srikant¹, Sijia Liu²,³, Tamara Mitrovska¹, Shiyu Chang², Quanfu Fan², Gaoyuan Zhang², Una-May O'Reilly¹ (¹CSAIL, MIT; ²MIT-IBM Watson AI Lab; ³Michigan State University) |
| Pseudocode | No | The paper describes its algorithms (PGD, AO) using mathematical equations and textual explanations, but does not include any formally structured pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Source code: https://github.com/ALFA-group/adversarial-code-generation |
| Open Datasets | Yes | We evaluate this on a well maintained dataset of roughly 150K Python programs (Raychev et al., 2016) and 700K Java programs (Alon et al., 2018). |
| Dataset Splits | Yes | The SEQ2SEQ model is trained and validated on 90% of the data while tested on the remaining 10%. |
| Hardware Specification | No | The paper mentions support with 'computational resources' but does not specify any particular hardware components like CPU models, GPU models, or memory sizes used for the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and that CODE2SEQ uses TensorFlow, but it does not provide specific version numbers for these or any other software libraries or dependencies required to replicate the experiments. |
| Experiment Setup | No | The paper states that the SEQ2SEQ model is 'optimized using the cross-entropy loss function' and that the smoothing parameter µ is 'set to 0.01'. It also mentions 'AO for 3 iterations, and JO for 10 iterations'. However, it does not provide a comprehensive list of hyperparameters for the SEQ2SEQ model itself, such as learning rate, batch size, or total training epochs, which are crucial for full reproducibility. |
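
To make the quoted experiment-setup details concrete, below is a minimal PyTorch sketch of a projected-gradient, alternating-optimization attack loop of the kind the paper describes. Only the number of AO iterations (3) and the smoothing parameter µ = 0.01 come from the text quoted above; the function names, the inner PGD step count and learning rate, and the way the smoothing noise is applied are illustrative assumptions, not the authors' implementation (their code is in the linked repository).

```python
import torch


def project_simplex(v: torch.Tensor) -> torch.Tensor:
    """Euclidean projection of each row of v onto the probability simplex."""
    u, _ = torch.sort(v, dim=-1, descending=True)
    cssv = torch.cumsum(u, dim=-1) - 1.0
    k = torch.arange(1, v.size(-1) + 1, device=v.device, dtype=v.dtype)
    rho = (u - cssv / k > 0).sum(dim=-1, keepdim=True)   # number of active coordinates
    theta = cssv.gather(-1, rho - 1) / rho.to(v.dtype)
    return torch.clamp(v - theta, min=0.0)


def alternating_attack(adv_loss, z, u, ao_iters=3, pgd_iters=10, lr=0.5, mu=0.01):
    """Alternately run projected gradient ascent on the site-selection block `z`
    and the perturbation block `u`, maximizing the adversarial loss adv_loss(z, u).
    ao_iters=3 and mu=0.01 mirror values quoted from the paper; everything else
    here is an assumption made for illustration."""
    for _ in range(ao_iters):                        # outer alternating (AO) iterations
        for var in (z, u):                           # optimize one block while the other is fixed
            for _ in range(pgd_iters):
                var.requires_grad_(True)
                # Illustrative randomized smoothing: jitter the active block with
                # Gaussian noise of scale mu before evaluating the loss.
                loss = adv_loss(z + mu * torch.randn_like(z) if var is z else z,
                                u + mu * torch.randn_like(u) if var is u else u)
                (grad,) = torch.autograd.grad(loss, var)
                with torch.no_grad():
                    var.add_(lr * grad)              # gradient ascent on the attack objective
                    var.copy_(project_simplex(var))  # PGD projection back onto the simplex
                var.requires_grad_(False)
    return z, u
```

In an actual run, `adv_loss` would presumably wrap the victim SEQ2SEQ or CODE2SEQ model's loss on the program-summarization output, with `z` parameterizing which obfuscation sites to perturb and `u` the replacement tokens; those wiring details are not specified in the quoted text and are left abstract here.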