Generating Adversarial Computer Programs using Optimized Obfuscations

Authors: Shashank Srikant, Sijia Liu, Tamara Mitrovska, Shiyu Chang, Quanfu Fan, Gaoyuan Zhang, Una-May O'Reilly

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our work on Python and Java programs on the problem of program summarization. We show that our best attack proposal achieves a 52% improvement over a state-of-the-art attack generation approach for programs trained on a SEQ2SEQ model.
Researcher Affiliation | Collaboration | Shashank Srikant (1), Sijia Liu (2,3), Tamara Mitrovska (1), Shiyu Chang (2), Quanfu Fan (2), Gaoyuan Zhang (2), Una-May O'Reilly (1); (1) CSAIL, MIT; (2) MIT-IBM Watson AI Lab; (3) Michigan State University
Pseudocode | No | The paper describes its algorithms (PGD, AO) using mathematical equations and textual explanations, but does not include any formally structured pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Source code: https://github.com/ALFA-group/adversarial-code-generation
Open Datasets | Yes | We evaluate this on a well-maintained dataset of roughly 150K Python programs (Raychev et al., 2016) and 700K Java programs (Alon et al., 2018).
Dataset Splits | Yes | The SEQ2SEQ model is trained and validated on 90% of the data while tested on the remaining 10%.
Hardware Specification | No | The paper mentions support with 'computational resources' but does not specify any particular hardware components, such as CPU models, GPU models, or memory sizes, used for the experiments.
Software Dependencies | No | The paper mentions using PyTorch and that CODE2SEQ uses TensorFlow, but it does not provide specific version numbers for these or any other software libraries or dependencies required to replicate the experiments.
Experiment Setup | No | The paper states that the SEQ2SEQ model is 'optimized using the cross-entropy loss function' and that the smoothing parameter µ is 'set to 0.01'. It also mentions 'AO for 3 iterations, and JO for 10 iterations'. However, it does not provide a comprehensive list of hyperparameters for the SEQ2SEQ model itself, such as learning rate, batch size, or total training epochs, which are crucial for full reproducibility (see the sketch below the table).
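
To make the gap flagged in the Dataset Splits and Experiment Setup rows concrete, the sketch below collects the settings the paper does report (the 90%/10% split, the cross-entropy objective, µ = 0.01, 3 AO and 10 JO iterations) and explicitly marks the SEQ2SEQ hyperparameters it leaves unreported. This is a minimal illustration under stated assumptions, not the authors' released code: the config fields, the split_programs helper, and the placeholder corpus are names introduced here for clarity.

```python
# Hypothetical sketch of reported vs. unreported settings for the SEQ2SEQ experiments.
# Values set to None are not stated in the paper and would have to be recovered from
# the released repository (https://github.com/ALFA-group/adversarial-code-generation).

from dataclasses import dataclass
from typing import List, Optional, Tuple
import random


@dataclass
class Seq2SeqExperimentConfig:
    # Reported in the paper
    train_val_fraction: float = 0.90   # train + validation share of the data
    test_fraction: float = 0.10        # held-out test share
    loss: str = "cross_entropy"        # SEQ2SEQ optimized with the cross-entropy loss
    smoothing_mu: float = 0.01         # smoothing parameter µ, quoted as 'set to 0.01'
    ao_iterations: int = 3             # alternating optimization (AO) iterations
    jo_iterations: int = 10            # joint optimization (JO) iterations
    # Not reported; would need to come from the code release
    learning_rate: Optional[float] = None
    batch_size: Optional[int] = None
    num_epochs: Optional[int] = None


def split_programs(programs: List[str], cfg: Seq2SeqExperimentConfig,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle and split a corpus into (train+validation, test) per the quoted 90/10 split."""
    rng = random.Random(seed)
    shuffled = programs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * cfg.train_val_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    cfg = Seq2SeqExperimentConfig()
    # Stand-in identifiers for the ~150K-program Python corpus (Raychev et al., 2016)
    corpus = [f"program_{i}.py" for i in range(150_000)]
    train_val, test = split_programs(corpus, cfg)
    print(len(train_val), len(test))  # 135000 15000
    missing = [name for name, value in vars(cfg).items() if value is None]
    print("Unreported hyperparameters:", missing)
```

Running the sketch prints 135000 and 15000 program identifiers for the train+validation and test partitions, and lists learning_rate, batch_size, and num_epochs as the values a reproduction would still have to pin down.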