Generating Adversarial Computer Programs using Optimized Obfuscations

Authors: Shashank Srikant, Sijia Liu, Tamara Mitrovska, Shiyu Chang, Quanfu Fan, Gaoyuan Zhang, Una-May O'Reilly

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our work on Python and Java programs on the problem of program summarization. We show that our best attack proposal achieves a 52% improvement over a state-of-the-art attack generation approach for programs trained on a SEQ2SEQ model.
Researcher Affiliation | Collaboration | Shashank Srikant (1), Sijia Liu (2,3), Tamara Mitrovska (1), Shiyu Chang (2), Quanfu Fan (2), Gaoyuan Zhang (2), Una-May O'Reilly (1); (1) CSAIL, MIT; (2) MIT-IBM Watson AI Lab; (3) Michigan State University
Pseudocode | No | The paper describes its algorithms (PGD, AO) using mathematical equations and textual explanations, but does not include any formally structured pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Source code: https://github.com/ALFA-group/adversarial-code-generation
Open Datasets | Yes | We evaluate this on a well-maintained dataset of roughly 150K Python programs (Raychev et al., 2016) and 700K Java programs (Alon et al., 2018).
Dataset Splits | Yes | The SEQ2SEQ model is trained and validated on 90% of the data while tested on the remaining 10%.
Hardware Specification | No | The paper mentions support with 'computational resources' but does not specify any particular hardware components, such as CPU models, GPU models, or memory sizes, used for the experiments.
Software Dependencies | No | The paper mentions using PyTorch and that CODE2SEQ uses TensorFlow, but it does not provide specific version numbers for these or any other software libraries or dependencies required to replicate the experiments.
Experiment Setup | No | The paper states that the SEQ2SEQ model is 'optimized using the cross-entropy loss function' and that the smoothing parameter µ is 'set to 0.01'. It also mentions 'AO for 3 iterations, and JO for 10 iterations'. However, it does not provide a comprehensive list of hyperparameters for the SEQ2SEQ model itself, such as learning rate, batch size, or total training epochs, which are crucial for full reproducibility (see the sketch below the table).
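
To make the gap flagged in the Dataset Splits and Experiment Setup rows concrete, the sketch below collects the settings the paper does report (the 90%/10% split, the cross-entropy objective, µ = 0.01, 3 AO and 10 JO iterations) and explicitly marks the SEQ2SEQ hyperparameters it leaves unreported. This is a minimal illustration under stated assumptions, not the authors' released code: the config fields, the split_programs helper, and the placeholder corpus are names introduced here for clarity.

```python
# Hypothetical sketch of reported vs. unreported settings for the SEQ2SEQ experiments.
# Values set to None are not stated in the paper and would have to be recovered from
# the released repository (https://github.com/ALFA-group/adversarial-code-generation).

from dataclasses import dataclass
from typing import List, Optional, Tuple
import random


@dataclass
class Seq2SeqExperimentConfig:
    # Reported in the paper
    train_val_fraction: float = 0.90   # train + validation share of the data
    test_fraction: float = 0.10        # held-out test share
    loss: str = "cross_entropy"        # SEQ2SEQ optimized with the cross-entropy loss
    smoothing_mu: float = 0.01         # smoothing parameter µ, quoted as 'set to 0.01'
    ao_iterations: int = 3             # alternating optimization (AO) iterations
    jo_iterations: int = 10            # joint optimization (JO) iterations
    # Not reported; would need to come from the code release
    learning_rate: Optional[float] = None
    batch_size: Optional[int] = None
    num_epochs: Optional[int] = None


def split_programs(programs: List[str], cfg: Seq2SeqExperimentConfig,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle and split a corpus into (train+validation, test) per the quoted 90/10 split."""
    rng = random.Random(seed)
    shuffled = programs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * cfg.train_val_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    cfg = Seq2SeqExperimentConfig()
    # Stand-in identifiers for the ~150K-program Python corpus (Raychev et al., 2016)
    corpus = [f"program_{i}.py" for i in range(150_000)]
    train_val, test = split_programs(corpus, cfg)
    print(len(train_val), len(test))  # 135000 15000
    missing = [name for name, value in vars(cfg).items() if value is None]
    print("Unreported hyperparameters:", missing)
```

Running the sketch prints 135000 and 15000 program identifiers for the train+validation and test partitions, and lists learning_rate, batch_size, and num_epochs as the values a reproduction would still have to pin down.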